RE: DOMString Character Encoding

From: Allen, Michael B (RSCH) <Michael_B_Allen@ml.com>
Date: Tue, 19 Feb 2002 18:40:54 -0500
Message-ID: <2D31030A810FD611973700306E0208F61997C5@ehope07.hew.us.ml.com>
To: "'Philippe Le Hegaret'" <plh@w3.org>, "WWW DOM" <www-dom@w3.org>


> -----Original Message-----
> From:	Philippe Le Hegaret [SMTP:plh@w3.org]
> Sent:	Tuesday, February 19, 2002 6:23 PM
> To:	WWW DOM
> Subject:	Re: DOMString Character Encoding
> 
> On Sun, 2002-02-17 at 19:15, Allen, Michael B (RSCH) wrote:
> > 	Specifying the type is one thing, but specifying the encoding is another.
> > 	Making it UTF-16 (big endian, little endian, w/wo BOM?) unnecessarily
> > 	constrains the implementation. I know first hand it creates a significant barrier
> > 	for C. It requires that the implementation provide all the usual string
> > 	manipulation functions. Consider what would happen if the DOMString type
> > 	were defined as a long with the encoding specified as UCS-4BE. What would
> > 	the Java language binding look like?
> 
> see
> [[
> Applications must encode DOMString using UTF-16
> ]]
> http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#ID-C74D1578
> 
> big endian or little endian is platform dependent. I don't think the
> BOM has any place in a DOMString.
> 
	Internally I doubt Java Strings have BOMs, but if you serialize one as UTF-16 it sure
	does. But that doesn't matter, because Java users should never be concerned with the
	actual character encoding of Java Strings. I'm trying to make the same point about
	DOMString, but I'm not sure anyone has acknowledged that they even know what I'm
	talking about. Either that, or I'm missing something fundamental here.
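
	To make the BOM point concrete, here is a minimal sketch, assuming a JDK
	with java.nio.charset.StandardCharsets (Java 7 or later). The class name
	BomDemo and the helper hex() are made up for illustration. It shows that
	the in-memory String is just a sequence of code units, and that a BOM and
	a byte order only appear once the String is encoded to bytes:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: names here are illustrative, not from any spec.
final class BomDemo {
    // Render a byte array as space-separated uppercase hex, e.g. "FE FF 00 41".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A";

        // In memory the String is one 16-bit code unit; there is no BOM
        // and no byte order visible to the caller.
        System.out.println(s.length());                                  // 1

        // Encoding with the generic "UTF-16" charset writes a big-endian BOM.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));    // FE FF 00 41

        // Naming the byte order explicitly produces no BOM.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));  // 00 41
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE)));  // 41 00

        // The same String can just as easily leave the program as UTF-8.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));     // 41
    }
}
```

	The caller picks the encoding at the boundary; nothing about the String
	type itself forces one on them.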

	Mike
Received on Tuesday, 19 February 2002 18:40:58 GMT