RE: DOMString Character Encoding from Allen, Michael B (RSCH) on 2002-02-19 (www-dom@w3.org from January to March 2002)

From: Allen, Michael B (RSCH) <Michael_B_Allen@ml.com>
Date: Tue, 19 Feb 2002 18:40:54 -0500
To: "'Philippe Le Hegaret'" <plh@w3.org>, "WWW DOM" <www-dom@w3.org>
Message-ID: <2D31030A810FD611973700306E0208F61997C5@ehope07.hew.us.ml.com>

> -----Original Message-----
> From:	Philippe Le Hegaret [SMTP:plh@w3.org]
> Sent:	Tuesday, February 19, 2002 6:23 PM
> To:	WWW DOM
> Subject:	Re: DOMString Character Encoding
> 
> On Sun, 2002-02-17 at 19:15, Allen, Michael B (RSCH) wrote:
> > 	Specifying the type is one thing, but specifying the encoding is another.
> > 	Making it UTF-16 (big endian, little endian, with or without a BOM?)
> > 	unnecessarily constrains the implementation. I know firsthand that it
> > 	creates a significant barrier for C: it requires the implementation to
> > 	provide all the usual string manipulation functions. Consider what would
> > 	happen if the DOMString type were defined as a long and the encoding
> > 	were specified as UCS-4BE. What would the Java language binding look like?
> 
> see
> [[
> Applications must encode DOMString using UTF-16
> ]]
> http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#ID-C74D1578
> 
> Big endian or little endian is platform dependent. I don't think the
> BOM has anything to do with a DOMString.
> 
	Internally I doubt Java Strings have BOMs, but if you serialize one as UTF-16
	(for example with String.getBytes("UTF-16")) the output certainly does. That
	doesn't matter, though, because Java users should never need to be concerned
	with the actual character encoding of Java Strings. I'm trying to make the same
	point about DOMString, but I'm not sure anyone has acknowledged that they even
	know what I'm talking about. Or perhaps I'm missing something fundamental here.

	Mike

Received on Tuesday, 19 February 2002 18:40:58 UTC