Re: i18n reviews of DOM 3 Core and Load&Save from Joseph Kesselman on 2003-10-09 (www-dom@w3.org from October to December 2003)

From: Joseph Kesselman <keshlam@us.ibm.com>
Date: Thu, 9 Oct 2003 15:33:26 -0400
To: Martin Duerst <duerst@w3.org>
Cc: Francois Yergeau <FYergeau@alis.com>, Johnny Stenback <jst@w3c.jstenback.com>, "'w3c-i18n-ig@w3.org'" <w3c-i18n-ig@w3.org>, "'www-dom@w3.org'" <www-dom@w3.org>, www-dom-request@w3.org
Message-ID: <OF693A0E3C.69D4DD1B-ON85256DBA.006ADA27-85256DBA.006B6E84@us.ibm.com>

> I wonder how the DOM is able
>to make the distinction between little-endian and big-endian versions
>of UTF-16.

The XML Recommendation gives specific suggestions on how to guess encodings
when reading from a byte stream -- use the Byte Order Mark if available,
otherwise use the <? at the start of the XML Declaration/Text Declaration
if one exists, otherwise make the best guess you can and if it's wrong
that's the user's fault for not giving you a better set of hints to work
with.

The XML Rec doesn't suggest how to select which of these to use when
writing out. If your serializer generates a BOM and/or a <?xml?>
declaration with the encoding correctly specified, you should be fine.

This doesn't strike me as being more of a problem for the DOM than it is
for anyone else...

______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk

Received on Thursday, 9 October 2003 15:47:40 UTC