- From: Francois Yergeau <FYergeau@alis.com>
- Date: Thu, 9 Oct 2003 16:22:17 -0400
- To: "'Joseph Kesselman'" <keshlam@us.ibm.com>, Martin Duerst <duerst@w3.org>
- Cc: Johnny Stenback <jst@w3c.jstenback.com>, "'w3c-i18n-ig@w3.org'" <w3c-i18n-ig@w3.org>, "'www-dom@w3.org'" <www-dom@w3.org>, www-dom-request@w3.org
Joseph Kesselman wrote:
> The XML Rec doesn't suggest how to select which of these to use when
> writing out...
>
> This doesn't strike me as being more of a problem for the DOM
> than it is for anyone else...

It's not really a problem, but a question of getting the spec straight on exactly how the selection of the output encoding works. Martin has a valid point, since "UTF-16" as a charset tag is a special beast: it doesn't completely specify what the output should be. Specifically, it says to output UTF-16 in either one of the two possible byte orders, and to output a BOM to indicate which. This is exactly the UTF-16 that XML parsers are required to grok, and therefore the one that DOM L&S should mandate (in addition to UTF-8).

It appears pointless to *require* DOM implementations to support the "UTF-16BE" and "UTF-16LE" values of the encoding parameter, since XML parsers are not required to grok these. It appears overkill (and potentially confusing) to either introduce an additional parameter or redefine the meanings of "UTF-16BE" and "UTF-16LE" in order to be able to control the byte order in UTF-16.

So let's keep it simple: just mandate "UTF-8" and "UTF-16", the latter in implementation-defined byte order and with a BOM to indicate it.

--
François
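[Editorial illustration, not part of the original message: a minimal sketch of the byte-level distinction François describes. The `encode` helper and class name are hypothetical; the point is only that the "UTF-16" label implies a BOM plus an implementation-chosen byte order, while "UTF-16BE"/"UTF-16LE" fix the byte order and carry no BOM.]

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch only: not the DOM L3 Load & Save API, just the encoding-label logic.
public class Utf16OutputSketch {

    // Hypothetical helper: serialize 'text' under an XML encoding label.
    static byte[] encode(String text, String encodingLabel) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        switch (encodingLabel) {
            case "UTF-16":
                // Implementation-defined byte order; this sketch picks
                // big-endian and signals it with a BOM, as the XML Rec requires.
                out.write(0xFE);
                out.write(0xFF);
                out.write(text.getBytes(StandardCharsets.UTF_16BE));
                break;
            case "UTF-16BE":
                out.write(text.getBytes(StandardCharsets.UTF_16BE)); // no BOM
                break;
            case "UTF-16LE":
                out.write(text.getBytes(StandardCharsets.UTF_16LE)); // no BOM
                break;
            default:
                out.write(text.getBytes(StandardCharsets.UTF_8));
                break;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode("<doc/>", "UTF-16");
        // First two bytes are 0xFE 0xFF: the BOM telling the parser which
        // byte order this particular implementation chose.
        System.out.printf("%02X %02X ...%n", bytes[0], bytes[1]);
    }
}
```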
Received on Thursday, 9 October 2003 16:23:02 UTC