- From: Francois Yergeau <FYergeau@alis.com>
- Date: Thu, 9 Oct 2003 16:22:17 -0400
- To: "'Joseph Kesselman'" <keshlam@us.ibm.com>, Martin Duerst <duerst@w3.org>
- Cc: Johnny Stenback <jst@w3c.jstenback.com>, "'w3c-i18n-ig@w3.org'" <w3c-i18n-ig@w3.org>, "'www-dom@w3.org'" <www-dom@w3.org>, www-dom-request@w3.org
Joseph Kesselman wrote:
> The XML Rec doesn't suggest how to select which of these to use when
> writing out...
>
> This doesn't strike me as being more of a problem for the DOM
> than it is for anyone else...

It's not really a problem, but a question of getting the spec straight on exactly how the selection of the output encoding works. Martin has a valid point, since "UTF-16" as a charset tag is a special beast: it doesn't completely specify what the output should be. Specifically, it says to output UTF-16 in either one of the two possible byte orders, and to output a BOM to indicate which. This is exactly the UTF-16 that XML parsers are required to grok, and therefore the one that DOM L&S should mandate (in addition to UTF-8).

It appears pointless to *require* DOM implementations to support the "UTF-16BE" and "UTF-16LE" values of the encoding parameter, since XML parsers are not required to grok these. It appears overkill (and potentially confusing) to either introduce an additional parameter or redefine the meanings of "UTF-16BE" and "UTF-16LE" in order to be able to control the byte order in UTF-16.

So let's keep it simple: just mandate "UTF-8" and "UTF-16", the latter in implementation-defined byte order and with a BOM to indicate it.

--
François
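[Editorial illustration, not part of the original message: a minimal sketch of the byte-level distinction François describes. The `encode` helper and class name are hypothetical; the point is only that the "UTF-16" label implies a BOM plus an implementation-chosen byte order, while "UTF-16BE"/"UTF-16LE" fix the byte order and carry no BOM.]

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch only: not the DOM L3 Load & Save API, just the encoding-label logic.
public class Utf16OutputSketch {

    // Hypothetical helper: serialize 'text' under an XML encoding label.
    static byte[] encode(String text, String encodingLabel) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        switch (encodingLabel) {
            case "UTF-16":
                // Implementation-defined byte order; this sketch picks
                // big-endian and signals it with a BOM, as the XML Rec requires.
                out.write(0xFE);
                out.write(0xFF);
                out.write(text.getBytes(StandardCharsets.UTF_16BE));
                break;
            case "UTF-16BE":
                out.write(text.getBytes(StandardCharsets.UTF_16BE)); // no BOM
                break;
            case "UTF-16LE":
                out.write(text.getBytes(StandardCharsets.UTF_16LE)); // no BOM
                break;
            default:
                out.write(text.getBytes(StandardCharsets.UTF_8));
                break;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode("<doc/>", "UTF-16");
        // First two bytes are 0xFE 0xFF: the BOM telling the parser which
        // byte order this particular implementation chose.
        System.out.printf("%02X %02X ...%n", bytes[0], bytes[1]);
    }
}
```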
Received on Thursday, 9 October 2003 16:23:02 UTC