Re: I18N issues with the XML Specification

People are not trying to throw away the BOM. The BOM is extremely useful
for plain, untagged text, where there is no indication of the character
encoding. However, there are many circumstances where the BOM is
inappropriate and where there is a mechanism for explicitly declaring
the character encoding. The UTC (and that RFC) use the terms "UTF-16BE"
and "UTF-16LE" for those circumstances.
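
As a rough illustration of that distinction, here is a minimal Python
sketch. It assumes Python's built-in codec names ("utf-16", "utf-16-be",
"utf-16-le") behave like the charset labels discussed here; the sample
string is arbitrary.

```python
import codecs

text = "<a/>"

# The plain "utf-16" codec writes a BOM first, then the data in the
# platform's native byte order, so a reader can discover the order.
with_bom = text.encode("utf-16")
starts_with_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# The explicitly labelled variants write no BOM: the label itself
# carries the byte order.
be = text.encode("utf-16-be")   # b"\x00<\x00a\x00/\x00>"
le = text.encode("utf-16-le")   # b"<\x00a\x00/\x00>\x00"

# Decoding as "utf-16" consumes the BOM.
round_trip = with_bom.decode("utf-16")

# Decoding as "utf-16-be" does not strip a leading BOM; it comes back
# as the character U+FEFF at the start of the text.
stray = (codecs.BOM_UTF16_BE + be).decode("utf-16-be")
```

The last line shows why the two kinds of label should not be mixed: data
declared as "UTF-16BE" is expected to carry no BOM at all.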

There are some guidelines in http://www.unicode.org/unicode/faq/#BOM

Mark
___
Mark Davis, IBM Center for Java Technology, Cupertino
(408) 777-5850 [fax: 5891], mark.davis@us.ibm.com, president@unicode.org
http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10275+N.+De+Anza&csz=95014



John Cowan <jcowan@reutershealth.com>@w3.org on 2000.04.06 09:31:07

Sent by:  w3c-i18n-wg-request@w3.org


To:   MURATA Makoto <muraw3c@attglobal.net>
cc:   Rick Jelliffe <ricko@gate.sinica.edu.tw>, xml-editor@w3.org,
      w3c-i18n-ig@w3.org, w3c-xml-core-wg@w3.org
Subject:  Re: I18N issues with the XML Specification



MURATA Makoto wrote:

> RFC 2781 is already an RFC.  In my understanding, people are
> trying to throw away the BOM by introducing charset names "utf-16le"
> and "utf-16be".

Some people have already thrown away the BOM.  RFC 2781 introduces
names for the results of doing so.

> If the handling of UTF-16LE/UTF-16BE is mandatory, the XML processor
> is required to handle new octet sequences.  I do not think all existing
> processors can handle "<?xml encoding="UTF-16LE"?>" in UTF-16LE.

No processor can be required to handle UTF-16LE/BE, only UTF-16 (and
UTF-8).
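
A minimal Python sketch of what that baseline might look like: pick the
encoding from a leading BOM, and fall back to UTF-8 otherwise. The names
sniff_bom and decode_document are hypothetical, the "declared" parameter
stands in for whatever label an encoding declaration or transport
supplies, and UTF-32 is deliberately ignored. It is an illustration, not
a conforming XML processor.

```python
import codecs

def sniff_bom(data: bytes):
    """Return (codec_name, bom_length) for a leading BOM, or (None, 0)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8", 3
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", 2
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", 2
    return None, 0

def decode_document(data: bytes, declared=None) -> str:
    """Decode using the BOM if present, else the declared label, else UTF-8."""
    encoding, skip = sniff_bom(data)
    if encoding is None:
        # No BOM: rely on an explicit label such as "utf-16-le" if the
        # caller has one, otherwise assume UTF-8.
        encoding = declared or "utf-8"
    return data[skip:].decode(encoding)
```

For example, decode_document("<a/>".encode("utf-16")) succeeds from the
BOM alone, while BOM-less UTF-16LE bytes decode correctly only if the
caller passes declared="utf-16-le".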

--

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)

Received on Monday, 10 April 2000 19:24:29 UTC