Re: UTF-16 and Byte Order Mark

From: John Cowan <cowan@ccil.org>
Date: Wed, 20 Dec 2006 15:52:01 -0500
To: d.k@philo.de
Cc: public-xml-core-wg@w3.org
Message-ID: <20061220205201.GD29184@ccil.org>

Our apologies for the long delay in responding to your message.
The content of this message has been approved by the XML Core WG.

You wrote:

> Appendix F.1 of the XML specs presents examples about how to
> automatically detect the encoding of an entity from the first
> characters of an XML encoding declaration without a byte order mark.
> These examples include UTF-16BE and UTF-16LE. However, section 4.3.3
> says that entities encoded in UTF-16 MUST begin with a byte order mark.

That is strictly limited to the UTF-16 encoding, and excludes the
related UTF-16LE and UTF-16BE encodings, in which BOMs are not present.
Note that "UTF-16LE" does not mean "UTF-16 encoding whose BOM shows it
to be little-endian" but rather "UTF-16-like encoding in little-endian
order without a BOM."  If U+FEFF appears at the beginning of a UTF-16LE or
UTF-16BE document, it is not a BOM but a ZWNBSP character (and therefore
the document cannot be well-formed XML).
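
As an illustration only (not part of the XML specification), the
distinction can be observed with Python's standard codecs, which happen to
treat the "utf-16" and "utf-16-le" labels in the same way.  The sample
document "<a/>" and the variable names are assumptions made for this sketch.

```python
import codecs

# The bytes FF FE followed by "<a/>" serialized in little-endian order.
data = codecs.BOM_UTF16_LE + "<a/>".encode("utf-16-le")

# Under the "utf-16" label the leading U+FEFF is a BOM: it selects the
# byte order and is not part of the decoded text.
as_utf16 = data.decode("utf-16")

# Under the "utf-16-le" label there is no BOM, so the same two bytes
# decode to an ordinary U+FEFF (ZWNBSP) character in the content.
as_utf16le = data.decode("utf-16-le")

print(repr(as_utf16))
print(repr(as_utf16le))
```

In the second case the document text begins with U+FEFF rather than with
"<", which is why such a document cannot be well-formed.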

> In the light of the examples it seems that the intention of the specs is
> to demand a UTF-16 byte order mark only when no XML declaration is used.
> Is this interpretation of the specs correct?

No.  If the encoding is UTF-16, a BOM is mandatory, whether or not an
XML declaration is present.
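
A minimal sketch of that rule, assuming the caller already knows from
external information that the entity is labeled UTF-16 (not UTF-16LE or
UTF-16BE).  The function name is hypothetical and is not taken from any
XML processor.

```python
import codecs

def utf16_entity_has_required_bom(data: bytes) -> bool:
    """Return True if data begins with a UTF-16 BOM in either byte order.

    Assumption: the entity is known to be in the UTF-16 encoding, so the
    check does not depend on whether an XML declaration follows.
    """
    return data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE))

# Hypothetical sample entities: one with a big-endian BOM, one without.
with_bom = codecs.BOM_UTF16_BE + "<a/>".encode("utf-16-be")
without_bom = "<a/>".encode("utf-16-be")
```

Here utf16_entity_has_required_bom(with_bom) is True, while
utf16_entity_has_required_bom(without_bom) is False even though the bytes
would otherwise be readable as big-endian UTF-16.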

> If the answer is "no", I would suggest to remove the two incriminated
> examples from Appendix F.1 and to add an appropriate warning.

The examples are not in error, because they refer to the UTF-16LE and
UTF-16BE encodings rather than the UTF-16 encoding.

The Core WG will be adding language to 4.3.3 stating that UTF-16BE and
UTF-16LE are specifically not UTF-16.

I marvel at the creature: so secret and         John Cowan
so sly as he is, to come sporting in the pool   cowan@ccil.org
before our very window.  Does he think that     http://www.ccil.org/~cowan
Men sleep without watch all night?
Received on Wednesday, 20 December 2006 20:52:18 UTC
