- From: Grosso, Paul <pgrosso@ptc.com>
- Date: Wed, 20 Dec 2006 18:14:13 -0500
- To: <xml-editor@w3.org>
- Cc: <d.k@philo.de>
Forwarding to the public comments list.

paul

-----Original Message-----
From: public-xml-core-wg-request@w3.org
[mailto:public-xml-core-wg-request@w3.org] On Behalf Of John Cowan
Sent: Wednesday, 2006 December 20 14:52
To: d.k@philo.de
Cc: public-xml-core-wg@w3.org
Subject: Re: UTF-16 and Byte Order Mark

Our apologies for the long delay in responding to your message.
The content of this message has been approved by the XML Core WG.

You wrote at
<http://lists.w3.org/Archives/Public/xml-editor/2006JulSep/0007.html>:

> Appendix F.1 of the XML specs presents examples about how to
> automatically detect the encoding of an entity from the first
> characters of an XML encoding declaration without a byte order mark.
> These examples include UTF-16BE and UTF-16LE. However, section 4.3.3
> says that entities encoded in UTF-16 MUST begin with a byte order mark.

That is strictly limited to the UTF-16 encoding, and excludes the
related UTF-16LE and UTF-16BE encodings, in which BOMs are not present.
Note that "UTF-16LE" does not mean "UTF-16 encoding whose BOM shows it
to be little-endian" but rather "UTF-16-like encoding in little-endian
order without a BOM." If U+FEFF appears at the beginning of a UTF-16LE
or UTF-16BE document, it is not a BOM but a ZWNBSP character (and
therefore the document cannot be well-formed XML).

> In the light of the examples it seems that the intention of the specs is
> to demand a UTF-16 byte order mark only when no XML declaration is used.
> Is this interpretation of the specs correct?

No. If the encoding is UTF-16, a BOM is mandatory, whether or not an
XML declaration is present.

> If the answer is "no", I would suggest to remove the two incriminated
> examples from Appendix F.1 and to add an appropriate warning.

The examples are not in error, because they refer to the UTF-16LE and
UTF-16BE encodings rather than the UTF-16 encoding.
The Core WG will be adding language to 4.3.3 stating that UTF-16BE and
UTF-16LE are specifically not UTF-16.

-- 
I marvel at the creature: so secret and         John Cowan
so sly as he is, to come sporting in the pool   cowan@ccil.org
before our very window.  Does he think that     http://www.ccil.org/~cowan
Men sleep without watch all night?
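
The distinction drawn in the message can be sketched in Python as
follows. This is an illustrative sketch only, not text from the XML
specification: the function names check_utf16_entity and
sniff_without_bom are hypothetical, the check looks only at the first
bytes of the entity, and the no-BOM byte patterns assume the entity
starts with "<?" as in the Appendix F.1 examples.

```python
BOM_BE = b"\xfe\xff"  # U+FEFF serialized big-endian
BOM_LE = b"\xff\xfe"  # U+FEFF serialized little-endian


def check_utf16_entity(data: bytes, label: str) -> str:
    """Apply the BOM rule for an entity whose encoding label is known.

    `label` is the encoding name (hypothetical parameter), e.g. taken
    from an encoding declaration or an external source.
    """
    name = label.upper()
    if name == "UTF-16":
        # For the UTF-16 encoding a BOM is mandatory, with or without
        # an XML declaration.
        if data[:2] in (BOM_BE, BOM_LE):
            return "ok"
        return "error: UTF-16 requires a BOM"
    if name in ("UTF-16BE", "UTF-16LE"):
        # These encodings carry no BOM. A leading U+FEFF in the
        # declared byte order is a ZWNBSP character, so the document
        # cannot be well-formed XML.
        feff = BOM_BE if name == "UTF-16BE" else BOM_LE
        if data[:2] == feff:
            return "error: leading U+FEFF is ZWNBSP, not a BOM"
        return "ok"
    raise ValueError("not a UTF-16 family label: " + label)


def sniff_without_bom(data: bytes):
    """Guess the byte order from '<?' when no BOM is present."""
    if data[:4] == b"\x00<\x00?":
        return "UTF-16BE"
    if data[:4] == b"<\x00?\x00":
        return "UTF-16LE"
    return None  # some other encoding; not handled in this sketch
```

Python's own codecs happen to follow the same split: decoding with
"utf-16" consumes a leading BOM, while decoding with "utf-16-le" or
"utf-16-be" leaves U+FEFF in the result as an ordinary character.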
Received on Wednesday, 20 December 2006 23:14:24 UTC