- From: Grosso, Paul <pgrosso@ptc.com>
- Date: Mon, 30 Oct 2006 09:49:31 -0500
- To: <public-xml-core-wg@w3.org>
I'd like for John to be able to send this reply to the commenter, but we should probably make a decision on the question John asks near the bottom of his message.

Does anyone (especially Francois, Richard, Daniel, Glenn, Henry) have any thoughts on John's question?

paul

> -----Original Message-----
> From: public-xml-core-wg-request@w3.org
> [mailto:public-xml-core-wg-request@w3.org] On Behalf Of John Cowan
> Sent: Wednesday, 2006 October 25 11:17
> To: public-xml-core-wg@w3.org
> Subject: XML PE 157
>
> (Well, it's simpler than I thought.)
>
> Dieter Köhler <d.k@philo.de> writes at
> <http://lists.w3.org/Archives/Public/xml-editor/2006JulSep/0007.html>:
>
> > Appendix F.1 of the XML specs presents examples about how to
> > automatically detect the encoding of an entity from the first
> > characters of an XML encoding declaration without a byte order mark.
> > These examples include UTF-16BE and UTF-16LE. However, section 4.3.3
> > says that entities encoded in UTF-16 MUST begin with a byte order mark.
>
> That is strictly limited to the UTF-16 encoding, and excludes the
> related UTF-16LE and UTF-16BE encodings, in which BOMs are not present.
> Note that "UTF-16LE" does not mean "UTF-16 encoding whose BOM shows it
> to be little-endian" but rather "UTF-16-like encoding in little-endian
> order without a BOM." If U+FEFF appears at the beginning of a UTF-16LE
> or UTF-16BE document, it is a ZWNBSP character (and therefore the
> document cannot be well-formed XML), not a BOM.
>
> > In the light of the examples it seems that the intention of the specs
> > is to demand a UTF-16 byte order mark only when no XML declaration is
> > used. Is this interpretation of the specs correct?
>
> No. If the encoding is UTF-16, a BOM is mandatory, whether or not an
> XML declaration is present.
>
> > If the answer is "no", I would suggest removing the two offending
> > examples from Appendix F.1 and adding an appropriate warning.
>
> The examples are not in error, because they refer to the UTF-16LE and
> UTF-16BE encodings rather than the UTF-16 encoding.
>
> Core WG: Should we add specific references to UTF-16BE, UTF-16LE,
> CESU-8, etc. to 4.3.3? If so, we might as well remove "We consider the
> first case first" from Appendix F; it's more than obvious.
>
> --
> Where the wombat has walked,       John Cowan <cowan@ccil.org>
> it will inevitably walk again.     http://www.ccil.org/~cowan
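For illustration only, here is a minimal Python sketch of the distinction drawn in this thread, assuming Python 3. It implements just the BOM rows and the UTF-8 and 16-bit "<?" rows of the Appendix F.1 table (EBCDIC and BOM-less UCS-4 fall through to "other"); the helper name `sniff_encoding_family` and its return labels are hypothetical, not names from the spec. Python's standard codecs happen to mirror the same split: `utf-16` treats a leading FE FF or FF FE as a byte order mark and strips it, while `utf-16-le` and `utf-16-be` decode U+FEFF as an ordinary ZWNBSP character.

```python
import codecs

# The four UCS-4 byte order marks listed in Appendix F.1 (orders 1234,
# 4321, 2143, 3412). They are checked first because FF FE 00 00 and
# FE FF 00 00 would otherwise look like UTF-16 BOMs.
UCS4_BOMS = (b"\x00\x00\xfe\xff", b"\xff\xfe\x00\x00",
             b"\x00\x00\xff\xfe", b"\xfe\xff\x00\x00")


def sniff_encoding_family(head: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML entity.

    Hypothetical helper: the labels it returns are illustrative only.
    EBCDIC and BOM-less UCS-4 rows of Appendix F.1 are not handled.
    """
    if head[:4] in UCS4_BOMS:
        return "UCS-4 (BOM)"
    # With a BOM, the 16-bit case is the "UTF-16" encoding, which
    # section 4.3.3 says MUST begin with a byte order mark.
    if head.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "UTF-16 (BOM, big-endian)"
    if head.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "UTF-16 (BOM, little-endian)"
    if head.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "UTF-8 (BOM)"
    # Without a BOM, look for "<?" of an XML declaration. A 16-bit match
    # here means UTF-16BE/UTF-16LE (or a similar 16-bit encoding), never
    # plain UTF-16, which would have needed a BOM.
    if head.startswith(b"\x00<\x00?"):         # 00 3C 00 3F
        return "UTF-16BE-like (no BOM)"
    if head.startswith(b"<\x00?\x00"):         # 3C 00 3F 00
        return "UTF-16LE-like (no BOM)"
    if head.startswith(b"<?xm"):               # 3C 3F 78 6D
        return "ASCII-compatible (read the encoding declaration)"
    return "other (e.g. UTF-8 without a declaration)"


if __name__ == "__main__":
    utf16 = codecs.BOM_UTF16_LE + (
        "<?xml version='1.0' encoding='UTF-16'?><a/>".encode("utf-16-le"))
    utf16le = "<?xml version='1.0' encoding='UTF-16LE'?><a/>".encode("utf-16-le")
    print(sniff_encoding_family(utf16))    # UTF-16 (BOM, little-endian)
    print(sniff_encoding_family(utf16le))  # UTF-16LE-like (no BOM)
    # The "utf-16" codec consumes FF FE as a byte order mark ...
    print(repr(utf16.decode("utf-16")[:5]))     # '<?xml'
    # ... whereas "utf-16-le" keeps it as the character U+FEFF (ZWNBSP).
    print(repr(utf16.decode("utf-16-le")[:2]))  # '\ufeff<'
```

A 16-bit match without a BOM is reported as UTF-16LE-like or UTF-16BE-like and never as UTF-16, which is the reading of 4.3.3 given above: the BOM requirement applies to the UTF-16 encoding only.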
Received on Monday, 30 October 2006 14:49:47 UTC