RE: XML PE 157

I'd like for John to be able to send this reply to the
commenter, but we should probably make a decision on
the question John asks near the bottom of his message.

Does anyone (esp. Francois, Richard, Daniel, Glenn, Henry)
have any thoughts on John's question?

paul

> -----Original Message-----
> From: public-xml-core-wg-request@w3.org 
> [mailto:public-xml-core-wg-request@w3.org] On Behalf Of John Cowan
> Sent: Wednesday, 2006 October 25 11:17
> To: public-xml-core-wg@w3.org
> Subject: XML PE 157
> 
> 
> (Well, it's simpler than I thought.)
> 
> Dieter Köhler <d.k@philo.de> writes
> at 
> <http://lists.w3.org/Archives/Public/xml-editor/2006JulSep/0007.html>:
> 
> > Appendix F.1 of the XML specs presents examples about how to
> > automatically detect the encoding of an entity from the first
> > characters of an XML encoding declaration without a byte order mark.
> > These examples include UTF-16BE and UTF-16LE. However, section 4.3.3
> > says that entities encoded in UTF-16 MUST begin with a byte order mark.
> 
> That is strictly limited to the UTF-16 encoding, and excludes the
> related UTF-16LE and UTF-16BE encodings, in which BOMs are not present.
> Note that "UTF-16LE" does not mean "UTF-16 encoding whose BOM shows it
> to be little-endian" but rather "UTF-16-like encoding in little-endian
> order without a BOM."  If U+FEFF appears at the beginning of a UTF-16LE
> or UTF-16BE document, it is a ZWNBSP character (and therefore the
> document cannot be well-formed XML), not a BOM.
> 
> > In the light of the examples it seems that the intention of the specs is
> > to demand a UTF-16 byte order mark only when no XML declaration is used.
> > Is this interpretation of the specs correct?
> 
> No.  If the encoding is UTF-16, a BOM is mandatory, whether or not an
> XML declaration is present.
> 
> > If the answer is "no", I would suggest to remove the two incriminated
> > examples from Appendix F.1 and to add an appropriate warning.
> 
> The examples are not in error, because they refer to the UTF-16LE and
> UTF-16BE encodings rather than the UTF-16 encoding.
> 
> Core WG: Should we add specific references to UTF-16BE, UTF-16LE, CESU-8,
> etc. etc. to 4.3.3?  If so, we might as well remove "We consider the
> first case first" from Appendix F; it's more than obvious.
> 
> -- 
> Where the wombat has walked,            John Cowan <cowan@ccil.org>
> it will inevitably walk again.          http://www.ccil.org/~cowan
> 
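
For illustration only, here is a rough sketch of the distinction John
draws between UTF-16 and UTF-16LE. It uses Python's built-in codecs,
not an XML processor, and the variable names are made up for this
example. The assumption is simply that Python's "utf-16" codec treats
a leading U+FEFF as a BOM and consumes it, while "utf-16-le" leaves it
in the text as an ordinary character.

```python
import codecs

# Hypothetical sample entity: a little-endian BOM followed by "<a/>"
# encoded as little-endian 16-bit code units.
bom_le = codecs.BOM_UTF16_LE            # b'\xff\xfe'
body = "<a/>".encode("utf-16-le")       # no BOM is added by this codec
data = bom_le + body

# Labelled "UTF-16": the leading U+FEFF is a byte order mark.
# The decoder uses it to pick the byte order and then drops it.
as_utf16 = data.decode("utf-16")

# Labelled "UTF-16LE": the byte order is fixed by the label, so a
# leading U+FEFF is kept as a character (ZWNBSP) in the decoded text.
as_utf16le = data.decode("utf-16-le")

print(repr(as_utf16))
print(repr(as_utf16le))
```

In the second case the decoded text starts with U+FEFF rather than
"<", which is the situation John describes as not being well-formed
XML.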

Received on Monday, 30 October 2006 14:49:47 UTC