- From: John Cowan <cowan@ccil.org>
- Date: Wed, 4 Feb 2004 22:46:53 -0500
- To: Philippe Le Hegaret <plh@w3.org>
- Cc: www-dom@w3.org
Philippe Le Hegaret scripsit: > Looking again at XML 1.0 3rd, it says that UTF-16 encoded entities MUST > being with a BOM. Unless I'm misinterpreting the meaning of "UTF-16 > encoded entities", I would say that it does include UTF16-BE and > UTF16-LE as well. The sentence in the previous paragraph: # The terms "UTF-8" and "UTF-16" in this specification do not apply to # character encodings with any other labels, even if the encodings or # labels are very similar to UTF-8 or UTF-16. is intended to cover UTF-16BE and UTF-16LE, as well as CESU-8. Only the encodings actually named "UTF-16" and "UTF-8" are allowed without an encoding declaration, and the "UTF-16" encoding actually requires a BOM in XML, though not in plain text generally. After all, the whole point of the encodings "UTF-16BE" and "UTF-16LE" is that they MUST NOT begin with a BOM: if the first two bytes are 0xFE 0xFF or 0xFF 0xFE, that represents a U+FEFF character, and if document is not well formed. -- Trust me on this one. I was there. :-) -- One Word to write them all, John Cowan <jcowan@reutershealth.com> One Access to find them, http://www.reutershealth.com One Excel to count them all, http://www.ccil.org/~cowan And thus to Windows bind them. --Mike Champion
Received on Wednesday, 4 February 2004 22:52:24 UTC