- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Sat, 1 Mar 2008 11:39:41 +0100
- To: www-archive@w3.org
Geoffrey Sneddon wrote:

>> "In particular, whenever a data stream is declared to be UTF-16BE,
>> UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used."
>> If somebody wants to include a zero-width non-breaking space
>> (ZWNBSP) at the beginning of a stream, they have to use U+2060 WORD
>> JOINER instead.

> Could you possibly give me a pointer to something in the Unicode
> standard that requires that? I've never seen such a requirement.

TUS 5.0 chapter 3.10, D96: "In UTF-16BE an initial byte sequence
<FE FF> is interpreted as U+FEFF ZERO WIDTH NON-BREAK SPACE."
D97 is the corresponding <FF FE> definition for UTF-16LE.  D98
explains that in UTF-16 (no byte order in the label) an initial
<FE FF> or <FF FE> is a BOM.  D99, D100, and D101 do the same for
UTF-32BE, UTF-32LE, and UTF-32.

Chapter 16.8 notes that WORD JOINER, not ZWNBSP, should be used for
what its name says.  Chapter 16.2 states that WORD JOINER is strongly
preferred over ZWNBSP.

For a summary see table 2.4 in chapter 2.6: it says "BOM allowed:
yes" for UTF-8, UTF-16, and UTF-32, and "no" for UTF-16LE, UTF-16BE,
UTF-32LE, and UTF-32BE.

Also check C11 in chapter 3, although from my POV it is not exactly
clear.

For clearer definitions with MUST and MUST NOT see RFC 2781; this RFC
is the normative text for the IANA registrations of UTF-16, UTF-16BE,
and UTF-16LE.

Frank
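A minimal Python sketch of the rule described above, assuming a decoder that is handed the declared charset label: for UTF-16BE, UTF-16LE, UTF-32BE and UTF-32LE an initial U+FEFF is ordinary content (ZWNBSP) and is kept, while for UTF-16 and UTF-32 an initial BOM selects the byte order and is stripped, with big-endian as the no-BOM default as RFC 2781 recommends. The function name decode_declared and its label table are hypothetical illustrations, not part of any library. The no-BOM case is handled by hand because Python's built-in "utf-16" and "utf-32" codecs may fall back to the platform's native byte order instead.

```python
import codecs

# Hypothetical label table: schemes whose byte order is fixed by the
# label. Here an initial U+FEFF is content (ZWNBSP), never a BOM.
_FIXED = {
    "utf-16be": "utf-16-be",
    "utf-16le": "utf-16-le",
    "utf-32be": "utf-32-be",
    "utf-32le": "utf-32-le",
}


def decode_declared(data: bytes, label: str) -> str:
    """Decode bytes according to a declared Unicode encoding scheme (sketch)."""
    name = label.strip().lower()
    if name in _FIXED:
        # Python's explicit-endian codecs do not strip U+FEFF, so an
        # initial <FE FF> / <FF FE> survives as ZWNBSP (cf. D96, D97).
        return data.decode(_FIXED[name])
    if name == "utf-16":
        # An initial BOM selects the byte order and is removed (cf. D98).
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le")
        return data.decode("utf-16-be")  # no BOM: assume big-endian
    if name == "utf-32":
        if data.startswith(codecs.BOM_UTF32_BE):
            return data[4:].decode("utf-32-be")
        if data.startswith(codecs.BOM_UTF32_LE):
            return data[4:].decode("utf-32-le")
        return data.decode("utf-32-be")  # no BOM: assume big-endian
    raise ValueError(f"unsupported label: {label!r}")


if __name__ == "__main__":
    # Same bytes, different labels: ZWNBSP kept vs. BOM stripped.
    print(repr(decode_declared(b"\xfe\xff\x00A", "UTF-16BE")))
    print(repr(decode_declared(b"\xfe\xff\x00A", "UTF-16")))
```

The same four bytes decode to "\ufeffA" under UTF-16BE but to "A" under UTF-16, which is why a writer that wants a leading zero-width no-break character in a UTF-16BE stream is better off emitting U+2060 WORD JOINER.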
Received on Saturday, 1 March 2008 10:38:41 UTC