- From: Brian Smith <brian@briansmith.org>
- Date: Fri, 29 Feb 2008 08:54:55 -0800
- To: "'HTML WG'" <public-html@w3.org>
Geoffrey Sneddon wrote:
> On 29 Feb 2008, at 13:38, Brian Smith wrote:
> > If somebody wants to include a zero-width non-breaking space
> > (ZWNBSP) at the beginning of a stream, they have to use U+2060 WORD
> > JOINER instead.
>
> Could you possibly give me a pointer to something in the
> Unicode standard that requires that? I've never seen such a
> requirement.

See 16.8 Specials:

"For compatibility with versions of the Unicode Standard prior to
Version 3.2, the code point U+FEFF has the word-joining semantics of
zero width no-break space when it is not used as a BOM. In new text,
these semantics should be encoded by U+2060 word joiner."

But, if you do want to use U+FEFF anyway, and you are not using -BE or
-LE, then:

"To represent an initial U+FEFF zero width no-break space in a UTF-16
file, use U+FEFF twice in a row. The first one is a byte order mark; the
second one is the initial zero width no-break space. See Table 16-4 for
a summary of encoding scheme signatures."

But:

"Where the byte order is explicitly specified, such as in UTF-16BE or
UTF-16LE, then all U+FEFF characters-even at the very beginning of the
text-are to be interpreted as zero width no-break spaces."

So, an initial U+FEFF is never an error, even for the -BE and -LE
variants. But, in -BE and -LE, it isn't a BOM, but a ZWNBSP. And, also,
producers of documents should never use U+FEFF anywhere in the document
unless it is used as a BOM, which by definition can't exist in a -BE/-LE
document.

- Brian
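
For illustration only, here is a minimal sketch of the three cases above. It assumes Python and its built-in "utf-16" and "utf-16-be" codecs, neither of which is mentioned in the Unicode text quoted above; the variable names are made up for the example.

```python
import codecs

# Big-endian byte order mark, i.e. U+FEFF serialized as FE FF.
BOM_BE = codecs.BOM_UTF16_BE

# Case 1: plain "UTF-16" (byte order not stated in the label).
# U+FEFF appears twice: the first is consumed as the BOM,
# the second survives as an initial zero width no-break space.
data_utf16 = BOM_BE + BOM_BE + "A".encode("utf-16-be")
text_utf16 = data_utf16.decode("utf-16")

# Case 2: "UTF-16BE" (byte order explicit in the label).
# An initial U+FEFF is not a BOM here, so the decoder keeps it as a character.
data_utf16be = BOM_BE + "A".encode("utf-16-be")
text_utf16be = data_utf16be.decode("utf-16-be")

# Case 3: what a producer of new text would write instead:
# U+2060 WORD JOINER, which is never mistaken for a BOM.
new_text = "\u2060A"
encoded_new_text = new_text.encode("utf-16-be")

print(repr(text_utf16), repr(text_utf16be), encoded_new_text.hex())
```

In both decoding cases the resulting string starts with U+FEFF as an ordinary character, which is the ambiguity that U+2060 avoids.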
Received on Friday, 29 February 2008 16:55:08 UTC