RE: byte order mark article

> If the above is an accurate reflection of what Unicode says, then it doesn’t
> sound as if it is considered as very safe to let a leading FF FE/FE FF for anything
> but the BOM - not even when using UTF-16LE/UTF-16BE.

The use of U+FEFF as anything other than a Unicode signature is already deprecated. In fact, Unicode created the Word Joiner character (U+2060) to replace the BOM's other "identity" as "zero width no-break space". To wit, Section 16.2 of the Standard says:

--
Zero Width No-Break Space. In addition to its primary meaning of byte order mark (see
“Byte Order Mark” in Section 16.8, Specials), the code point U+FEFF possesses the semantics
of zero width no-break space, which matches that of word joiner. Until Unicode 3.2,
U+FEFF was the only code point with word joining semantics, but because it is more commonly
used as byte order mark, the use of U+2060 word joiner to indicate word joining is
strongly preferred for any new text. Implementations should continue to support the word
joining semantics of U+FEFF for backward compatibility.
--
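
To make the practical consequence concrete, here is a minimal Python sketch of one way a processor might follow that guidance: treat a leading U+FEFF as a signature and drop it, and map any U+FEFF found elsewhere to U+2060. The function name normalize_feff and the sample bytes are my own illustration, not anything taken from the Standard, and whether rewriting interior U+FEFF is appropriate depends on the application.

```python
BOM = "\ufeff"          # U+FEFF: byte order mark / zero width no-break space
WORD_JOINER = "\u2060"  # U+2060: preferred character for word joining


def normalize_feff(text: str) -> str:
    """Drop a leading U+FEFF (signature) and map any other U+FEFF to U+2060.

    Hypothetical helper for illustration only.
    """
    if text.startswith(BOM):
        text = text[1:]
    return text.replace(BOM, WORD_JOINER)


# Sample data: an FF FE signature followed by "a", U+FEFF, "b" in UTF-16LE.
raw = b"\xff\xfe" + "a\ufeffb".encode("utf-16-le")

# The generic "utf-16" codec consumes the signature while decoding;
# "utf-16-le" leaves it in the result as a leading U+FEFF.
with_signature_consumed = raw.decode("utf-16")
with_signature_kept = raw.decode("utf-16-le")

cleaned = normalize_feff(with_signature_kept)
```

Note that the two decoders disagree about the leading FF FE, which is why the helper works on already-decoded text and does not care which codec was used.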

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Received on Wednesday, 21 November 2012 23:40:51 UTC