
RE: byte order mark article

From: Phillips, Addison <addison@lab126.com>
Date: Wed, 21 Nov 2012 15:40:02 -0800
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, John Cowan <cowan@mercury.ccil.org>
CC: Anne van Kesteren <annevk@annevk.nl>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <131F80DEA635F044946897AFDA9AC34773A8999807@EX-SEA31-D.ant.amazon.com>
> If the above is an accurate reflection of what Unicode says, then it doesn’t
> sound as if it is considered as very safe to let a leading FF FE/FE FF for anything
> but the BOM - not even when using UTF-16LE/UTF-16BE.

The use of U+FEFF as anything other than a Unicode signature is already strongly discouraged. In fact, Unicode created the Word Joiner character (U+2060) to replace the BOM's other "identity" as "zero width no-break space". To wit, section 16.2 of the Standard says:

Zero Width No-Break Space. In addition to its primary meaning of byte order mark (see
“Byte Order Mark” in Section 16.8, Specials), the code point U+FEFF possesses the semantics
of zero width no-break space, which matches that of word joiner. Until Unicode 3.2,
U+FEFF was the only code point with word joining semantics, but because it is more commonly
used as byte order mark, the use of U+2060 word joiner to indicate word joining is
strongly preferred for any new text. Implementations should continue to support the word
joining semantics of U+FEFF for backward compatibility.
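
To make the distinction concrete, here is a minimal Python sketch of how a consumer might follow that guidance. The helper names (decode_utf16, modernize_word_joiners) are hypothetical, and the big-endian fallback for unsigned input is an assumption of this sketch, not something the quoted text mandates. A leading FF FE or FE FF is treated purely as a signature, and any U+FEFF left inside the text is rewritten to U+2060.

```python
import codecs

ZWNBSP = "\ufeff"       # U+FEFF: byte order mark / zero width no-break space
WORD_JOINER = "\u2060"  # U+2060: preferred character for word joining


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, consuming a leading FF FE / FE FF as a signature."""
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return data[2:].decode("utf-16-be")
    # No signature present: this sketch assumes big-endian.
    return data.decode("utf-16-be")


def modernize_word_joiners(text: str) -> str:
    """Drop a leading U+FEFF and replace any interior U+FEFF with U+2060."""
    if text.startswith(ZWNBSP):
        text = text[1:]
    return text.replace(ZWNBSP, WORD_JOINER)


if __name__ == "__main__":
    raw = b"\xff\xfeh\x00i\x00"
    # Python's "utf-16-le" codec keeps the leading U+FEFF as a character,
    # which is exactly the ambiguity discussed in this thread.
    print(repr(raw.decode("utf-16-le")))
    print(repr(decode_utf16(raw)))
    print(repr(modernize_word_joiners("\ufeffa\ufeffb")))
```

Note that modernize_word_joiners changes the code points in the text, so it only makes sense when producing new text; consumers still need to honor the word joining semantics of U+FEFF in existing data, as the Standard says.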


Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Received on Wednesday, 21 November 2012 23:40:51 UTC
