- From: John Cowan <cowan@mercury.ccil.org>
- Date: Wed, 21 Nov 2012 16:27:27 -0500
- To: Anne van Kesteren <annevk@annevk.nl>
- Cc: www-international@w3.org
Anne van Kesteren scripsit:
> * Per my reading of the HTML specification you can use utf-16le and
> utf-16be without a BOM. It does not even require it for utf-16,
> although I suppose Unicode might (though Unicode is not very correct
> here with respect to what implementations do).
Per Unicode, in UTF-16LE and UTF-16BE documents, there is no such
thing as a BOM. If a UTF-16LE document begins FF FE, that means the
first character is U+FEFF, ZERO WIDTH NO-BREAK SPACE; likewise if
a UTF-16BE document begins FE FF.
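
A small illustration of the distinction, using Python's standard codecs as one example implementation (the sample bytes are made up for the demonstration): under the UTF-16LE label the leading FF FE survives as a character, while under the plain UTF-16 label the same bytes are consumed as a BOM.

```python
import unicodedata

# Illustrative input: bytes FF FE followed by "A" in little-endian order.
data = b"\xff\xfe" + "A".encode("utf-16-le")  # FF FE 41 00

# Labelled UTF-16LE: the first two bytes decode to the character U+FEFF.
as_le = data.decode("utf-16-le")

# Labelled plain UTF-16: the first two bytes are a BOM and are dropped.
as_utf16 = data.decode("utf-16")

# Look up the formal name of the character that UTF-16LE preserved.
first_char_name = unicodedata.name(as_le[0])

print(repr(as_le), repr(as_utf16), first_char_name)
```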
In the UTF-16 encoding, a leading FF FE or FE FF is a BOM rather than a
character, and all following pairs of bytes are interpreted little-endian
or big-endian respectively. If the first two bytes are neither of these,
a higher-level protocol must decide whether to interpret the pairs of
bytes as big- or little-endian. If no higher-level protocol exists,
the interpretation is big-endian by default.
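
The rule for the plain UTF-16 label can be sketched roughly as follows. This is only an illustration of the description above, not any implementation's actual code: the function name `decode_utf16` and its `hint` parameter (standing in for a higher-level protocol) are hypothetical.

```python
from typing import Optional


def decode_utf16(data: bytes, hint: Optional[str] = None) -> str:
    """Decode bytes labelled plain "UTF-16".

    `hint` is a hypothetical stand-in for a higher-level protocol:
    "le", "be", or None when no such protocol exists.
    """
    if hint not in (None, "le", "be"):
        raise ValueError("hint must be 'le', 'be', or None")

    # A leading FF FE or FE FF is a BOM, not a character: strip it and
    # let it fix the byte order for everything that follows.
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")

    # No BOM: defer to the higher-level protocol if there is one.
    if hint == "le":
        return data.decode("utf-16-le")

    # hint == "be", or no protocol at all: big-endian by default.
    return data.decode("utf-16-be")
```

Note that Python's built-in "utf-16" codec falls back to the platform's native byte order when no BOM is present, which is why the sketch spells out the big-endian default explicitly instead of relying on it.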
--
John Cowan  cowan@ccil.org  http://www.ccil.org/~cowan
Unless it was by accident that I had offended someone, I never apologized.
        --Quentin Crisp
Received on Wednesday, 21 November 2012 21:27:49 UTC