Re: byte order mark article

Anne van Kesteren scripsit:

> * Per my reading of the HTML specification you can use utf-16le and
> utf-16be without a BOM. It does not even require it for utf-16,
> although I suppose Unicode might (though Unicode is not very correct
> here with respect to what implementations do). 

Per Unicode, in UTF-16LE and UTF-16BE documents, there is no such
thing as a BOM.  If a UTF-16LE document begins FF FE, that means the
first character is U+FEFF, ZERO WIDTH NO-BREAK SPACE; likewise if
a UTF-16BE document begins FE FF.
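
To illustrate, here is a small Python sketch. It assumes Python's
"utf-16-le" and "utf-16-be" codecs follow the rule above and leave a
leading U+FEFF in the decoded text; the sample byte strings are made up
for the example.

```python
import unicodedata

# Hypothetical sample bytes: U+FEFF followed by "A" in each byte order.
le_bytes = b"\xff\xfeA\x00"
be_bytes = b"\xfe\xff\x00A"

# With an explicit byte order, the leading FF FE / FE FF is not a BOM:
# it decodes to an ordinary character that stays in the text.
le_text = le_bytes.decode("utf-16-le")
be_text = be_bytes.decode("utf-16-be")

for text in (le_text, be_text):
    first = text[0]
    print(f"U+{ord(first):04X}", unicodedata.name(first), "length:", len(text))
```

Both strings should come out two characters long, with U+FEFF first and
"A" second.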

In the UTF-16 encoding, a leading FF FE or FE FF is a BOM rather than a
character, and all following pairs of bytes are interpreted little-endian
or big-endian respectively.  If the first two bytes are neither of these,
a higher-level protocol must decide whether to interpret the pairs of
bytes as big- or little-endian.  If no higher-level protocol exists,
the interpretation is big-endian by default.
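
The rule can be sketched in Python roughly as follows. The function name
decode_utf16 and its protocol_order parameter are hypothetical, standing
in for whatever the higher-level protocol says; the sketch spells out the
default itself rather than relying on what the built-in "utf-16" codec
does when no BOM is present.

```python
from typing import Optional


def decode_utf16(data: bytes, protocol_order: Optional[str] = None) -> str:
    """Decode bytes labeled as plain UTF-16 (hypothetical helper).

    protocol_order is "little", "big", or None when no higher-level
    protocol specifies a byte order.
    """
    # A leading FF FE or FE FF is a BOM: it selects the byte order
    # and is not part of the text.
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")

    # No BOM: the higher-level protocol decides, if there is one.
    if protocol_order == "little":
        return data.decode("utf-16-le")

    # No BOM and no protocol (or the protocol says big): big-endian.
    return data.decode("utf-16-be")


print(decode_utf16(b"\xff\xfeA\x00"))                     # BOM, little-endian
print(decode_utf16(b"\xfe\xff\x00A"))                     # BOM, big-endian
print(decode_utf16(b"A\x00", protocol_order="little"))    # protocol decides
print(decode_utf16(b"\x00A"))                             # default big-endian
```

All four calls should print "A", with the BOM consumed in the first two.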

-- 
Unless it was by accident that I had            John Cowan
offended someone, I never apologized.           cowan@ccil.org
        --Quentin Crisp                         http://www.ccil.org/~cowan

Received on Wednesday, 21 November 2012 21:27:49 UTC