Re: byte order mark article

Richard Ishida, Thu, 22 Nov 2012 18:00:02 +0000:
> On 21/11/2012 21:04, Anne van Kesteren wrote:

>> * Are there even non-recent versions of major browsers that do not
>> handle the byte order mark? How far back do we have to go these days?
>> 
>> * Per my reading of the HTML specification you can use utf-16le and
>> utf-16be without a BOM.
> 
> Actually, RFC2781 says that you MUST NOT use a BOM with content 
> labelled as utf-16le/utf-16be.  (Of course, as mentioned in the side 
> note in the article, this is about labeling rather than the sequence 
> of bytes at the start of the file.)
> 
> It does not even require it for utf-16,

Right:[1]

]]  MUST label the text as
   "UTF-16", and SHOULD make sure the text starts with 0xFEFF.[[
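To make the RFC 2781 distinction concrete: Python's built-in codecs happen to behave roughly along these lines. This is a minimal sketch, assuming CPython 3. The variable names are only for illustration, and Python is of course not the RFC.

```python
# Sketch: how Python's built-in codecs treat the BOM for each label.
text = "hi"

# "utf-16": the encoder prepends a BOM (in the platform's native byte order).
with_label_utf16 = text.encode("utf-16")

# "utf-16-le" / "utf-16-be": no BOM is written.
le_bytes = text.encode("utf-16-le")   # b"h\x00i\x00"
be_bytes = text.encode("utf-16-be")   # b"\x00h\x00i"

# The "utf-16" decoder consumes a leading BOM and uses it to pick the byte order.
decoded_utf16 = (b"\xfe\xff" + be_bytes).decode("utf-16")   # "hi"

# The "utf-16-le" decoder does not treat FF FE specially: it decodes to U+FEFF
# (ZERO WIDTH NO-BREAK SPACE), which then becomes part of the text.
decoded_le = (b"\xff\xfe" + le_bytes).decode("utf-16-le")   # "\ufeffhi"

print(with_label_utf16[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True
print(repr(decoded_utf16), repr(decoded_le))
```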

The perceived requirement to use the BOM might stem from XML: [1]

]] An exception to the "SHOULD" rule of using "UTF-16BE" or "UTF-16LE"
   would occur with document formats that mandate a BOM in UTF-16 text,
   thereby requiring the use of the "UTF-16" tag only.[[

> This was news to me. I believe HTML5 did the last time I looked. I've 
> made several changes to reflect this.

And then the Encoding Standard confusingly says:[2] "In violation of 
the Unicode standard, "utf-16" is a label for utf-16le rather than its 
own standalone encoding." This gives the impression that the Encoding 
Standard *only* swaps the labels. However, in the Encoding Standard, 
the terms 'UTF-16LE'/'UTF-16BE' also cover the use of the BOM, which 
the 'UTF-16LE'/'UTF-16BE' labels as defined by "the other standards" 
(such as RFC 2781) do not cover.
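As I read the Encoding Standard's decode algorithm, a leading BOM is sniffed and stripped before the labelled encoding is consulted, and it takes precedence over the label. So a BOM in content labelled utf-16le is consumed rather than kept as a U+FEFF character, as the RFC 2781 reading would have it. A rough sketch, assuming only a handful of labels; the names LABELS, BOMS, PY_CODECS and sniff_and_decode are hypothetical and not taken from the spec:

```python
# Hypothetical sketch of Encoding Standard style BOM sniffing (tiny label subset).
LABELS = {
    "utf-8": "utf-8",
    "utf-16": "utf-16le",    # Encoding Standard: "utf-16" is a label for utf-16le
    "utf-16le": "utf-16le",
    "utf-16be": "utf-16be",
}

# Checked in order; the 3-byte UTF-8 BOM cannot collide with the 2-byte ones.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
]

# Map encoding names to Python codec names for the actual byte decoding.
PY_CODECS = {"utf-8": "utf-8", "utf-16le": "utf-16-le", "utf-16be": "utf-16-be"}


def sniff_and_decode(data: bytes, label: str) -> tuple[str, str]:
    """Return (encoding actually used, decoded text)."""
    encoding = LABELS[label.strip().lower()]
    for bom, bom_encoding in BOMS:
        if data.startswith(bom):
            # The BOM wins over the label and is not part of the text.
            encoding = bom_encoding
            data = data[len(bom):]
            break
    return encoding, data.decode(PY_CODECS[encoding], errors="replace")
```

So sniff_and_decode(b"\xff\xfeh\x00i\x00", "utf-16le") yields ("utf-16le", "hi"), whereas Python's own "utf-16-le" codec, which follows the RFC-style reading, would keep the leading U+FEFF.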

I would suggest to Anne that the Encoding Standard make clear that 
its use of the terms 'UTF-16LE'/'UTF-16BE' conflicts with the existing 
definitions of those labels.
 
[1] http://tools.ietf.org/html/rfc2781#section-3.3

[2] http://encoding.spec.whatwg.org/#utf-16le

-- 
leif halvard silli

Received on Thursday, 22 November 2012 22:35:28 UTC