
Re: byte order mark article

From: Richard Ishida <ishida@w3.org>
Date: Thu, 22 Nov 2012 18:00:02 +0000
Message-ID: <50AE6822.2070608@w3.org>
To: Anne van Kesteren <annevk@annevk.nl>
CC: www-international@w3.org
Useful comments. Thanks Anne. Just so that it's clear, this is still an 
early draft, and this type of comment is exactly what I'm looking for at 
this point.

The article has been updated.

See below.

On 21/11/2012 21:04, Anne van Kesteren wrote:
> I saw http://www.w3.org/International/questions/new/qa-byte-order-mark-new
> in the minutes.
> * This article mentions utf-16 a lot. Given the pain utf-16 causes
> being the only non-ASCII-compatible encoding user agent implementors
> have to care about and that there's even talk about maybe trying to
> get rid of it completely, featuring it so prominently seems unwise.
> You might want to get Henri's view on this too.

Yes, I have moved that down to the 'By the way' section.

Although I added a strong recommendation not to use UTF-16, and buried 
the text about UTF-16 a bit more, I think we do need to keep the text 
hanging around somewhere for now.

> * Are there even non-recent versions of major browsers that do not
> handle the byte order mark? How far back do we have to go these days?
> * Per my reading of the HTML specification you can use utf-16le and
> utf-16be without a BOM.

Actually, RFC 2781 says that you MUST NOT use a BOM with content labelled 
as utf-16le/utf-16be.  (Of course, as mentioned in the side note in the 
article, this is about labelling rather than the sequence of bytes at the 
start of the file.)
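
To make the labelling point concrete, here is a rough Python sketch (an
illustration only, not something taken from the article or the RFC). The
function name has_forbidden_bom is hypothetical, and the check assumes that
"a BOM" means U+FEFF encoded in the byte order that the label declares.

```python
def has_forbidden_bom(label: str, data: bytes) -> bool:
    """Return True if content labelled utf-16le/utf-16be starts with a BOM.

    Hypothetical helper: it only looks at the declared label and the first
    two bytes, and says nothing about content labelled plain "utf-16".
    """
    label = label.lower()
    if label not in ("utf-16le", "utf-16be"):
        # The MUST NOT discussed above applies only to the LE/BE labels.
        return False
    # Python's utf-16le/utf-16be codecs do not add a BOM of their own,
    # so this yields U+FEFF in the labelled byte order (FF FE or FE FF).
    bom = "\ufeff".encode(label)
    return data.startswith(bom)


if __name__ == "__main__":
    print(has_forbidden_bom("utf-16le", b"\xff\xfeh\x00i\x00"))
    print(has_forbidden_bom("utf-16le", b"h\x00i\x00"))
```

The label comes from wherever the encoding is declared (for example the
HTTP header); the bytes alone cannot tell you whether the rule applies.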

> It does not even require it for utf-16,

This was news to me. I believe HTML5 did require one the last time I 
looked. I've made several changes to reflect this.

> although I suppose Unicode might (though Unicode is not very correct
> here with respect to what implementations do). So the section "If you
> use UTF-16" seems wrong.
> * "According to the HTML specification, the HTTP header overrides any
> in-document encoding." is no longer true.

Another change I had missed. I have made relevant changes to the article.

> * "A UTF-8 signature at the beginning of a CSS file can sometimes
> cause the initial rules in the file to fail on certain user agents."
> citation needed? :-)
> * If you really have to mention utf-32, you might also want to point
> out it has been actively removed from implementations so using it is
> unlikely to be productive.

Received on Thursday, 22 November 2012 18:00:27 UTC
