
RE: pre-HTML5 and the BOM

From: Phillips, Addison <addison@lab126.com>
Date: Wed, 18 Jul 2012 09:56:56 -0700
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
CC: "www-international@w3.org" <www-international@w3.org>
Message-ID: <131F80DEA635F044946897AFDA9AC3476AAD05D206@EX-SEA31-D.ant.amazon.com>
Hello Leif (et al),

I agree that the Unicode list isn't the right place to talk about W3C content, so I am cross-posting this note to the www-international@w3.org list (with Unicode.org on blind-copy to prevent list spamming). By discussing this on www-international, the Internationalization Working Group will be better positioned to respond. Note also that each of the pages you cite has a "comments" feature on the page itself ;-). We do revisit these documents from time to time, as what is best to recommend does evolve as standards and implementations change.

I would also point out that the pages you've cited, in general, continue to mark best practice on the Web as the Internationalization WG understands it. It is best to use a Unicode character encoding (generally UTF-8). It is better to use the character encoding directly than it is to use escapes or entities. And it is best to avoid the BOM when one has a choice. We do need to remove misinformation, such as the "three bytes of mojibake garbage" discussion, as this is now obsolete when it comes to browsers.
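As a rough illustration of what "avoiding the BOM" means at the byte level, here is a minimal Python sketch. The helper name `strip_utf8_bom` and the sample string are made up for this example; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the Python standard library.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

text = "Dürst"                        # sample content, chosen for illustration
without_bom = text.encode("utf-8")      # plain UTF-8, no signature
with_bom = text.encode("utf-8-sig")     # same bytes, prefixed with EF BB BF

# The two serializations differ only by the three-byte signature.
print(with_bom[:3] == codecs.BOM_UTF8)
print(strip_utf8_bom(with_bom) == without_bom)
```

Stripping is harmless on input that has no BOM, so a helper like this can be applied unconditionally before further processing.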

Regarding:

> > [2]
> > http://www.w3.org/International/questions/qa-byte-order-mark#bomhow


The WG discussed this document in our teleconference today, as it happens [1], and work is already underway to update this page. However, the WG still seems to feel that the Byte Order Mark is better to avoid when possible, even if it is not the barrier to display or interoperability that it once was. 

I do note that BOM and NCR/entities are (or at least should be) separate considerations. Using a BOM as an encoding signature and then escaping it is an absurd thing to do. FWIW, I also agree with Martin's comment:

>> I'm not sure there are many people for whom using named character 
>> entities or numeric character references is a convenience. But for 
>> those for whom it is a convenience, let them use it.
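To make the point about escaping a BOM concrete, the following Python sketch compares a real byte-level signature with an escaped U+FEFF. The sample documents and the function name `has_bom_signature` are assumptions for illustration only; the idea is simply that a signature has to be present as raw bytes before any decoding happens, whereas a numeric character reference only turns into U+FEFF after the encoding is already known.

```python
import codecs
import html

# Hypothetical sample documents for illustration.
signature_doc = codecs.BOM_UTF8 + b"<p>hello</p>"   # BOM as raw bytes
escaped_doc = b"&#xFEFF;<p>hello</p>"               # BOM written as an NCR

def has_bom_signature(data: bytes) -> bool:
    """True if the byte stream starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

print(has_bom_signature(signature_doc))
print(has_bom_signature(escaped_doc))

# The escape becomes U+FEFF only after decoding and unescaping,
# which is too late for it to serve as an encoding signature.
unescaped = html.unescape(escaped_doc.decode("ascii"))
print(unescaped.startswith("\ufeff"))
```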

Addison

[1] http://www.w3.org/2012/07/18-i18n-minutes.html


Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Received on Wednesday, 18 July 2012 16:57:24 GMT
