RE: pre-HTML5 and the BOM

Hello Leif (et al),

I agree that the Unicode list isn't the right place to talk about W3C content, so I am cross-posting this note to the www-international list (with the Unicode list on blind-copy to prevent list spamming). By discussing this on www-international, the Internationalization Working Group will be better positioned to respond. Note also that each of the pages you cite has a "comments" feature on the page itself ;-). We do revisit these documents from time to time, as what is best to recommend does evolve as standards and implementations change.

I would also point out that the pages you've cited, in general, continue to mark best practice on the Web as the Internationalization WG understands it. It is best to use a Unicode character encoding (generally UTF-8). It is better to use the character encoding directly than it is to use escapes or entities. And it is best to avoid the BOM when one has a choice. We do need to remove misinformation, such as the "three bytes of mojibake garbage" discussion, as this is now obsolete when it comes to browsers.
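
For anyone unfamiliar with what the "three bytes" remark refers to, here is a minimal Python sketch. It is only an illustration and is not taken from the pages under discussion; the function names are hypothetical, and windows-1252 is an assumed example of a legacy encoding that a consumer might wrongly apply. It shows the UTF-8 BOM as the byte sequence EF BB BF, how to write UTF-8 with and without it, and what those bytes look like when misread.

```python
import codecs

# The UTF-8 BOM is U+FEFF encoded in UTF-8: the three bytes EF BB BF.
BOM = codecs.BOM_UTF8


def encode_with_bom(text: str) -> bytes:
    """Encode as UTF-8 with a leading BOM (Python's 'utf-8-sig' codec)."""
    return text.encode("utf-8-sig")


def encode_without_bom(text: str) -> bytes:
    """Encode as plain UTF-8, with no signature bytes."""
    return text.encode("utf-8")


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if one is present; otherwise return data unchanged."""
    return data[len(BOM):] if data.startswith(BOM) else data


def as_legacy_mojibake(data: bytes) -> str:
    """Decode the bytes as windows-1252 (an assumed legacy encoding) to show
    what a consumer that ignores the signature would display."""
    return data.decode("cp1252", errors="replace")
```

Calling `as_legacy_mojibake(encode_with_bom("abc"))` yields the three stray characters U+00EF, U+00BB, U+00BF in front of "abc", while `encode_without_bom` produces nothing to misread in the first place.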


> > [2]
> >

The WG discussed this document in our teleconference today, as it happens [1], and work is already underway to update this page. However, the WG still seems to feel that the Byte Order Mark is best avoided when possible, even if it is not the barrier to display or interoperability that it once was.

I do note that BOM and NCR/entities are (or at least should be) separate considerations. Using a BOM as an encoding signature and then escaping it is an absurd thing to do. FWIW, I also agree with Martin's comment:

>> I'm not sure there are many people for whom using named character 
>> entities or numeric character references is a convenience. But for 
>> those for whom it is a convenience, let them use it.



Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Received on Wednesday, 18 July 2012 16:57:24 UTC