
Re: Strange advice re BOM and UTF-8

From: olivier Thereaux <ot@w3.org>
Date: Thu, 7 Dec 2006 00:09:31 +0900
Message-Id: <7FC489D8-5A55-4986-8EB5-2639B1416C25@w3.org>
Cc: www-validator@w3.org, www-international@w3.org
To: Chris Lilley <chris@w3.org>

Hi Chris,

On Dec 6, 2006, at 23:35 , Chris Lilley wrote:
> I was surprised to see, on the W3C DTD validator, the following  
> advice:
>
>   The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to
>   cause problems for some text editors and older browsers. You may
>   want to consider avoiding its use until it is better supported.
>
> This is odd because the use of a BOM with UTF-8 files is
>
> a) standards compliant, to Unicode and to XML and to CSS
> b) common practice
> c) allows text editors to auto-detect the encoding of a plain text
> document.
>
> I believe therefore that the advice is incorrect and indeed
> potentially damaging.

I am not an expert, so all my knowledge about the BOM in UTF-8 comes
from hearsay and some documentation I have read. The picture I had so
far was that the BOM is not really necessary for UTF-8 (it is only a
signature, not an indication of byte order, isn't it?), and that it is
sometimes harmful (although perhaps more and more rarely) because some
implementations do not understand the mark.
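
To make the "signature" point concrete, here is a minimal sketch in
Python. It is only an illustration, not what the validator does: the
helper names has_utf8_signature and strip_utf8_signature are made up
for this example, and the sample markup is arbitrary. It shows that the
UTF-8 BOM is simply U+FEFF encoded as three fixed bytes, and what a
decoder that does not know about the signature ends up with.

```python
import codecs

# The UTF-8 "BOM" is U+FEFF encoded in UTF-8: always the bytes EF BB BF.
# UTF-8 has no byte-order variants, so these bytes act only as a signature.
UTF8_SIGNATURE = codecs.BOM_UTF8


def has_utf8_signature(data: bytes) -> bool:
    """Hypothetical helper: True if the data starts with the UTF-8 signature."""
    return data.startswith(UTF8_SIGNATURE)


def strip_utf8_signature(data: bytes) -> bytes:
    """Hypothetical helper: drop the signature if present, else return data as is."""
    if has_utf8_signature(data):
        return data[len(UTF8_SIGNATURE):]
    return data


# Arbitrary sample document saved "with BOM".
with_bom = UTF8_SIGNATURE + "<html></html>".encode("utf-8")

# A plain UTF-8 decode keeps U+FEFF as the first character, which is the
# kind of stray character an unaware implementation may trip over.
naive_text = with_bom.decode("utf-8")

# Python's "utf-8-sig" codec treats the leading U+FEFF as a signature
# and removes it while decoding.
aware_text = with_bom.decode("utf-8-sig")

print(UTF8_SIGNATURE.hex())
print(repr(naive_text[0]))
print(aware_text)
```

The design choice here is to treat the mark purely as an optional
prefix: detect it, then strip it before handing the text to anything
that might not expect it.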

Docs I know include:
http://www.w3.org/International/questions/qa-utf8-bom
http://unicode.org/unicode/faq/utf_bom.html#BOM
Both seem to point towards cautious use of a BOM for UTF-8, or no use
at all.

Do you have other references worth reading on the topic?

Thank you.

-- 
olivier
Received on Wednesday, 6 December 2006 15:09:45 GMT
