Re: Proposed addition to Display problems caused by the UTF-8 BOM

 > This can save time if the only non-ASCII characters occur a long way
 > down the file

Time savings aren't really the issue here. In fact, when there is no 
BOM, editors like Notepad simply fall back to the currently active 
default encoding. They don't look at *any* of the rest of the file. The 
use of the BOM as a signature is related to this bit of text already in 
the FAQ:

--
You will find that some text editors such as Windows Notepad will 
automatically add a UTF-8 signature to any file you save as UTF-8.
--

So I would tend to replace the bit above thusly:

--
Some applications, such as text editors, look for the BOM as a signature 
indicating the use of a Unicode encoding. Some of these, such as Windows 
Notepad, will automatically add a UTF-8 BOM to any file you save as 
UTF-8 so that they can detect the encoding later. Browsers, however, 
don't look for the BOM, and Web pages always need to declare the 
character encoding explicitly at the top of the file or in the HTTP 
header, making a BOM unnecessary (and, as noted above, sometimes harmful).
--
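
To make the "signature" behavior concrete, here is a minimal sketch in 
Python of what I mean. It is an illustration only, not how Notepad or 
any other editor is actually implemented: the function name 
sniff_encoding and the cp1252 fallback are my own assumptions, standing 
in for "the currently active default encoding".

```python
import codecs


def sniff_encoding(data: bytes, default: str = "cp1252") -> str:
    """Pick a decoder from the first bytes only (hypothetical helper).

    If the data starts with the UTF-8 BOM (EF BB BF), treat it as UTF-8.
    Otherwise fall back to the default without reading any further,
    which is the behavior described above. The cp1252 default is an
    assumption standing in for the system's active code page.
    """
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes UTF-8 and drops the leading BOM.
        return "utf-8-sig"
    return default


# The same text saved as UTF-8, with and without the signature.
with_bom = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
without_bom = "caf\u00e9".encode("utf-8")

text_with_bom = with_bom.decode(sniff_encoding(with_bom))
text_without_bom = without_bom.decode(sniff_encoding(without_bom))

# An editor that writes the signature on save behaves roughly like this.
saved = "caf\u00e9".encode("utf-8-sig")
```

With the BOM present the text round-trips correctly. Without it, the 
fallback decoder misreads the two-byte UTF-8 sequence for the accented 
letter as two separate characters, which is the usual mojibake.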

Just a thought.

Addison

Richard Ishida wrote:
> Chaps,
> 
> I propose to add the following paragraph to http://www.w3.org/International/questions/qa-utf8-bom in the section By the Way:
> 
> "Applications that look at the text to work out the character encoding can tell straight away that the text is encoded in UTF-8 if they find a BOM at the beginning.  This can save time if the only non-ASCII characters occur a long way down the file (such as a copyright symbol in text at the very end).  Web pages, however, ought to declare the character encoding explicitly at the top of the file or in the HTTP header, so a BOM should not be necessary."
> 
> Unless I hear any objections, I will make the change, unannounced, in a couple of days time.
> 
> Cheers,
> RI
> 
> 
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>  
> http://www.w3.org/People/Ishida/
> http://www.w3.org/International/
> http://people.w3.org/rishida/blog/
> http://www.flickr.com/photos/ishida/
>  
> 
> 

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.

Received on Wednesday, 25 July 2007 16:01:17 UTC