- From: Addison Phillips <addison@yahoo-inc.com>
- Date: Wed, 25 Jul 2007 08:59:40 -0700
- To: Richard Ishida <ishida@w3.org>
- CC: public-i18n-core@w3.org
> This can save time if the only non-ASCII characters occur a long way
> down the file

It isn't time savings that's really in question here. In fact, the lack of a BOM causes editors like Notepad to use the currently active default encoding. They don't look at *any* of the rest of the file. The use of BOM as a signature is related to this bit of text already in the FAQ:

--
You will find that some text editors such as Windows Notepad will automatically add a UTF-8 signature to any file you save as UTF-8.
--

So I would tend to replace the bit above thusly:

--
Some applications, such as text editors, look for the BOM as a signature indicating the use of a Unicode encoding. These applications, such as Windows Notepad, will automatically add a UTF-8 BOM to any file you save as UTF-8 so that they can detect it later. Browsers, however, don't look for the BOM and Web pages always need to declare the character encoding explicitly at the top of the file or in the HTTP header, making a BOM unnecessary (and, as noted above, sometimes harmful).
--

Just a thought.

Addison

Richard Ishida wrote:
> Chaps,
>
> I propose to add the following paragraph to http://www.w3.org/International/questions/qa-utf8-bom in the section By the Way:
>
> "Applications that look at the text to work out the character encoding can tell straight away that the text is encoded in UTF-8 if they find a BOM at the beginning. This can save time if the only non-ASCII characters occur a long way down the file (such as a copyright symbol in text at the very end). Web pages, however, ought to declare the character encoding explicitly at the top of the file or in the HTTP header, so a BOM should not be necessary."
>
> Unless I hear any objections, I will make the change, unannounced, in a couple of days time.
>
> Cheers,
> RI
>
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/People/Ishida/
> http://www.w3.org/International/
> http://people.w3.org/rishida/blog/
> http://www.flickr.com/photos/ishida/

--
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.
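
For illustration only, here is a minimal Python sketch of the signature behaviour described in the message: an application checks the first bytes of a file for the UTF-8 BOM and otherwise falls back to a default encoding without inspecting the rest of the file. The function names and the "cp1252" fallback are assumptions made for this example; they are not taken from Notepad or from any other real application.

```python
import codecs


def sniff_encoding(data: bytes, default: str = "cp1252") -> str:
    """Pick a decoder by looking only at the start of the data.

    Hypothetical helper: "cp1252" stands in for whatever the
    currently active default encoding happens to be.
    """
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes UTF-8 and strips a leading BOM.
        return "utf-8-sig"
    # No signature found: fall back without scanning the rest.
    return default


def save_with_signature(text: str) -> bytes:
    """Encode text as UTF-8 and prepend the BOM as a signature."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


if __name__ == "__main__":
    signed = save_with_signature("Copyright \u00a9 2007")
    unsigned = "Copyright \u00a9 2007".encode("utf-8")
    print(sniff_encoding(signed))
    print(sniff_encoding(unsigned))
    print(signed.decode(sniff_encoding(signed)))
```

With a signature the file round-trips cleanly; without one, this sketch would decode the UTF-8 bytes using the fallback encoding, which is the situation the BOM is meant to avoid in such editors.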
Received on Wednesday, 25 July 2007 16:01:17 UTC