- From: Najib Tounsi <ntounsi@emi.ac.ma>
- Date: Sun, 29 Jul 2007 10:04:07 +0000
- To: public-i18n-core@w3.org
Najib Tounsi wrote:
>
> +1
>
> Although Notepad is the default editor on almost 90% of PCs (since
> most PCs run Windows), I've had bad experiences with Notepad because
> of the BOM (loss of original content created by other editors).
>
> What I would like to add here (of course, nothing to do with
> http://www.w3.org/International/questions/qa-utf8-bom) is that there
> is a similar problem with "some" HTML authoring tools and the charset
> meta tag. Example:
>
> 1- You create your text containing some Arabic with NVU.
> 2- You save your HTML file (it is saved in UTF-8).
> 3- You reopen your file later with your favorite tool, NVU. The
>    content is then interpreted as ISO Latin, and the bytes (e.g. اي)
>    are converted to their HTML entity equivalents (e.g. اي).
>
> Indeed, the file is saved with a meta tag declaring the default
> charset, ISO-8859-1:
>
>   <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type" />
>
> The next time the file is opened, NVU reinterprets it as ISO Latin
> and converts each non-ASCII byte to an HTML entity.
>
> The solution is to change the charset in the meta tag before saving
> (the same problem occurs when you delete the meta tag), or to use a
> text editor to change the charset from ISO-8859-1 to UTF-8 before
> re-opening the file with NVU.

Of course, you can change the default setting from ISO-8859-1 to
UTF-8, but you might want to keep the original setting for general
purposes.

>
> Best, Najib
>
>
> Martin Duerst wrote:
> > At 00:59 07/07/26, Addison Phillips wrote:
> >
> >> So I would tend to replace the bit above thusly:
> >>
> >> -- Some applications, such as text editors, look for the BOM as a
> >> signature indicating the use of a Unicode encoding. These
> >> applications, such as Windows Notepad, will automatically add a
> >> UTF-8 BOM to any file you save as UTF-8 so that they can detect
> >> it later.
> >> Browsers, however, don't look for the BOM and Web pages
> >> always need to declare the character encoding explicitly at the
> >> top of the file or in the HTTP header, making a BOM unnecessary
> >> (and, as noted above, sometimes harmful). --
> >
> > I think this is a good direction, but I'm a bit worried by "such as
> > text editors". This implies that all or most text editors silently
> > add a BOM, which is not true. I would change "such as text
> > editors" to "such as some text editors".
> >
> > Also, the "Browsers, however," is a bit of a problem, because it's
> > written as a counterpoint to editors. So I'd rewrite that part a
> > bit, too.
> >
> > Regards, Martin.
> >
> >
> >> Just a thought.
> >>
> >> Addison
> >>
> >> Richard Ishida wrote:
> >>> Chaps, I propose to add the following paragraph to
> >>> http://www.w3.org/International/questions/qa-utf8-bom in the
> >>> section By the Way: "Applications that look at the text to work
> >>> out the character encoding can tell straight away that the text
> >>> is encoded in UTF-8 if they find a BOM at the beginning.
> >>>
> >>> This can save time if the only non-ASCII characters occur a long
> >>> way down the file (such as a copyright symbol in text at the very
> >>> end). Web pages, however, ought to declare the character
> >>> encoding explicitly at the top of the file or in the HTTP header,
> >>> so a BOM should not be necessary."
> >>> Unless I hear any objections, I will make the change,
> >>> unannounced, in a couple of days' time.
> >>> Cheers, RI
> >>>
> >>> ============
> >>> Richard Ishida
> >>> Internationalization Lead
> >>> W3C (World Wide Web Consortium)
> >>>
> >>> http://www.w3.org/People/Ishida/
> >>> http://www.w3.org/International/
> >>> http://people.w3.org/rishida/blog/
> >>> http://www.flickr.com/photos/ishida/
> >>
> >> --
> >> Addison Phillips
> >> Globalization Architect -- Yahoo! Inc.
> >> Chair -- W3C Internationalization Core WG
> >>
> >> Internationalization is an architecture. It is not a feature.
> >
> >
> > #-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> > #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp

--
Najib TOUNSI (mailto:tounsi @ w3.org)
Bureau W3C au Maroc (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
Phone : +212 (0) 37 68 71 50 (P1711)
Fax : +212 (0) 37 77 88 53
Mobile: +212 (0) 61 22 00 30
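A minimal Python sketch of the NVU round trip Najib describes, assuming the editor decodes the saved UTF-8 bytes as ISO-8859-1 and then writes every non-ASCII character back out as a numeric character reference. NVU's exact output (for instance, whether it prefers named entities such as &Oslash;) is not confirmed here, and `reopen_as_latin1` is a hypothetical helper.

```python
def reopen_as_latin1(utf8_bytes: bytes) -> str:
    """Simulate an editor that misreads UTF-8 bytes as ISO-8859-1.

    Hypothetical helper for illustration only: each byte becomes one
    Latin-1 character, and every non-ASCII character is then written
    back out as a numeric character reference (&#NNN;).
    """
    # ISO-8859-1 maps each byte 0x00-0xFF to U+0000-U+00FF, so this
    # decode never fails; it just silently produces the wrong text.
    misread = utf8_bytes.decode("iso-8859-1")
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in misread)


original = "\u0627\u064a"                  # Arabic alef + yeh
saved = original.encode("utf-8")           # the bytes written to disk
print(saved.hex(" "))                      # d8 a7 d9 8a
print(reopen_as_latin1(saved))             # &#216;&#167;&#217;&#138;
print(saved.decode("utf-8") == original)   # True: the correct charset round-trips
```

Because the references are plain ASCII, saving the file again makes the damage stick: the Arabic can no longer be recovered just by fixing the charset declaration.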
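The workaround in the thread (change the meta tag's charset from ISO-8859-1 to UTF-8 in a text editor before reopening the file) can also be scripted. This is a rough sketch assuming the tag looks like the one NVU writes; `fix_meta_charset` is a hypothetical helper, and a regular expression is only a stand-in for a real HTML parser.

```python
import re

# Matches "charset=ISO-8859-1" (optionally quoted) inside a <meta> tag,
# case-insensitively. This does not parse HTML, so unusual markup may
# not match; it is a sketch, not a robust tool.
META_CHARSET = re.compile(
    r"""(<meta\b[^>]*\bcharset=["']?)iso-8859-1\b""",
    re.IGNORECASE,
)


def fix_meta_charset(html: str) -> str:
    """Rewrite an ISO-8859-1 meta charset declaration to utf-8."""
    return META_CHARSET.sub(r"\1utf-8", html)


page = '<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type" />'
print(fix_meta_charset(page))
# <meta content="text/html; charset=utf-8" http-equiv="content-type" />
```

Text outside a `<meta` tag, such as a paragraph that merely mentions "charset=ISO-8859-1", is left untouched.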
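To make the BOM discussion concrete, here is a rough Python sketch of BOM sniffing. Python's real `utf-8-sig` codec prepends a BOM when encoding (as Notepad does when saving as UTF-8) and strips one when decoding. `sniff_encoding` is a hypothetical helper, and its fallback (strict UTF-8 over the whole buffer, otherwise ISO-8859-1) is an illustrative assumption, not the behaviour of any particular editor or browser.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of a byte buffer (hypothetical helper).

    With a UTF-8 BOM the answer is known from the first three bytes.
    Without one, the whole buffer must be checked, because the only
    non-ASCII character might sit at the very end of the file.
    """
    if data.startswith(codecs.BOM_UTF8):   # b"\xef\xbb\xbf"
        return "utf-8-sig"
    try:
        data.decode("utf-8")               # must look at every byte to succeed
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"                # assumed fallback


text = "Hello \u00a9"                       # copyright sign at the end
with_bom = text.encode("utf-8-sig")         # Notepad-style save
without_bom = text.encode("utf-8")
print(with_bom[:3] == codecs.BOM_UTF8)      # True
print(sniff_encoding(with_bom))             # utf-8-sig
print(sniff_encoding(without_bom))          # utf-8
print(sniff_encoding(b"Hello \xa9"))        # iso-8859-1
```

Decoding `with_bom` with plain `"utf-8"` instead keeps the BOM as U+FEFF at the start of the string, which is one way a stray BOM turns into unexpected bytes in output.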
Received on Sunday, 29 July 2007 10:04:25 UTC