Re: Proposed addition to Display problems caused by the UTF-8 BOM

Najib Tounsi wrote:
>
>  +1
>
>  Notepad is the default editor on almost 90% of PCs, since most PCs
>  run Windows. Even so, I've had bad experiences with Notepad because
>  of the BOM (loss of original content created by other editors).
>
>  What I would like to add here (of course, nothing to do with
>  http://www.w3.org/International/questions/qa-utf8-bom) is that there
>  is a similar problem with "some" HTML authoring tools and the charset
>  meta tag. For example: (1) you create a text containing some Arabic
>  in NVU; (2) you save your HTML file (it is saved in UTF-8); (3) you
>  later reopen the file with your favorite tool, NVU. The content is
>  then interpreted as ISO Latin-1, and the bytes (e.g. اي) are
>  converted to their HTML entity equivalents (e.g.
>  اي).
>
>  Indeed, the file is saved with a meta tag declaring the default
>  charset, ISO-8859-1:
>
>    <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type" />
>
>  The next time the file is opened, NVU reinterprets it as ISO Latin-1
>  and converts every non-ASCII byte into an HTML entity.
>
>  The solution is to change the charset in the meta tag before saving
>  (the same problem occurs if you delete the meta tag), or to use a
>  text editor to change the charset from ISO-8859-1 to UTF-8 before
>  reopening the file with NVU.

Of course, you can change NVU's default setting from ISO-8859-1 to UTF-8,
but you might want to keep the original setting for general use.
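
To make the failure concrete, here is a minimal Python sketch of that round trip. It assumes the tool escapes every non-ASCII character as a numeric character reference (NVU may use named entities instead), and the helper names to_ncr and fix_meta_charset are hypothetical, not part of NVU:

```python
import re

def to_ncr(text: str) -> str:
    """Escape every non-ASCII character as an HTML numeric character reference.

    Hypothetical stand-in for what an authoring tool does when reopening a file.
    """
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

# Matches a charset=ISO-8859-1 declaration inside a <meta> tag (assumed layout).
META_LATIN1 = re.compile(r"(<meta[^>]*charset=)ISO-8859-1", re.IGNORECASE)

def fix_meta_charset(html: str) -> str:
    """Rewrite a charset=ISO-8859-1 meta declaration to charset=UTF-8."""
    return META_LATIN1.sub(r"\1UTF-8", html)

arabic = "اي"
on_disk = arabic.encode("utf-8")          # bytes D8 A7 D9 8A, as saved

# Reopening while trusting charset=ISO-8859-1: each byte becomes one character.
misread = on_disk.decode("iso-8859-1")
print(to_ncr(misread))                    # &#216;&#167;&#217;&#138;

# Decoding with the right charset keeps the two Arabic letters intact.
print(to_ncr(on_disk.decode("utf-8")))    # &#1575;&#1610;

meta = '<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type" />'
print(fix_meta_charset(meta))
```

Reading the UTF-8 file as ISO-8859-1 yields four Latin-1 characters (one per byte) instead of two Arabic letters, which is why the resulting entities no longer correspond to the original text.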

>
>  Best, Najib
>
>
>  Martin Duerst wrote:
> > At 00:59 07/07/26, Addison Phillips wrote:
> >
> >> So I would tend to replace the bit above thusly:
> >>
> >> -- Some applications, such as text editors, look for the BOM as a
> >>  signature indicating the use of a Unicode encoding. These
> >> applications, such as Windows Notepad, will automatically add a
> >> UTF-8 BOM to any file you save as UTF-8 so that they can detect
> >> it later. Browsers, however, don't look for the BOM and Web pages
> >>  always need to declare the character encoding explicitly at the
> >> top of the file or in the HTTP header, making a BOM unnecessary
> >> (and, as noted above, sometimes harmful). --
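
As an illustration of the signature behaviour described above, here is a minimal Python sketch. Python's "utf-8-sig" codec writes the 3-byte UTF-8 BOM (EF BB BF) on encode, standing in here for an editor that adds a BOM on every UTF-8 save; prepare_for_web is a hypothetical helper that strips the BOM and declares the encoding in the HTTP Content-Type header instead:

```python
import codecs
from typing import Dict, Tuple

def save_with_bom(text: str) -> bytes:
    """Encode as UTF-8 with a leading BOM, as some editors do on save."""
    return text.encode("utf-8-sig")

def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

def prepare_for_web(data: bytes) -> Tuple[Dict[str, str], bytes]:
    """Strip a UTF-8 BOM and declare the encoding in the HTTP header instead."""
    body = data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return headers, body

saved = save_with_bom("<p>hi</p>")
print(has_utf8_bom(saved))          # True
# A consumer that decodes plain UTF-8 keeps the BOM as a stray U+FEFF character.
print(repr(saved.decode("utf-8")))  # '\ufeff<p>hi</p>'
print(prepare_for_web(saved))
```

The stray U+FEFF at the start of the decoded string is the kind of leftover that can cause the display problems the article discusses.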
> >
> > I think this is a good direction, but I'm a bit worried by "such as
> >  text editors". This implies that all or most text editors silently
> >  add a BOM, which is not true. I would change "such as text
> > editors" to "such as some text editors".
> >
> > Also, the "Browsers, however," is a bit of a problem, because it's
> > written as a counterpoint to editors. So I'd rewrite that part a
> > bit, too.
> >
> > Regards, Martin.
> >
> >
> >> Just a thought.
> >>
> >> Addison
> >>
> >> Richard Ishida wrote:
> >>> Chaps, I propose to add the following paragraph to
> >>> http://www.w3.org/International/questions/qa-utf8-bom in the
> >>> section "By the Way": "Applications that look at the text to work
> >>> out the character encoding can tell straight away that the text is
> >>> encoded in UTF-8 if they find a BOM at the beginning. This can save
> >>> time if the only non-ASCII characters occur a long way down the
> >>> file (such as a copyright symbol in text at the very end). Web
> >>> pages, however, ought to declare the character encoding explicitly
> >>> at the top of the file or in the HTTP header, so a BOM should not
> >>> be necessary." Unless I hear any objections, I will make the
> >>> change, unannounced, in a couple of days' time. Cheers, RI
> >>>
> >>> ============
> >>> Richard Ishida
> >>> Internationalization Lead
> >>> W3C (World Wide Web Consortium)
> >>>
> >>> http://www.w3.org/People/Ishida/
> >>> http://www.w3.org/International/
> >>> http://people.w3.org/rishida/blog/
> >>> http://www.flickr.com/photos/ishida/
> >>>
> >>>
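
The time saving Richard describes can be sketched in Python as a simple sniffing routine: a BOM settles the question after three bytes, whereas without one every byte may have to be examined, since the only non-ASCII character might be a copyright sign at the very end. The function name and the ISO-8859-1 fallback are assumptions for illustration, not what any particular application does:

```python
import codecs

def sniff_encoding(data: bytes) -> str:
    """Guess a byte stream's encoding (illustrative heuristic only)."""
    # Fast path: the UTF-8 BOM is an unambiguous signature at offset 0.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Slow path: the whole buffer must be scanned, because the first
    # non-ASCII byte may be the last one in the file.
    if data.isascii():
        return "ascii"            # no evidence for any particular encoding
    try:
        data.decode("utf-8")      # strict decode fails on invalid UTF-8
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"       # assumed fallback for this sketch

page = "x" * 10_000 + "\u00a9 2007"   # copyright sign only at the very end
print(sniff_encoding(codecs.BOM_UTF8 + page.encode("utf-8")))  # utf-8-sig
print(sniff_encoding(page.encode("utf-8")))                     # utf-8
print(sniff_encoding(page.encode("iso-8859-1")))                # iso-8859-1
```

Returning "utf-8-sig" rather than "utf-8" means that decoding with the returned name also strips the BOM.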
> >>
> >> --
> >> Addison Phillips
> >> Globalization Architect -- Yahoo! Inc.
> >> Chair -- W3C Internationalization Core WG
> >>
> >> Internationalization is an architecture. It is not a feature.
> >>
> >
> >
> > #-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> > #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
> >
> >
> >
>
>


-- 
Najib TOUNSI (mailto:tounsi @ w3.org)
Bureau W3C au Maroc (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
Phone : +212 (0) 37 68 71 50 (P1711)  Fax : +212 (0) 37 77 88 53
Mobile: +212 (0) 61 22 00 30

Received on Sunday, 29 July 2007 10:04:25 UTC