RE: Strange advice re BOM and UTF-8

From: McDonald, Ira <imcdonald@sharplabs.com>
Date: Wed, 6 Dec 2006 09:00:31 -0800
Message-ID: <789E617C880666438EDEE30C2A3E8D100105A585@mailsrvnt05.enet.sharplabs.com>
To: 'Richard Ishida' <ishida@w3.org>, 'Chris Lilley' <chris@w3.org>, www-validator@w3.org
Cc: www-international@w3.org

Hi,

FWIW - the IETF's formal definition of UTF-8 (RFC 3629)
recommends very strongly AGAINST the use of BOM in UTF-8
in all IETF protocols because:

(a) it's useless as a signature (a small fragment of 
    UTF-8 can be reliably auto-detected without BOM);
(b) it's dangerous because it breaks string concatenation.
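
To illustrate both points, here is a minimal Python sketch. It is not taken from RFC 3629; the helper names (looks_like_utf8, strip_bom) and the sample strings are assumptions chosen for the example. The first part treats "decodes as strict UTF-8" as a rough stand-in for auto-detection without a BOM, and the second shows a signature surviving in the middle of concatenated data.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if the bytes decode as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 signature if one is present (hypothetical helper)."""
    return data[len(BOM):] if data.startswith(BOM) else data


# (a) Non-ASCII UTF-8 text is recognisable without any signature,
#     while the same kind of text in Latin-1 fails strict decoding.
utf8_sample = "Tiếng Việt".encode("utf-8")
latin1_sample = "café".encode("latin-1")
utf8_detected = looks_like_utf8(utf8_sample)
latin1_detected = looks_like_utf8(latin1_sample)

# (b) Concatenating two signed fragments leaves a BOM in the middle.
part_a = BOM + b"Hello, "
part_b = BOM + b"world"
joined = (part_a + part_b).decode("utf-8-sig")  # removes only the leading BOM
has_stray_bom = "\ufeff" in joined

# Stripping each fragment before joining avoids the stray character.
clean = (strip_bom(part_a) + strip_bom(part_b)).decode("utf-8")
```

Note that pure ASCII input also passes the check in (a), so this is only a plausibility test, not proof of the encoding.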

Cheers,
- Ira

Ira McDonald (Musician / Software Architect)
Chair - FSG Open Printing Steering Committee
Blue Roof Music / High North Inc
PO Box 221  Grand Marais, MI  49839
phone: +1-906-494-2434
email: imcdonald@sharplabs.com

-----Original Message-----
From: www-international-request@w3.org
[mailto:www-international-request@w3.org]On Behalf Of Richard Ishida
Sent: Wednesday, December 06, 2006 10:44 AM
To: 'Richard Ishida'; 'Chris Lilley'; www-validator@w3.org
Cc: www-international@w3.org
Subject: RE: Strange advice re BOM and UTF-8



I just checked, and the blank line in the PHP file appears in IE7 too.
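
For anyone wanting to track down which included file carries the signature, a rough Python sketch follows. The function name files_with_utf8_bom, the "*.php" pattern, and the two temporary files are assumptions made up for this example; they are not the files on the W3C site.

```python
import codecs
import tempfile
from pathlib import Path


def files_with_utf8_bom(root, pattern="*.php"):
    """Yield files under root whose first three bytes are the UTF-8 signature."""
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            if fh.read(3) == codecs.BOM_UTF8:
                yield path


# Small demonstration with two hypothetical files: one include saved
# with a signature, and one page saved without it.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "header.php").write_bytes(
        codecs.BOM_UTF8 + b"<?php /* included file */ ?>\n"
    )
    (Path(tmp) / "page.php").write_bytes(b"<?php include 'header.php'; ?>\n")
    found = [p.name for p in files_with_utf8_bom(tmp)]
```

Only the first three bytes of each file are read, so the scan stays cheap even on a large tree.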

RI


============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/
 

> -----Original Message-----
> From: www-international-request@w3.org 
> [mailto:www-international-request@w3.org] On Behalf Of Richard Ishida
> Sent: 06 December 2006 15:39
> To: 'Chris Lilley'; www-validator@w3.org
> Cc: www-international@w3.org
> Subject: RE: Strange advice re BOM and UTF-8
> 
> 
> None of the things you say are incorrect, and it would be 
> nice to be able to say that it's ok to use the utf-8 
> signature. However, some applications - such as a text editor 
> or a browser - have been known to display the BOM as an extra 
> line in the file; others will display unexpected characters, 
> such as ï»¿.
> 
> Note that the wording refers to problems caused by user 
> agents when displaying text with signature.  
> 
> It might be worth testing whether this is still generally the 
> case, however, or whether applications have indeed improved 
> significantly in the last year or so.
> 
> I have a test at
> http://www.w3.org/International/tests/sec-utf8-signature-1.html
> which seems to indicate that the latest versions of IE, Firefox and
> Opera on Windows cope ok with the utf-8 signature 
> in embedded files.  I have seen this problem recently, 
> however, in files included into PHP that have the signature.  
> I have temporarily created an example at 
> http://www.w3.org/International/questions/qa-css-charset.vi.php
> (I will fix this tomorrow.)  Look at it in Firefox, and it is fine -
> look at it in IE6, and there's a blank line at the top 
> of the page. (compare the IE page with one of the other 
> translations of the same article) (The bom is in an included file.)
> 
> RI
> 
> 
> 
> 
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
> 
> http://www.w3.org/People/Ishida/
> http://www.w3.org/International/
> http://people.w3.org/rishida/blog/
> http://www.flickr.com/photos/ishida/
>  
> 
> > -----Original Message-----
> > From: www-international-request@w3.org 
> > [mailto:www-international-request@w3.org] On Behalf Of Chris Lilley
> > Sent: 06 December 2006 14:35
> > To: www-validator@w3.org
> > Cc: www-international@w3.org
> > Subject: Strange advice re BOM and UTF-8
> > 
> > 
> > Hello www-validator,
> > 
> > I was surprised to see, on the W3C DTD validator, the following 
> > advice:
> > 
> >   The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to
> >   cause problems for some text editors and older browsers. You may
> >   want to consider avoiding its use until it is better supported.
> > 
> > This is odd because the use of a BOM with UTF-8 files is
> > 
> > a) standards compliant, to Unicode and to XML and to CSS
> > b) common practice
> > c) allows text editors to auto-detect the encoding of a plain text 
> > document.
> > 
> > I believe therefore that the advice is incorrect and indeed 
> > potentially damaging.
> > 
> > 
> > -- 
> >  Chris Lilley                    mailto:chris@w3.org
> >  Interaction Domain Leader
> >  Co-Chair, W3C SVG Working Group
> >  W3C Graphics Activity Lead
> >  Co-Chair, W3C Hypertext CG
> > 
> > 
> 
> 


Received on Wednesday, 6 December 2006 17:01:03 GMT
