RE: Strange advice re BOM and UTF-8

From: Richard Ishida <ishida@w3.org>
Date: Wed, 6 Dec 2006 15:39:21 -0000
To: "'Chris Lilley'" <chris@w3.org>, <www-validator@w3.org>
Cc: <www-international@w3.org>
Message-ID: <012501c7194c$b38f4b70$6501a8c0@w3cishida>

None of the things you say are incorrect, and it would be nice to be able to
say that it's ok to use the utf-8 signature.  However, some applications,
such as text editors or browsers, have been known to display the BOM as
an extra line in the file; others display unexpected characters, such
as ï»¿.
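
As a minimal sketch of where those unexpected characters come from: the
utf-8 signature is the three bytes EF BB BF, and an application that
decodes them as a single-byte encoding shows three separate characters.
The function names below are illustrative only, and the choice of
Windows-1252 as the wrong encoding is an assumption, not something any
particular application is known to use.

```python
import codecs

# The utf-8 signature (BOM) is the byte sequence EF BB BF.
UTF8_SIGNATURE = codecs.BOM_UTF8


def signature_as_seen_in(encoding: str = "cp1252") -> str:
    """Decode the signature bytes with a (wrong) single-byte encoding."""
    return UTF8_SIGNATURE.decode(encoding)


def decode_utf8(data: bytes) -> str:
    """Decode utf-8 text, dropping a leading signature if one is present."""
    # The "utf-8-sig" codec skips the signature; plain "utf-8" would keep
    # it as a leading U+FEFF character in the decoded string.
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    print(repr(signature_as_seen_in()))
    print(repr(decode_utf8(UTF8_SIGNATURE + b"hello")))
```

An application that is signature-aware behaves like decode_utf8 above;
one that is not will pass the extra bytes or character through to the
display.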

Note that the wording refers to problems caused by user agents when
displaying text with a signature.

It might be worth testing whether this is still generally the case, however,
or whether applications have indeed improved significantly in the last year
or so.

I have a test at
http://www.w3.org/International/tests/sec-utf8-signature-1.html which seems
to indicate that the latest versions of IE, Firefox and Opera on Windows
cope ok with the utf-8 signature in embedded files.  I have seen this
problem recently, however, in files included into PHP that have the
signature.  I have temporarily created an example at
http://www.w3.org/International/questions/qa-css-charset.vi.php (I will fix
this tomorrow).  Look at it in Firefox and it is fine; look at it in IE6
and there's a blank line at the top of the page.  (Compare the IE page with
one of the other translations of the same article.  The BOM is in an
included file.)
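
For anyone wanting to check their own included files, here is a rough
sketch of testing for the signature and removing it.  It only looks at
the first three bytes of the file; the function names and the idea of
rewriting the file in place are my own assumptions, not an existing tool.

```python
import codecs
from pathlib import Path


def has_utf8_signature(path: str) -> bool:
    """Return True if the file starts with the bytes EF BB BF."""
    with open(path, "rb") as handle:
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def strip_utf8_signature(path: str) -> bool:
    """Rewrite the file without its leading signature.

    Returns True if a signature was found and removed, False otherwise.
    """
    file_path = Path(path)
    data = file_path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    file_path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Calling has_utf8_signature on each file that gets included would show
which ones carry the signature before any of them are changed.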


Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)


> -----Original Message-----
> From: www-international-request@w3.org 
> [mailto:www-international-request@w3.org] On Behalf Of Chris Lilley
> Sent: 06 December 2006 14:35
> To: www-validator@w3.org
> Cc: www-international@w3.org
> Subject: Strange advice re BOM and UTF-8
> Hello www-validator,
> I was surprised to see, on the W3C DTD validator, the 
> following advice:
>   The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to
>   cause problems for some text editors and older browsers. You may
>   want to consider avoiding its use until it is better supported.
> This is odd because the use of a BOM with UTF-8 files is
> a) standards compliant, to Unicode and to XML and to CSS
> b) common practice
> c) allows text editors to auto-detect the encoding of a plain 
> text document.
> I believe therefore that the advice is incorrect and indeed 
> potentially damaging.
> -- 
>  Chris Lilley                    mailto:chris@w3.org
>  Interaction Domain Leader
>  Co-Chair, W3C SVG Working Group
>  W3C Graphics Activity Lead
>  Co-Chair, W3C Hypertext CG
Received on Wednesday, 6 December 2006 15:39:37 UTC
