
RE: Strange advice re BOM and UTF-8

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Wed, 13 Dec 2006 19:01:15 +0900
Message-Id: <6.0.0.20.2.20061212100917.05f2b600@localhost>
To: "Richard Ishida" <ishida@w3.org>, "'Karl Dubost'" <karl@w3.org>, "'olivier Thereaux'" <ot@w3.org>
Cc: "'Chris Lilley'" <chris@w3.org>, <www-validator@w3.org>, <www-international@w3.org>

[Some of the comments are similar to Karl's, but they were
written a day or so ago, so I just left them in.]

At 23:51 06/12/11, Richard Ishida wrote:
>
>I have been meaning to update the UTF-8 tests for some time to
>[1] apply the latest template
>[2] rationalise and extend the tests
>
>Given this thread, I have done that this morning.  Note that 
>[1] I have separated out the tests for autodetection of utf-8 into a
>separate set.
>[2] there is now a series of 3 pages for investigating display issues
>related to bom handling - a third test has been added to test PHP includes
>(which seem to cause problems for IE and Opera).

I have a problem finding this third page. A pointer, please.
Also, I have problems understanding why there should be
an issue with PHP includes. These happen on the server side, yes?
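
One conceivable mechanism, sketched below in Python rather than PHP and
purely as an assumption on my part: if an included file itself starts with
a UTF-8 signature, a server that simply concatenates bytes will emit
EF BB BF in the middle of the response, where the client no longer sees it
as a signature. The names concatenate and stray_bom_offsets are
hypothetical helpers, not part of any test suite.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def concatenate(parts):
    """Join byte strings the way a naive server-side include might (assumption)."""
    return b"".join(parts)


def stray_bom_offsets(data):
    """Return byte offsets of UTF-8 signatures that are not at position 0."""
    offsets = []
    pos = data.find(BOM, 1)  # start at 1 so a leading signature is ignored
    while pos != -1:
        offsets.append(pos)
        pos = data.find(BOM, pos + 1)
    return offsets


# Main page saved without a signature, included fragment saved with one.
page = "<p>main page</p>\n".encode("utf-8")
included = BOM + "<p>included fragment</p>\n".encode("utf-8")
output = concatenate([page, included])

# The signature now sits inside the document body, right after the main page.
print(stray_bom_offsets(output))
```

If this is what happens, the stray U+FEFF reaches the client as ordinary
content, and any display difference would come from how each browser
renders that character.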

>I also produced some results files as per our current template at
>
>http://www.w3.org/International/tests/results/results-utf8-signature.php
>
>http://www.w3.org/International/tests/results/results-utf8-recognition.php

This test is of the dangerous kind. Its description says:

    The series of tests for which we are reporting results checks whether
    a user agent recognizes that a file declared as US-ASCII is really
    UTF-8 encoded, and displays the text as UTF-8.

It gives the impression that this is the right thing to do,
but no spec that I know of recommends it, and the Character
Model very clearly requires the contrary; see
http://www.w3.org/TR/charmod/#C028.
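
To make the difference concrete, here is a minimal Python sketch of the two
behaviours. Both function names are hypothetical, and the sniffing variant
is only my guess at what a "recognizing" user agent does, not a description
of any actual browser.

```python
def decode_as_declared(data, declared):
    """Decode strictly with the declared encoding; no guessing."""
    return data.decode(declared, errors="strict")


def decode_with_sniffing(data, declared):
    """Hypothetical heuristic: try UTF-8 first, fall back to the declared label."""
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return data.decode(declared, errors="replace")


# A body that is really UTF-8 but is labelled US-ASCII.
body = "Dürst".encode("utf-8")

try:
    decode_as_declared(body, "us-ascii")
except UnicodeDecodeError as err:
    # Honouring the label exposes the mislabelling instead of hiding it.
    print("declared encoding rejected the data:", err.reason)

# The heuristic silently overrides the label.
print(decode_with_sniffing(body, "us-ascii"))
```

The first behaviour makes the wrong label visible to the author; the second
hides it, which is why a results page that rewards it sends the wrong
message.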

Also, there should be some very clear warnings that browser
settings and previous tests can strongly influence the results.
Frequently cleaning out all caching and similar information
and restarting the application helps, but some of this information
is extremely sticky.

Regards,   Martin.

>Karl, you might want to use these to structure the information at
>http://esw.w3.org/topic/QA/Utf8BomInteropReport
>
>
>RI
>
>
>============
>Richard Ishida
>Internationalization Lead
>W3C (World Wide Web Consortium)
>
>http://www.w3.org/People/Ishida/
>http://www.w3.org/International/
>http://people.w3.org/rishida/blog/
>http://www.flickr.com/photos/ishida/
> 


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Wednesday, 13 December 2006 10:02:43 GMT
