RE: Strange advice re BOM and UTF-8 from Martin Duerst on 2006-12-15 (www-validator@w3.org from December 2006)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Fri, 15 Dec 2006 17:12:47 +0900
To: "Richard Ishida" <ishida@w3.org>, "'Karl Dubost'" <karl@w3.org>, "'olivier Thereaux'" <ot@w3.org>
Cc: "'Chris Lilley'" <chris@w3.org>, <www-validator@w3.org>, <www-international@w3.org>
Message-Id: <6.0.0.20.2.20061215170016.0b4999a0@localhost>

At 19:46 06/12/14, Richard Ishida wrote:
>> From: Martin Duerst [mailto:duerst@it.aoyama.ac.jp] 
>> Sent: 13 December 2006 10:01
>> >[2] there is now a series of 3 pages for investigating 
>> display issues 
>> >related to bom handling - a third test has been added to test PHP 
>> >includes (which seem to cause problems for IE and Opera).
>> 
>> I have problem finding this third page. Pointer, please.
>
>http://www.w3.org/International/tests/sec-utf8-signature-3

The description here makes it a bit easier to understand
what this test is about, but I still have my doubts.
I think what it essentially tests is what kind of effects on
display a BOM has in the middle of an HTML file. That the
BOM is included via PHP seems to me absolutely irrelevant for
the test. Also, that the included PHP file produces <div>s
with PHP echo statements seems rather irrelevant. The problem
is simply that PHP, like any other straightforward technology,
doesn't bother to remove the BOM from an included file.

This is an excellent example of what I mentioned previously,
namely that using a BOM breaks all kinds of processing (or makes
such processing a lot more difficult) once you stop dealing
with every file as a single, totally independent unit.
Advice such as "don't put a BOM on your include files" is
fully justified.
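
As a minimal sketch of the effect (in Python rather than PHP, with
made-up file contents; the helper names naive_include and strip_bom
are hypothetical, not part of any library), a plain byte-level include
carries the BOM of the included file into the middle of the page,
where a UTF-8 decoder leaves it in place as U+FEFF:

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def naive_include(head: bytes, included: bytes, tail: bytes) -> bytes:
    """Concatenate bytes the way a simple include mechanism would (assumed model)."""
    return head + included + tail


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM, if present; otherwise return data unchanged."""
    return data[len(BOM):] if data.startswith(BOM) else data


# Hypothetical contents: an outer page and an include file saved with a BOM.
head = "<body>\n".encode("utf-8")
included = BOM + "<div>included</div>\n".encode("utf-8")
tail = "</body>\n".encode("utf-8")

# The BOM survives as U+FEFF in the middle of the decoded page.
page = naive_include(head, included, tail).decode("utf-8")
bom_positions = [i for i, ch in enumerate(page) if ch == "\ufeff"]

# Stripping the BOM before including avoids the stray character.
clean = naive_include(head, strip_bom(included), tail).decode("utf-8")
```

Here bom_positions records where U+FEFF ends up in the combined page,
and clean is the same page built from the BOM-free include. Whether a
browser then renders that stray character is exactly what varies
between user agents.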

>I have also listed the relevant pages more clearly in the results pages.
>
>
>> This test is one of the dangerous kind. It tests:
>> 
>>     The series of tests for which we are reporting results 
>> checks whether
>>     a user agent recognizes that a file declared as US-ASCII is really
>>     UTF-8 encoded, and displays the text as UTF-8.
>> 
>> It gives the impression that this is the right thing to do, 
>> but there is no spec that I know that recommends that, and 
>> the Character Model very clearly requires the contrary, see 
>> http://www.w3.org/TR/charmod/#C028.
>
>I added some text to the summary to say this.

I guess you mean the second paragraph on
http://www.w3.org/International/tests/:
"Note that these tests do not only test conformance with W3C standards.
In some cases the tests also allow for exploration of the behavior of
user agents in ways not described by the standards."

This is a good start, but later on this page, it says
"These tests have been developed bearing in mind the need for
content developers to learn about features, and know whether
and how those features are supported on a particular user agent."

and also:

"Care has also been taken to enable fairly easy adaptation of
the tests by QA Engineers as part of another test suite."

and:

"The tests and results have also proved useful to user agent
developers for planning improvements to their products."

which all point in a different direction. Also, the "in some cases"
isn't helpful; what's really needed is that EACH test says whether
it tests a feature of a spec (and if yes, which one), whether it's
trying to figure out what future specs should say on a point that,
e.g., is not currently covered, or whether it tests something that
browsers actually SHOULD NOT do.

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Friday, 15 December 2006 08:14:07 UTC