
Re: Strange advice re BOM and UTF-8

From: Karl Dubost <karl@w3.org>
Date: Mon, 11 Dec 2006 11:59:35 +0900
Message-Id: <8920143B-86A2-401C-BF82-916F7E3643EF@w3.org>
Cc: Chris Lilley <chris@w3.org>, www-validator@w3.org, www-international@w3.org
To: olivier Thereaux <ot@w3.org>


On 8 Dec 2006, at 21:01, olivier Thereaux wrote:
> On Dec 7, 2006, at 21:56 , Karl Dubost wrote:
>>
>> Time for interoperability testing and implementation report
>> http://esw.w3.org/topic/QA/Utf8BomInteropReport
>>
>> Feel free to modify the wording of the page or to provide a better  
>> way to test.
>
> I think the test reporting should probably be changed from good/bad  
> (defined by passing the first test I assume?) to noting which of  
> the basic/extended test passes or fails.

Good comments

There are four files on the page.
Classic cases:
http://www.w3.org/International/tests/test-utf8-signature/withoutbom-withcharset.html
http://www.w3.org/International/tests/test-utf8-signature/withoutbom-nocharset.html


These are really for the BOM testing:
http://www.w3.org/International/tests/test-utf8-signature/withbom-withcharset.html
http://www.w3.org/International/tests/test-utf8-signature/withbom-nocharset.html


I think that, to make it easier, it is better to limit the test to

with BOM, with charset:
http://www.w3.org/International/tests/test-utf8-signature/withbom-withcharset.html


> My browser for instance passes the basic test but not the extended,  
> which seems to mean that the BOM is not harmful to it, but it's not  
> used either.

Which means that the browser is working well, no? HTTP has
precedence, so the pages served as US-ASCII MUST fail.


> If we are to draw conclusions from this testing, we might as well  
> see whether the BOM breaks implementations AND whether it is used  
> at all.

Shall we give the expected results?

This is the normal result in an HTTP environment.
   Passed  - without BOM  with    charset (served as utf-8)
   Failed  - without BOM  without charset (served as us-ascii)
   Passed  - with    BOM  with    charset (served as utf-8)
   Failed  - with    BOM  without charset (served as us-ascii)
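
To make the table concrete, here is a rough sketch of the rule it encodes, assuming that the charset from the HTTP header always wins and that the BOM is only looked at when no charset is sent at all. The function names (has_utf8_bom, effective_encoding, expected_result) are made up for illustration and are not taken from any browser or from the test pages.

```python
import codecs


def has_utf8_bom(body: bytes) -> bool:
    """Return True if the body starts with the UTF-8 signature EF BB BF."""
    return body.startswith(codecs.BOM_UTF8)


def effective_encoding(http_charset, body: bytes):
    """Pick the encoding under the assumed rule: HTTP charset first,
    BOM only as a fallback when the header gives no charset."""
    if http_charset:
        return http_charset.lower()
    if has_utf8_bom(body):
        return "utf-8"
    return None


def expected_result(http_charset, body: bytes) -> str:
    """A test page passes only if it ends up being read as UTF-8."""
    if effective_encoding(http_charset, body) == "utf-8":
        return "Passed"
    return "Failed"


# Hypothetical page content: any non-ASCII text encoded as UTF-8.
content = "é".encode("utf-8")

cases = [
    ("without BOM, with charset", "utf-8", content),
    ("without BOM, without charset", "us-ascii", content),
    ("with BOM, with charset", "utf-8", codecs.BOM_UTF8 + content),
    ("with BOM, without charset", "us-ascii", codecs.BOM_UTF8 + content),
]

for label, charset, body in cases:
    print(f"{expected_result(charset, body):7} - {label} (served as {charset})")
```

Under that assumption the BOM never changes the outcome for a page served with an explicit charset, which is why the two us-ascii rows fail with or without it.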

For authoring tools (which are not HTTP user agents), only the test  
with BOM with charset is meaningful.
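
As an illustration of the authoring-tool case, here is a small sketch of a tool that reads the file bytes directly, with no HTTP header involved. It assumes the tool already treats the file as UTF-8 and only has to cope with the signature; decode_local_file is a hypothetical helper, while the "utf-8-sig" codec is Python's standard one.

```python
import codecs


def decode_local_file(data: bytes) -> str:
    """Decode UTF-8 bytes read from disk, dropping a leading BOM if present."""
    # "utf-8-sig" accepts input with or without the EF BB BF signature.
    return data.decode("utf-8-sig")


# Hypothetical file content, saved once with and once without the BOM.
plain = "café".encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain

print(decode_local_file(plain) == decode_local_file(with_bom))

# A plain "utf-8" decode keeps the signature as a U+FEFF character,
# which is how a BOM can leak into the text an author sees.
print(with_bom.decode("utf-8").startswith("\ufeff"))
```

A tool that passes this kind of check is not harmed by the BOM, which is the property the with-BOM test looks for.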

I have added this table to the page; feel free to modify it:
http://esw.w3.org/topic/QA/Utf8BomInteropReport


-- 
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
   QA Weblog - http://www.w3.org/QA/
      *** Be Strict To Be Cool ***
Received on Monday, 11 December 2006 03:00:10 GMT
