Re: Should the UTF-8 BOM trump overriding via HTTP or by users? from Leif Halvard Silli on 2011-06-08 (www-international@w3.org from April to June 2011)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 9 Jun 2011 01:38:11 +0200
To: John Cowan <cowan@mercury.ccil.org>
Cc: www-international <www-international@w3.org>
Message-ID: <20110609013811872108.6d32d2f8@xn--mlform-iua.no>

John Cowan, Wed, 8 Jun 2011 16:28:19 -0400:
> Leif Halvard Silli scripsit:
> 
>> So, really, I don't know if Firefox uses your algorithm for the
>> file:// protocol. All I know is that its *parser* fails to return
>> 'fatal error' when the BOM and the declaration differ. Based on the
>> XML parsers I have used recently (Webkit, Gecko, Opera, 'oXygen XML
>> editor', 'XMLmind XML editor'), it is the *exception* (only Webkit
>> does it)

Correction: Webkit does not do it either. (Instead, Webkit is unique [sic] in 
showing a fatal error when a BOM is combined with an unknown charset 
name in the encoding declaration.)

>> rather than the rule, that file protocol parsing returns
>> "fatal error" whenever the encoding declaration differs from the BOM.
> 
> That's clearly a bug, then.  If the encoding declaration is *not* UTF-8,
> then the BOM is not a BOM at all, but characters preceding the XML
> declaration.  That means the input is not well formed.

Even the RXP parser [1], which seems to be used in producing the XML 
test suite [2], has that bug. This suggests that the test suite lacks 
basic encoding tests. Well, searching for your name on the list of test 
contributors, I did find at least *one* test of an invalid encoding 
declaration [3], namely the file 'E61.xml'. :-)
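For illustration, the rule John states (a UTF-8 BOM followed by an 
encoding declaration that names anything other than UTF-8 means the 
input is not well formed) could be sketched as below. This is a 
simplified sketch under my own assumptions, not code from RXP, Webkit, 
Gecko or any other parser: the names check_bom_against_declaration and 
XMLFatalError are hypothetical, and UTF-16 BOMs and encodings supplied 
by a higher-level protocol (e.g. HTTP) are deliberately ignored.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Simplified match for an XML declaration at the very start of the input.
# Group 3 captures the encoding name (EncName) if a declaration has one.
DECL_RE = re.compile(
    rb"""<\?xml\s+version\s*=\s*(['"])1\.[0-9]+\1"""
    rb"""(?:\s+encoding\s*=\s*(['"])([A-Za-z][A-Za-z0-9._-]*)\2)?"""
)


class XMLFatalError(Exception):
    """Hypothetical error type for input that is not well formed."""


def check_bom_against_declaration(data: bytes) -> str:
    """Return the encoding to use, or raise XMLFatalError when a UTF-8 BOM
    conflicts with the encoding declaration (hypothetical helper)."""
    has_bom = data.startswith(UTF8_BOM)
    body = data[len(UTF8_BOM):] if has_bom else data
    match = DECL_RE.match(body)
    declared = match.group(3).decode("ascii") if match and match.group(3) else None

    if has_bom:
        if declared is None or declared.lower() == "utf-8":
            return "utf-8"
        # The declaration is not UTF-8, so EF BB BF are not a BOM but
        # characters preceding the XML declaration: not well formed.
        raise XMLFatalError(f"UTF-8 BOM conflicts with encoding={declared!r}")

    # No BOM: use the declared encoding, else default to UTF-8
    # (simplification: no UTF-16 autodetection here).
    return declared.lower() if declared else "utf-8"
```

With this check, a file starting with EF BB BF and declaring 
encoding="ISO-8859-1" is rejected, while the same declaration without 
the BOM is accepted.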

[1] http://www.cogsci.ed.ac.uk/~richard/rxp.html
[2] http://lists.w3.org/Archives/Public/public-xml-testsuite/2008Sep/0000
[3] http://www.w3.org/XML/Test/xmlconf-20080827
-- 
Leif Halvard Silli

Received on Wednesday, 8 June 2011 23:38:40 UTC