Re: Should the UTF-8 BOM trump overriding via HTTP or by users? from Leif Halvard Silli on 2011-06-09 (www-international@w3.org from April to June 2011)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 9 Jun 2011 12:02:42 +0200
To: www-international <www-international@w3.org>
Cc: John Cowan <cowan@mercury.ccil.org>
Message-ID: <20110609120242976131.c1defea8@xn--mlform-iua.no>

Leif Halvard Silli, Thu, 9 Jun 2011 09:04:01 +0200:
> Leif Halvard Silli, Thu, 9 Jun 2011 01:38:11 +0200:
>> John Cowan, Wed, 8 Jun 2011 16:28:19 -0400:
>>> Leif Halvard Silli scripsit:
>>> 
>>>> So, really, I don't know if Firefox uses your algorithm for the
>>>> file:// protocol. All I know is that its *parser* fails to return
>>>> 'fatal error' when the BOM and the declaration differ. Based on the
>>>> XML parsers I have used recently (Webkit, Gecko, Opera, 'oXygen XML
>>>> editor', 'XMLmind XML editor'), it is the *exception* (only Webkit
>>>> does it)
>> 
>> Error: Webkit also does it. [ snip ]
> 
>>>> rather than the rule, that file protocol parsing returns
>>>> "fatal error" whenever encoding declaration differs from the BOM.
>>> 
>>> That's clearly a bug, then.  If the encoding declaration is *not* UTF-8,
>>> then the BOM is not a BOM at all, but characters preceding the XML
>>> declaration.  That means the input is not well formed.
>> 
>> Even the RXP parser [1], [ snip ] has that bug. 
> 
> And Xerces. 

And Amaya: 
 * For files, it ignores the UTF-8 BOM and follows the ISO-8859-1 
encoding named in the XML encoding declaration, without a "fatal error".
 * For HTTP, it ignores both the UTF-8 BOM and the XML encoding 
declaration inside the document, and follows the ISO-8859-1 charset 
sent via HTTP, again without a "fatal error".
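The rule John Cowan states in the quoted reply can be sketched as a small check. If the encoding declaration is not UTF-8, the BOM bytes become ordinary characters in front of the XML declaration, so the input is not well formed. What follows is a minimal Python sketch that assumes file:// input, where only the UTF-8 BOM and the encoding declaration are consulted. The names `sniff_encoding` and `FatalError` are hypothetical. The sketch ignores UTF-16 BOMs, HTTP charset parameters and user overrides, and it is not the code of any parser mentioned in this thread.

```python
import re

# UTF-8 encoding of U+FEFF, the byte order mark.
UTF8_BOM = b"\xef\xbb\xbf"

# Matches an XML declaration with an encoding pseudo-attribute at the very
# start of the input (EncName per XML 1.0: [A-Za-z] ([A-Za-z0-9._] | '-')*).
ENCODING_DECL = re.compile(
    rb"""<\?xml\s+version\s*=\s*["'][^"']*["']\s+encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


class FatalError(Exception):
    """Hypothetical stand-in for an XML parser's well-formedness error."""


def sniff_encoding(data: bytes) -> str:
    """Hypothetical helper: pick the encoding for a file:// XML document,
    treating a UTF-8 BOM that contradicts the declaration as fatal."""
    has_bom = data.startswith(UTF8_BOM)
    body = data[len(UTF8_BOM):] if has_bom else data
    match = ENCODING_DECL.match(body)
    declared = match.group(1).decode("ascii") if match else None

    if has_bom:
        if declared is None or declared.lower() == "utf-8":
            return "UTF-8"
        # Under a non-UTF-8 declaration the BOM bytes are plain characters
        # before the XML declaration, so the document is not well formed.
        raise FatalError(f"UTF-8 BOM conflicts with encoding={declared!r}")

    # No BOM: use the declared encoding, else the XML default of UTF-8.
    return declared or "UTF-8"
```

Under this sketch, `UTF8_BOM + b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'` raises `FatalError`. Per the reports in this thread, RXP, Xerces and Amaya accept this case without a fatal error.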
-- 
Leif Halvard Silli

Received on Thursday, 9 June 2011 10:03:13 UTC