
Re: Should the UTF-8 BOM trump overriding via HTTP or by users?

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 9 Jun 2011 12:02:42 +0200
To: www-international <www-international@w3.org>
Cc: John Cowan <cowan@mercury.ccil.org>
Message-ID: <20110609120242976131.c1defea8@xn--mlform-iua.no>
Leif Halvard Silli, Thu, 9 Jun 2011 09:04:01 +0200:
> Leif Halvard Silli, Thu, 9 Jun 2011 01:38:11 +0200:
>> John Cowan, Wed, 8 Jun 2011 16:28:19 -0400:
>>> Leif Halvard Silli scripsit:
>>>> So, really, I don't know if Firefox uses your algorithm for the
>>>> file:// protocol. All I know is that its *parser* fails to return
>>>> 'fatal error' when the BOM and the declaration differ. Based on the
>>>> XML parsers I have used recently (Webkit, Gecko, Opera, 'oXygen XML
>>>> editor', 'XMLmind XML editor'), it is the *exception* (only Webkit
>>>> does it)
>> Error: Webkit also does it. [ snip ]
>>>> rather than the rule, that file protocol parsing returns
>>>> "fatal error" whenever encoding declaration differs from the BOM.
>>> That's clearly a bug, then.  If the encoding declaration is *not* UTF-8,
>>> then the BOM is not a BOM at all, but characters preceding the XML
>>> declaration.  That means the input is not well formed.
>> Even the RXP parser [1], [ snip ] have that bug. 
> And Xerces. 

And Amaya: 
 * For files, it ignores the UTF-8 BOM and adheres to the ISO-8859-1 
named in the XML encoding declaration, without "fatal error".
 * For HTTP, it ignores both the UTF-8 BOM and the XML encoding 
declaration inside the document and adheres to the ISO-8859-1 coming 
from HTTP, without "fatal error".
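
For illustration only, the strict reading John describes (a UTF-8 BOM 
followed by a declaration naming some other encoding is not well formed) 
could be sketched in Python roughly as below. The function names 
declared_encoding, same_encoding and strict_check are hypothetical, the 
regular expression is a simplification of the XML declaration grammar, 
and none of this is taken from any of the parsers mentioned in this 
thread.

```python
import codecs
import re

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

# Simplified pattern for '<?xml version="..." encoding="..."'.
# Assumption: the declaration is written in ASCII-compatible bytes.
_DECL_RE = re.compile(
    rb"""<\?xml\s+version\s*=\s*["'][^"']*["']\s+encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_encoding(data):
    """Return the encoding named in the XML declaration, or None."""
    body = data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
    match = _DECL_RE.match(body)
    return match.group(1).decode("ascii") if match else None


def same_encoding(a, b):
    """Compare two encoding labels, tolerating aliases such as UTF8/utf-8."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        # Unknown label: fall back to a case-insensitive comparison.
        return a.lower() == b.lower()


def strict_check(data):
    """Return "fatal error" if a UTF-8 BOM is followed by a declaration
    naming a different encoding, otherwise "ok"."""
    has_bom = data.startswith(UTF8_BOM)
    declared = declared_encoding(data)
    if has_bom and declared is not None and not same_encoding(declared, "utf-8"):
        return "fatal error"
    return "ok"
```

Under these assumptions, a file that starts with the UTF-8 BOM and then 
declares encoding="ISO-8859-1" would be reported as "fatal error", which 
is the case the parsers listed above let through silently.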
Leif Halvard Silli
Received on Thursday, 9 June 2011 10:03:13 UTC
