Re: Should the UTF-8 BOM trump overriding via HTTP or by users?

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Wed, 8 Jun 2011 21:21:15 +0200
To: John Cowan <cowan@mercury.ccil.org>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-international <www-international@w3.org>
Message-ID: <20110608212115185257.2ea6a519@xn--mlform-iua.no>
John Cowan, Wed, 8 Jun 2011 13:36:01 -0400:
> Leif Halvard Silli scripsit:
> 
>> … algorithm effectively plays the role of an external encoding info …

> Not quite. …

> It remains the responsibility of the parser to check the encoding
> returned by the sniffer against the encoding in the declaration, if any.
> If they don't match, boom.  So in that sense only, the sniffer plays
> the role of an external encoding.  But unlike HTTP headers, it cannot
> *override* the encoding declaration.

So, really, I don't know whether Firefox uses your algorithm for the 
file:// protocol. All I know is that its *parser* fails to return 
'fatal error' when the BOM and the declaration differ. Based on the XML 
parsers I have used recently (Webkit, Gecko, Opera, 'oXygen XML 
editor', 'XMLmind XML editor'), it is the *exception* rather than the 
rule (only Webkit does it) that file protocol parsing returns "fatal 
error" whenever the encoding declaration differs from the BOM.
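
To make the check concrete, here is a minimal sketch of comparing the BOM 
with the encoding declaration. It is not how any of the parsers above 
work internally; the function names and the returned status strings are 
made up for illustration, and the name comparison is a plain string 
match, which is a simplification.

```python
import codecs
import re

# (BOM bytes, codec used to decode the prolog, encoding family name).
# UTF-32 comes first because its little-endian BOM starts with the UTF-16 one.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le", "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32-be", "utf-32"),
    (codecs.BOM_UTF8, "utf-8", "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le", "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16-be", "utf-16"),
]

# Simplified pattern for the encoding pseudo-attribute of the XML declaration.
DECL_RE = re.compile(
    r'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_bom(data: bytes):
    """Return (family, codec, bom_length), or (None, None, 0) without a BOM."""
    for bom, codec, family in BOMS:
        if data.startswith(bom):
            return family, codec, len(bom)
    return None, None, 0


def check_bom_against_declaration(data: bytes) -> str:
    """Hypothetical helper: report whether BOM and declaration agree."""
    family, codec, skip = sniff_bom(data)
    if family is None:
        return "no-bom"
    # Decode only the start of the document, using the BOM's encoding.
    head = data[skip:skip + 400].decode(codec, errors="replace")
    match = DECL_RE.match(head)
    if match is None:
        return "no-declaration"
    declared = match.group(1).lower()
    # Assumption: a strict parser would treat a mismatch as a fatal error.
    return "match" if declared == family else "fatal-mismatch"
```

With a UTF-8 BOM followed by encoding="ISO-8859-1", this sketch reports 
the mismatch that, in my tests, only Webkit treats as fatal.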

OTOH, XML 1.0 *allows* the encoding declaration to be ignored if HTTP 
declares an encoding. So one can perhaps understand the confusion: "In 
the absence of information provided by an external transport protocol 
(e.g. HTTP or MIME), it is a fatal error [ snip ]"

That the encoding declaration can be overridden by HTTP is thus only 
indirectly expressed in XML 1.0. But RFC3023 clarifies and explains - 
though it only does so for 'text/xml' - why the two should be allowed 
to differ: 
  1) the possibility for "transcoding of MIME bodies", 
  2) a need to be compatible with text/plain, 
  3) that "web servers have been improved so that users can 
     specify the charset parameter"
  4) RFC2130 recommends it.

For application/xml, only justifications 3) and 4) are mentioned. 
And it seriously discusses deferring encoding handling to XML itself 
(pointing to appendix F), even though it also lists use of the charset 
parameter as STRONGLY RECOMMENDED. Clearly RFC3023 struggles a little 
to justify why it should be strongly recommended for application/xml. 
And it actually does not justify at all that the HTTP header should - 
or could - specify another encoding than the one in the (optional) XML 
encoding declaration.
-- 
Leif Halvard Silli
Received on Wednesday, 8 June 2011 19:21:45 GMT