- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 8 Jun 2011 21:21:15 +0200
- To: John Cowan <cowan@mercury.ccil.org>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-international <www-international@w3.org>
John Cowan, Wed, 8 Jun 2011 13:36:01 -0400:
> Leif Halvard Silli scripsit:
>
>> … algorithm effectively plays the role of an external encoding info …
>
> Not quite. …
> It remains the responsibility of the parser to check the encoding
> returned by the sniffer against the encoding in the declaration, if any.
> If they don't match, boom. So in that sense only, the sniffer plays
> the role of an external encoding. But unlike HTTP headers, it cannot
> *override* the encoding declaration.

So, really, I don't know if Firefox uses your algorithm for the file:// protocol. All I know is that its *parser* fails to return 'fatal error' when the BOM and the declaration differ.

Based on the XML parsers I have used recently (Webkit, Gecko, Opera, 'oXygen XML editor', 'XMLmind XML editor'), it is the *exception* rather than the rule that file protocol parsing returns "fatal error" whenever the encoding declaration differs from the BOM. Only Webkit does it.

OTOH, XML 1.0 *allows* the encoding declaration to be ignored if HTTP declares an encoding. So one can perhaps understand the confusion:

   "In the absence of information provided by an external transport
    protocol (e.g. HTTP or MIME), it is a fatal error [ snip ]"

That the encoding declaration can be overridden by HTTP is thus expressed only quite indirectly in XML 1.0. But RFC3023 clarifies and explains - though it only does so for 'text/xml' - why the two should be allowed to differ:

   1) the possibility for "transcoding of MIME bodies",
   2) a need to be compatible with text/plain,
   3) that "web servers have been improved so that users can specify
      the charset parameter",
   4) RFC2130 recommends it.

For application/xml, only justifications 3) and 4) are mentioned. RFC3023 also seriously discusses leaving encoding handling to XML itself (pointing to appendix F), even though it lists use of the charset parameter as STRONGLY RECOMMENDED. Clearly RFC3023 struggles a little to justify why the parameter should be strongly recommended for application/xml. And it does not justify at all that the HTTP header should - or could - specify another encoding than the one in the (optional) XML encoding declaration.

--
Leif Halvard Silli
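To make the two behaviours discussed above concrete, here is a minimal Python sketch, under stated assumptions, of a parser-side check: without external transport information a BOM that disagrees with the encoding declaration is treated as a fatal error, while a charset supplied by the transport (e.g. HTTP) takes precedence. The names `effective_encoding`, `EncodingMismatch` and `transport_charset` are hypothetical; this is not the code of any browser or editor mentioned here, nor the Appendix F algorithm, and the family comparison (any "utf-16*" label is accepted after a UTF-16 BOM) is a deliberate simplification.

```python
import codecs
import re

# (BOM bytes, encoding family, codec used to decode the text after the BOM).
# UTF-32 is listed first because its LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32", "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32", "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8", "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16", "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16", "utf-16-le"),
]

# Matches the encoding pseudo-attribute of an XML declaration.
_DECL = re.compile(
    r"<\?xml\s[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._\-]*)[\"']"
)


class EncodingMismatch(Exception):
    """Hypothetical stand-in for the parser's 'fatal error'."""


def _normalize(label):
    # Map aliases such as "UTF8" to Python's canonical codec name.
    try:
        return codecs.lookup(label).name
    except LookupError:
        return label.lower()


def _sniff(data):
    # Return (family, codec, bom_length), or (None, None, 0) without a BOM.
    for bom, family, codec in _BOMS:
        if data.startswith(bom):
            return family, codec, len(bom)
    return None, None, 0


def _declared(data, codec, skip):
    # Without a BOM, assume an ASCII-compatible encoding for the declaration.
    head = data[skip:skip + 400].decode(codec or "latin-1", errors="replace")
    match = _DECL.match(head)
    return match.group(1) if match else None


def effective_encoding(data, transport_charset=None):
    # External transport information (e.g. an HTTP charset) takes precedence.
    if transport_charset:
        return _normalize(transport_charset)
    family, codec, skip = _sniff(data)
    declared = _declared(data, codec, skip)
    if family and declared and not _normalize(declared).startswith(family):
        raise EncodingMismatch(f"BOM says {family}, declaration says {declared}")
    if family:
        return family
    # No BOM: trust the declaration, else fall back to XML's UTF-8 default.
    return _normalize(declared) if declared else "utf-8"
```

With this sketch, a UTF-8 BOM followed by `encoding="ISO-8859-1"` raises `EncodingMismatch` when no `transport_charset` is given, which is the Webkit-like behaviour; passing `transport_charset="utf-8"` for the same bytes returns the transport's value and never consults the declaration.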
Received on Wednesday, 8 June 2011 19:21:45 UTC