- From: <bugzilla@jessica.w3.org>
- Date: Thu, 09 Jun 2011 11:27:01 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12897

--- Comment #10 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-06-09 11:27:00 UTC ---
More data collected - after discussion on www-international@ and implementation tests:

NOTE: Data is needed for IE9's XML parser. Assumption: it behaves as Webkit does (because that is how it acts for HTML).

Spec data - XML:
---
* XML 1.0 only says that Content-Type: *can* have priority (depending on what the higher protocol says) over "<?xml version="1.0" encoding="value"?>". Quote:
]] In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error [[
<http://www.w3.org/TR/xml/#charencoding>
Thus it depends on the rules of the higher protocol.

Spec data - RFC3023
---
1) RFC3023 'XML Media Types' specifies that the HTTP charset parameter does have priority. (Meaning that the XML parser must - legally - ignore the XML encoding declaration.)
2) But RFC3023 actually only justifies it for 'text/xml', where *transcoding* (leading the doc to have another encoding than the one specified inside the document) and *compatibility with text/plain* are the justifications: <http://tools.ietf.org/html/rfc3023#section-3.1>
3) For 'application/xml', RFC3023 has no real justification. The only things it offers are: "it is possible for users to configure web servers" and "the HTTP spec says so". <http://tools.ietf.org/html/rfc3023#section-3.2>
4) Notably, RFC3023 seriously discusses Appendix F: "Autodetection of Character Encodings (Non-Normative)" (http://www.w3.org/TR/xml/#sec-guessing), which (once again) under the heading "Priorities in the Presence of External Encoding Information" states:
]] In the interests of interoperability, however, the following rule is recommended. If an XML entity is in a file, the Byte-Order Mark and encoding declaration are used (if present) to determine the character encoding. [[
<http://www.w3.org/TR/xml/#sec-guessing-with-ext-info>

Implementation data - RFC3023:
---
* Parsers implementing RFC3023 (HTTP has priority over document data): Opera, Firefox, Amaya
** Parsers implementing RFC3023 which *also* emit a 'fatal error' if the HTTP charset and the UTF-8 BOM disagree: Opera, Firefox. (Thus: not Amaya.)
Note: per XML 1.0 it is required, *if HTTP and RFC3023 require it! (and they do!)*, to ignore the XML encoding declaration in favour of the HTTP charset parameter. But note that it is not permitted, per XML 1.0, to act as if the BOM does not exist, even if the doc is served via HTTP!
* Parsers *not* implementing RFC3023 (thus giving priority to document data instead), and which do not emit fatal errors: Webkit, Xerces C++, XMLMind Editor on Mac (based on Xerces Java), RXP, oXygen
** Parsers *not* implementing RFC3023 which, in case of conflict and without emitting a fatal error, adhere to the BOM and ignore the XML encoding declaration: Webkit, (IE9 must be checked)
** Parsers *not* implementing RFC3023 which, in case of conflict and without emitting a fatal error, adhere to the XML encoding declaration and ignore the BOM: XMLmind Editor for Mac, Xerces C++, oXygen, RXP

Implementation data - non-RFC3023 (file protocol):
---
* Parsers emitting a fatal error if the UTF-8 BOM conflicts with the XML encoding declaration: Opera.
* Parsers *not* emitting a fatal error if the UTF-8 BOM conflicts with the XML encoding declaration: Webkit, Firefox, oXygen, XMLmind XML editor for Mac (based on Xerces Java), Amaya
** Parsers *not* emitting a fatal error if the UTF-8 BOM conflicts with the XML encoding declaration, and which give priority to the UTF-8 BOM: Webkit, Firefox, oXygen
** Parsers *not* emitting a fatal error if the UTF-8 BOM conflicts with the XML encoding declaration, and which give priority to the XML encoding declaration (and/or to the UTF-8 encoding default, if they skip the UTF-8 BOM completely): XMLmind XML editor, RXP and (probably) Xerces C++

Implementation data - charset names:
---
* Webkit and some of the editors emit a 'fatal error' if the charset *name* in the XML encoding declaration is *unknown*. This happens even when they (for example Webkit) *otherwise* do not emit a fatal error when the UTF-8 BOM conflicts with the XML encoding declaration.
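The competing priority rules in the data above can be made concrete with a small model. This Python sketch is illustrative only: the policy labels, `FatalError`, `sniff` and `resolve` are hypothetical names, Python's `codecs` registry stands in for each parser's own charset-name table, only the UTF-8 BOM is considered, and none of the listed parsers is claimed to be implemented this way.

```python
import codecs
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Simplified match for encoding="..." in an XML declaration at the start of the
# (BOM-stripped) bytes; the name pattern follows XML 1.0's EncName production.
_DECL_RE = re.compile(rb"""<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")

# Hypothetical policy labels for the behaviours reported above:
#   "rfc3023-strict"  HTTP charset wins; fatal error if it disagrees with a
#                     UTF-8 BOM (Opera, Firefox over HTTP)
#   "rfc3023-lenient" HTTP charset wins; no BOM check (Amaya over HTTP)
#   "bom-first"       document data wins; UTF-8 BOM beats the declaration (Webkit)
#   "decl-first"      document data wins; declaration beats the BOM
#                     (Xerces C++, RXP, XMLmind)
#   "file-strict"     fatal error if BOM and declaration disagree (Opera, file protocol)
POLICIES = {"rfc3023-strict", "rfc3023-lenient", "bom-first", "decl-first", "file-strict"}


class FatalError(Exception):
    """Stands in for an XML 'fatal error' (hypothetical name)."""


def canonical(name):
    # Python's codec registry is only a stand-in for a parser's charset table.
    try:
        return codecs.lookup(name).name
    except LookupError:
        raise FatalError(f"unknown charset name: {name!r}") from None


def sniff(data):
    """Return (has_utf8_bom, declared_name_or_None). UTF-16 BOMs are not handled."""
    has_bom = data.startswith(UTF8_BOM)
    body = data[len(UTF8_BOM):] if has_bom else data
    m = _DECL_RE.match(body)
    return has_bom, (m.group(1).decode("ascii") if m else None)


def resolve(data, http_charset=None, policy="bom-first"):
    """Pick the encoding a parser following `policy` would use for `data`."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    has_bom, declared = sniff(data)
    # This sketch rejects unknown names under every policy; the data above
    # reports that behaviour only for Webkit and some editors.
    decl = canonical(declared) if declared else None

    if policy in ("rfc3023-strict", "rfc3023-lenient") and http_charset:
        http = canonical(http_charset)
        if policy == "rfc3023-strict" and has_bom and http != "utf-8":
            raise FatalError("HTTP charset conflicts with UTF-8 BOM")
        return http  # the encoding declaration is ignored
    if policy == "file-strict" and has_bom and decl not in (None, "utf-8"):
        raise FatalError("UTF-8 BOM conflicts with the encoding declaration")
    if policy == "decl-first":
        return decl or "utf-8"  # BOM overridden; UTF-8 is the XML default
    # bom-first, file-strict, and RFC 3023 policies without an HTTP charset
    return "utf-8" if has_bom else (decl or "utf-8")
```

Feeding the same bytes to different policies reproduces the divergence described above: a UTF-8 BOM followed by `encoding="ISO-8859-1"` resolves to UTF-8 under "bom-first", to ISO-8859-1 (reported by Python as `iso8859-1`) under "decl-first", and to a fatal error under "file-strict".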
Received on Thursday, 9 June 2011 11:27:03 UTC