- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 8 Jun 2011 04:47:52 +0200
- To: John Cowan <cowan@mercury.ccil.org>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-international <www-international@w3.org>
John Cowan, Tue, 7 Jun 2011 13:41:56 -0400:
> Leif Halvard Silli scripsit:
>> ]]
>> In the interests of interoperability, however, the following rule is
>> recommended.
>> * If an XML entity is in a file, the Byte-Order Mark and encoding
>> declaration are used (if present) to determine the character encoding.
>> [[
> Did you paste the wrong quotation? That explicitly refers to XML entities
> in files; i.e. without HTTP metadata.

The quote appears under the heading "F.2 Priorities in the Presence of External Encoding Information". Perhaps section '2.11 End-of-Line Handling' gives a hint; it says: "XML parsed entities are often stored in computer files […]". When a parsed entity is stored as a file, the file has to include encoding info, which this section suggests reusing.

> In any case, Appendix F is non-normative. The algorithm described in
> http://recycledknowledge.blogspot.com/2005/07/hello-i-am-xml-encoding-sniffer.html
> ,
> which has no authority except my own, allows an 8-BOM to override any
> XML declaration. It doesn't handle XML parsed entities.

But is that in line with XML 1.0? XML describes normative "fatal error" situations related to encoding:

1. When external encoding info is absent:
   a) A processor is fed an entity whose encoding differs from the info in the XML declaration.
   b) If the BOM and the XML encoding declaration are lacking too: a processor is fed an entity which isn't UTF-8 encoded.
2. The XML declaration is not the very first part of the entity. (Example: a UTF-8 encoded doc with a BOM and an XML declaration, but which for some reason is read as ISO-8859-1. Only Opera allows the user to place the parser in 'fatal error' mode this way.)
3. A parser is presented with an encoding it is unable to handle.
4. Byte sequences that are illegal in the current encoding are discovered.
5. Unless a higher-level protocol defines the encoding, and unless the document is in UTF-8 or UTF-16 (so "UTF-16LE" is not covered!), it is an error not to have an encoding declaration.

PS: For XML, it turns out that Firefox is as unwilling as WebKit to let the user override the UTF-8 encoding. It just takes another angle on it: if the XML page is served via HTTP with an incorrect encoding label in the Content-Type:, then it leads to the yellow screen of death. *And it is impossible for the user to fix it by manually selecting e.g. UTF-8.* If the same file is consumed via the file protocol, then Firefox will ignore the XML declaration, if there is one. And if there is no XML encoding declaration, then it will default to UTF-8. As it will when there is a BOM. However, it will not allow the user to change the encoding!

Leif Halvard Silli
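
For readers following the thread, the file case discussed above (a BOM wins, then the XML encoding declaration, else UTF-8) can be sketched roughly as below. This is only an illustration under stated assumptions, not the algorithm from the linked blog post and not the full Appendix F procedure: the names `sniff_encoding` and `decode_entity` are made up here, UTF-32 and BOM-less UTF-16 are not handled, and the declaration is only looked for in ASCII-compatible bytes.

```python
import codecs
import re

# Hypothetical BOM table for this sketch; UTF-32 is deliberately left out.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches an encoding pseudo-attribute in an XML declaration that starts
# at the very first byte (assumes an ASCII-compatible encoding).
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity read from a file (no HTTP info)."""
    # 1. A BOM, if present, overrides any encoding declaration.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Otherwise use the encoding declaration, if there is one.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. With neither BOM nor declaration, the entity is taken to be UTF-8.
    return "utf-8"


def decode_entity(data: bytes) -> str:
    """Decode the entity, stripping a leading BOM if one was found."""
    encoding = sniff_encoding(data)
    for bom, _ in _BOMS:
        if data.startswith(bom):
            data = data[len(bom):]
            break
    # Raises UnicodeDecodeError on byte sequences illegal in the chosen
    # encoding, and LookupError if Python does not know the encoding name.
    return data.decode(encoding)
```

In this sketch, the `UnicodeDecodeError` loosely corresponds to situation 4 in the list and the `LookupError` to situation 3; a real XML processor would report both as fatal errors rather than as Python exceptions.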
Received on Wednesday, 8 June 2011 02:48:24 UTC