- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 8 Jun 2011 04:47:52 +0200
- To: John Cowan <cowan@mercury.ccil.org>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-international <www-international@w3.org>
John Cowan, Tue, 7 Jun 2011 13:41:56 -0400:
> Leif Halvard Silli scripsit:
>> ]]
>> In the interests of interoperability, however, the following rule is
>> recommended.
>> * If an XML entity is in a file, the Byte-Order Mark and encoding
>> declaration are used (if present) to determine the character encoding.
>> [[
> Did you paste the wrong quotation? That explicitly refers to XML entities
> in files; i.e. without HTTP metadata.
The quote appears under the heading "F.2 Priorities in the Presence of
External Encoding Information". Perhaps section '2.11 End-of-Line
Handling' gives a hint; it says: "XML parsed entities are often stored
in computer files […]". When a parsed entity is stored in a file, the
file has to carry its own encoding info, which this section suggests
reusing.
> In any case, Appendix F is non-normative. The algorithm described in
> http://recycledknowledge.blogspot.com/2005/07/hello-i-am-xml-encoding-sniffer.html
> ,
> which has no authority except my own, allows an 8-BOM to override any
> XML declaration. It doesn't handle XML parsed entities.
But is that in line with XML 1.0? XML describes normative "fatal error"
situations related to encoding:
1. When external encoding info is absent:
a) A processor is fed an entity whose encoding differs from
the one named in the XML declaration.
b) If the BOM and the XML encoding declaration are lacking too: a
processor is fed an entity which isn't UTF-8 encoded.
2. Not having the XML declaration as the very first part of the
entity. (Example: A UTF-8 encoded doc with a BOM and an XML
declaration, but which for some reason is read as ISO-8859-1. Only
Opera allows the user to place the parser in 'fatal error' mode
this way.)
3. A parser is presented with an encoding it is unable to handle.
4. Byte sequences are discovered that are illegal in the current encoding.
5. Unless a higher-level protocol defines the encoding, and unless the
document is in UTF-8 or UTF-16 (so "UTF-16LE" is not covered!), it
is an error not to have an encoding declaration.
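To make cases 1a, 1b, 3 and 4 concrete, here is a rough Python sketch of how a processor without external encoding info might apply them. It is only an illustration under simplifying assumptions, not what any real parser does: the names sniff and check_entity are made up, only UTF-8 and UTF-16 BOMs are recognised, an entity without a BOM is assumed to be ASCII-compatible enough for the declaration to be read, and a mismatch (1a) is only detected when a BOM contradicts the declaration. Case 2 is not checked.

```python
import codecs
import re

# Hypothetical helper table: only UTF-8 and UTF-16 BOMs are recognised.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches an encoding pseudo-attribute in a declaration at the very start.
DECL = re.compile(
    r'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff(data: bytes):
    """Return (bom_encoding, declared_encoding); either may be None."""
    bom_enc = None
    start = 0
    for bom, name in BOMS:
        if data.startswith(bom):
            bom_enc, start = name, len(bom)
            break
    # Without a BOM, assume an ASCII-compatible encoding for the declaration.
    head = data[start:start + 400].decode(bom_enc or "latin-1", errors="replace")
    match = DECL.match(head)
    declared = match.group(1).lower() if match else None
    return bom_enc, declared


def check_entity(data: bytes) -> str:
    """Return the encoding to use, or raise ValueError on a fatal error."""
    bom_enc, declared = sniff(data)
    if bom_enc and declared:
        # Case 1a (partial): the BOM contradicts the declared encoding.
        family = "utf-16" if bom_enc.startswith("utf-16") else "utf-8"
        if not declared.replace("utf8", "utf-8").startswith(family):
            raise ValueError(f"BOM says {family}, declaration says {declared}")
    encoding = declared or bom_enc or "utf-8"
    try:
        codecs.lookup(encoding)
    except LookupError:
        # Case 3: an encoding this processor cannot handle.
        raise ValueError(f"unsupported encoding: {encoding}")
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Case 1b (no BOM, no declaration, not UTF-8) or case 4.
        raise ValueError(f"illegal byte sequence for {encoding}: {exc}")
    return encoding
```

For instance, check_entity(b"<doc>\xe9</doc>") raises, because the entity has neither BOM nor declaration and is not valid UTF-8, while the same bytes preceded by an ISO-8859-1 encoding declaration are accepted.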
PS: For XML, it turns out that Firefox is as unwilling as WebKit to let
the user override the UTF-8 encoding. It just takes another angle
on it: If the XML page is served via HTTP with an incorrect encoding
label in the Content-Type:, then it leads to the yellow screen of death.
*And it is impossible for the user to fix it by manually selecting e.g.
UTF-8.*
If the same file is consumed via the file protocol, then Firefox will
ignore the XML declaration, if there is one. And if there is no XML
encoding declaration, then it will default to UTF-8, as it will when
there is a BOM. However, it will not allow the user to change the
encoding!
Leif Halvard Silli
Received on Wednesday, 8 June 2011 02:48:24 UTC