- From: John Cowan <cowan@mercury.ccil.org>
- Date: Tue, 7 Jun 2011 23:09:42 -0400
- To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-international <www-international@w3.org>
Leif Halvard Silli scripsit:

> > In any case, Appendix F is non-normative. The algorithm [...],
> > which has no authority except my own, allows an 8-BOM to override
> > any XML declaration. It doesn't handle XML parsed entities.
>
> But is that in line with XML 1.0?

The sniffer just attempts to discover the encoding: it doesn't check
the document for correctness. If the document is not well-formed, it
may return the wrong answer. In addition, some (hypothetical) encodings
will not be correctly sniffed. For example, the imaginary us-bscii
encoding, which is the same as us-ascii except that 0x61 is 'b' and
0x62 is 'a', will be sniffed as us-ascii.

> XML describes normative "fatal error" situations related to encoding:
>
> 1. When external encoding info is absent: a) A processor fed with an
> entity whose encoding differs from the info in the XML declaration.

This is not actually testable: bad encoding will at best produce an
error related to 4 below.

> b) If BOM and XML encoding declaration is lacking too: feeding a
> processor with an entity which isn't in UTF-8 encoded.

Again, only testable if non-UTF8 bytes are found.

> 2. To not have the XML declaration as the very first part of
> the entity. (Example: An UTF-8 encoded doc with a BOM and a XML
> declaration, but which for some reason is read as ISO-8859-1. Only
> Opera allows the user to, this way, place the parser in 'fatal error'
> mode.)
>
> 3. A parser presented with an encoding it is unable to handle

That can only happen if the encoding declaration, HTTP header, or other
high-level protocol contains something the parser can't identify.

> 4. Discovering byte sequences that are illegal in the current encoding

See above.

> 5. Unless higher level protocol defines the encoding, and unless the
> document is in UTF-8 or UTF-16 (so "UTF-16LE" is not covered!), then
> it is an error to not have an encoding declaration.

Correct.
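To make the "BOM overrides the XML declaration" behaviour concrete, here is a minimal sketch in Python. It is not the algorithm elided above: the function name `sniff_encoding`, the table of BOMs, and the regular expression are all assumptions made for illustration. Like the sniffer described in the message, it only guesses an encoding and does not check the document for correctness. It also ignores UTF-32, EBCDIC, and BOM-less UTF-16.

```python
import codecs
import re

# Hypothetical BOM table (an assumption for this sketch). UTF-32 is
# deliberately left out, so a UTF-32LE BOM would be misread as UTF-16LE.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches the encoding pseudo-attribute of an XML declaration that sits at
# the very start of the entity, in an ASCII-compatible encoding.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document entity from its first bytes."""
    # 1. A BOM wins, even if the XML declaration says something else.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Otherwise trust the encoding declaration, if there is one.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. No BOM and no declaration: assume UTF-8.
    return "utf-8"


if __name__ == "__main__":
    doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'
    print(sniff_encoding(doc))                     # iso-8859-1
    print(sniff_encoding(codecs.BOM_UTF8 + doc))   # utf-8 (BOM overrides)
    print(sniff_encoding(b"<a/>"))                 # utf-8 (default)
```

Note that the second call returns utf-8 without complaint even though the declaration disagrees with the BOM, and that a document in the imaginary us-bscii would be reported according to whatever its declaration bytes happen to look like in ASCII. Detecting either problem is left to the parser, not the sniffer.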
-- 
John Cowan  cowan@ccil.org  http://ccil.org/~cowan
If I have seen farther than others, it is because I was standing on
the shoulders of giants.  --Isaac Newton
Received on Wednesday, 8 June 2011 03:10:05 UTC