Leif Halvard Silli scripsit:

> > In any case, Appendix F is non-normative. The algorithm [...],
> > which has no authority except my own, allows an 8-BOM to override
> > any XML declaration. It doesn't handle XML parsed entities.
>
> But is that in line with XML 1.0?

The sniffer just attempts to discover the encoding: it doesn't check the
document for correctness. If the document is not well-formed, it may
return the wrong answer. In addition, some (hypothetical) encodings will
not be correctly sniffed. For example, the imaginary us-bscii encoding,
which is the same as us-ascii except that 0x61 is 'b' and 0x62 is 'a',
will be sniffed as us-ascii.

> XML describes normative "fatal error" situations related to encoding:
>
> 1. When external encoding info is absent: a) A processor fed with an
> entity whose encoding differs from the info in the XML declaration.

This is not actually testable: bad encoding will at best produce an error
related to 4 below.

> b) If BOM and XML encoding declaration is lacking too: feeding a
> processor with an entity which isn't in UTF-8 encoded.

Again, only testable if non-UTF8 bytes are found.

> 2. To not have the XML declaration as the very first part of
> the entity. (Example: An UTF-8 encoded doc with a BOM and a XML
> declaration, but which for some reason is read as ISO-8859-1. Only
> Opera allows the user to, this way, place the parser in 'fatal error'
> mode.)
>
> 3. A parser presented with an encoding it is unable to handle

That can only happen if the encoding declaration, HTTP header, or other
high-level protocol contains something the parser can't identify.

> 4. Discovering byte sequences that are illegal in the current encoding

See above.

> 5. Unless higher level protocol defines the encoding, and unless the
> document is in UTF-8 or UTF-16 (so "UTF-16LE" is not covered!), then
> it is an error to not have an encoding declaration.

Correct.
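To make the sniffing behaviour concrete, here is a minimal Python sketch. It is not the algorithm referred to above (which is elided in the quote), only an illustration under stated assumptions: the names sniff_encoding, to_bscii and SWAP_AB are hypothetical, only the UTF-8 and UTF-16 BOMs are recognised, UTF-32 and EBCDIC are ignored, and the fallback is UTF-8. It shows a UTF-8 BOM overriding the XML declaration, and why a us-bscii document would be reported as us-ascii.

```python
import re

# BOMs are checked first, so a BOM wins over any XML declaration.
# Assumption: only UTF-8 and UTF-16 BOMs are handled in this sketch.
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

# Matches the encoding pseudo-attribute of an XML declaration, read as
# ASCII-compatible bytes at the very start of the entity.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding name; does not check well-formedness."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: assume UTF-8.
    return "utf-8"


# The imaginary us-bscii encoding: us-ascii with 0x61 and 0x62 swapped.
SWAP_AB = bytes.maketrans(b"ab", b"ba")


def to_bscii(text: str) -> bytes:
    """Encode ASCII-only text in the hypothetical us-bscii encoding."""
    return text.encode("ascii").translate(SWAP_AB)


if __name__ == "__main__":
    decl = '<?xml version="1.0" encoding="ISO-8859-1"?><r/>'
    print(sniff_encoding(decl.encode("ascii")))
    print(sniff_encoding(b"\xef\xbb\xbf" + decl.encode("ascii")))
    print(sniff_encoding(to_bscii('<?xml version="1.0" encoding="us-bscii"?><r/>')))
```

In the last call the declaration names us-bscii, but the 'b' is stored as byte 0x61, so a sniffer reading the bytes as ASCII sees the name "us-ascii". The sketch never decodes the document body, so it cannot notice the mismatch.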
--
John Cowan cowan@ccil.org http://ccil.org/~cowan
If I have seen farther than others, it is because I was standing on
the shoulders of giants. --Isaac Newton

Received on Wednesday, 8 June 2011 03:10:05 UTC