- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 8 Jun 2011 18:50:48 +0200
- To: John Cowan <cowan@mercury.ccil.org>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-international <www-international@w3.org>
John Cowan, Wed, 8 Jun 2011 11:54:30 -0400:
> Leif Halvard Silli scripsit:
>
>> So, you mean: it is a step before the doc is fed to the parser?
>
> Yes.

So that algorithm effectively plays the role of external encoding information. Because, otherwise, the XML parser is not permitted to interpret the document differently from what the XML encoding declaration says.

>> I don't see how this is different from what Safari and IE do when they
>> override the HTTP header whenever they see the UTF-8 BOM.
>
> My algorithm is a "file" algorithm; it doesn't know anything about HTTP
> headers.

Yes, but I am thinking about the principle. This thread is meant to focus on what parsers are allowed to do. Just as one can inspect the BOM and the XML declaration before feeding the document to the parser, one could also inspect the BOM, the declaration *and* HTTP. Just as your algorithm effectively is external encoding information, an algorithm that also takes HTTP into account before doing the overriding would simply be another form of external encoding information.

>> (Firefox behaves as your algorithm, for the file:// protocol.)
>
> Good. :-)

I still question its validity, according to XML. It blurs the draconian error handling of XML.

>>>> XML describes normative "fatal error" situations related to encoding:
>>>>
>>>> 1. When external encoding info is absent: a) A processor fed with an
>>>> entity whose encoding differs from the info in the XML declaration.
>>>
>>> This is not actually testable: bad encoding will at best produce an
>>> error related to 4 below.
>>
>> There is not necessarily a need for a test (that is: step 4 is not always
>> needed). It can be a matter of comparing the encoding labels. Because:
>> First the parser determines the encoding. And if it uses the BOM to
>> determine the encoding, and thereafter discovers that the XML encoding
>> declaration says "KOI8-R", then we have the "fatal error" situation.
>
> True.

Even XML editors, like the well-known oXygen, do not display a fatal error for this, though. But perhaps it includes a non-parser component which checks the code first, using your algorithm, and "overrides"?

>> But does any XML parser obey this rule? At least Webkit, Opera, Firefox
>> do not. They instead accept the BOM and ignore the XML encoding
>> declaration. (Exception: if the encoding in the declaration is an
>> unknown encoding, then Webkit shows fatal error - but this is actually
>> 3 - see below.)
>
> Those are XML parsers inside browsers, which I know little about.
>
>> I take your silence as agreement. The "some reason" could be a HTTP
>> header.
>
> Quite so.
--
Leif H Silli
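
For illustration only, the label comparison discussed above (the encoding implied by the BOM versus the label in the XML encoding declaration) could be sketched in Python as below. This is not John Cowan's algorithm and not what any particular parser does. The function names (`bom_encoding`, `declared_encoding`, `compare_labels`) and the result strings are hypothetical, only the UTF-8, UTF-16 and UTF-32 BOMs are recognised, and the declaration is assumed to sit within the first 256 bytes.

```python
import codecs
import re

# (BOM bytes, XML-style label, Python codec that consumes the BOM).
# UTF-32 comes first because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32", "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32", "utf-32"),
    (codecs.BOM_UTF8, "utf-8", "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16", "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16", "utf-16"),
]

# Simplified pattern for the encoding pseudo-attribute of an XML declaration.
_DECL = re.compile(
    r'<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def bom_encoding(data: bytes):
    """Return (label, codec) for a recognised BOM, or None if there is none."""
    for bom, label, codec in _BOMS:
        if data.startswith(bom):
            return label, codec
    return None


def declared_encoding(data: bytes, codec: str = "ascii"):
    """Return the label from the XML encoding declaration, or None."""
    head = data[:256].decode(codec, errors="replace")
    match = _DECL.match(head)
    return match.group(1) if match else None


def compare_labels(data: bytes) -> str:
    """Compare the BOM-implied encoding with the declared label.

    "mismatch" corresponds to the situation argued in this thread to be
    a fatal error when no external encoding information is present.
    """
    found = bom_encoding(data)
    if found is None:
        return "no-bom"
    label, codec = found
    declared = declared_encoding(data, codec)
    if declared is None:
        return "no-declaration"
    return "match" if declared.lower() == label else "mismatch"
```

For example, `compare_labels(codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="KOI8-R"?><r/>')` yields `"mismatch"`. Whether a caller then raises a fatal error or lets the BOM override the declaration is exactly the policy question under discussion here, so the sketch leaves that decision to the caller.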
Received on Wednesday, 8 June 2011 16:51:20 UTC