- From: John Cowan <cowan@mercury.ccil.org>
- Date: Sun, 10 Nov 2013 02:01:48 -0500
- To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
- Cc: public-xml-core-wg@w3.org, Richard Tobin <richard@inf.ed.ac.uk>
Henry S. Thompson scripsit:

> G) It is a fatal error if an XML entity is determined (via default,
>    encoding declaration, or higher-level protocol) to be in a
>    certain encoding but contains byte sequences that are not legal
>    in that encoding
>
> H) [I]t is a fatal error if an entity encoded in UTF-8 contains any
>    ill-formed code unit sequences

Since the code units of UTF-8 entities are bytes, H is just a particular
case of G. This was not always true, because some now-obsolete
definitions of UTF-8 allowed ill-formed byte sequences to be present in
existing documents, though they were not to be created in new ones.
Now, however, they are errors plain and simple.

> X) [I]t is a *fatal error* for an entity including an encoding
>    declaration to be presented to the XML processor in an encoding
>    other than that named in the declaration

Since this only applies in the absence of EI, it seems to me to be a
nullity. Under what circumstances could this error be detected?

> W') [E]ntities without an encoding declaration which are delivered in
>     an encoding other than UTF-8 or UTF-16 *must* provide a charset
>     parameter
>
> This is a sensible constraint on transcoders, including non-XML-aware
> transcoders -- if you transcode out of UTF-..., you *must* tell me to
> what.

+1

> X') [I]t is a *fatal error* for an entity including an encoding
>     declaration to be presented to the XML processor with a charset
>     parameter other than that named in the declaration
>
> The existing 3023bis draft includes something like this. I don't
> think it can be retained, because it conflicts with W' for
> non-XML-aware transcoders: doing the responsible thing would produce
> documents which conflict with X'.

In the alternative, we could just say that XML-unaware transcoding
proxies are no longer state of the art; it's the responsibility of
something which undertakes transcoding to make the necessary
adjustments to the content.
I suspect that few transcoding proxies actually exist in the wild anyway.

> XX') It is a *fatal error* for an XML entity beginning with a BOM to
>      declare an encoding other than that implied by the BOM.
>
> That is, the BOM is authoritative regardless of whether EI is present
> or not.

If it's authoritative, then why worry about what the EI says? I see no
reason to bother making this an error.

> Y') [leads to a generalization of W'] Non-UTF-8-encoded entities
>     without an encoding declaration *must* be delivered with a
>     charset parameter and/or (in the case of UTF-16) a BOM

Isn't this equivalent to W'? If so, then I think W' is clearer and
should be used.

> So, my tentative conclusion is that I will add something like Y' to
> 3023bis, and also replace the existing language in 3023bis which
> amounts to X' with something that notes that it's not an error per the
> XML spec. if there's a conflict between the charset param and the
> encoding decl, and that the charset param takes precedence in such a

If we must, we must.

-- 
John Cowan <cowan@ccil.org>             http://www.ccil.org/~cowan
"Make a case, man; you're full of naked assertions, just like Nietzsche."
"Oh, i suffer from that, too. But you know, naked assertions or GTFO."
                        --heard on #scheme, sorta
Received on Sunday, 10 November 2013 07:02:12 UTC
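
The observation in the message above that H is a special case of G can be illustrated with a small sketch. This is not taken from the thread or from any XML processor: the function name check_entity_bytes is hypothetical, ValueError merely stands in for a "fatal error", and Python's strict decoder is assumed to be an adequate test of "byte sequences that are not legal in that encoding".

```python
def check_entity_bytes(data: bytes, encoding: str = "utf-8") -> str:
    """Decode an entity strictly in its already-determined encoding.

    Hypothetical helper: 'encoding' is whatever was determined by default,
    encoding declaration, or higher-level protocol. A ValueError here
    stands in for the fatal error described in G.
    """
    try:
        # errors="strict" rejects any byte sequence illegal in the encoding.
        return data.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"fatal error: illegal byte sequence for {encoding} "
            f"at byte offset {exc.start}"
        ) from exc


if __name__ == "__main__":
    # Well-formed UTF-8 passes.
    print(check_entity_bytes(b"<a/>"))
    # An overlong (non-shortest-form) sequence is ill-formed UTF-8, so
    # the same single check covers case H as well as case G.
    try:
        check_entity_bytes(b"<a>\xc0\xaf</a>")
    except ValueError as err:
        print(err)
```

Because the check is the same whatever the encoding, nothing UTF-8-specific is needed: the UTF-8 case falls out of the general rule once ill-formed sequences are treated as plain errors.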