- From: John Cowan <cowan@mercury.ccil.org>
- Date: Sun, 10 Nov 2013 02:01:48 -0500
- To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
- Cc: public-xml-core-wg@w3.org, Richard Tobin <richard@inf.ed.ac.uk>
Henry S. Thompson scripsit:
> G) It is a fatal error if an XML entity is determined (via default,
> encoding declaration, or higher-level protocol) to be in a
> certain encoding but contains byte sequences that are not legal
> in that encoding
>
> H) [I]t is a fatal error if an entity encoded in UTF-8 contains any
> ill-formed code unit sequences
Since the code units of UTF-8 entities are bytes, H is just a particular
case of G. This was not always true, because some now-obsolete
definitions of UTF-8 allowed ill-formed byte sequences to be present
in existing documents, though they were not to be created in new ones.
Now, however, they are errors plain and simple.
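To make the G/H point concrete, here is a minimal sketch, assuming a processor that delegates to a strict UTF-8 decoder. The names `FatalEncodingError` and `decode_utf8_entity` are hypothetical and not taken from any XML processor. Overlong forms and encoded surrogates, which some older definitions tolerated in existing data, are rejected the same way as any other illegal byte sequence.

```python
class FatalEncodingError(ValueError):
    """Hypothetical error type standing in for an XML 'fatal error'."""


def decode_utf8_entity(data: bytes) -> str:
    """Decode an entity as UTF-8, treating any ill-formed sequence as fatal."""
    try:
        # errors="strict" makes the decoder raise instead of substituting U+FFFD.
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise FatalEncodingError(
            f"ill-formed UTF-8 at byte offset {exc.start}"
        ) from exc


def is_well_formed_utf8(data: bytes) -> bool:
    """Convenience wrapper: True if the bytes decode cleanly as UTF-8."""
    try:
        decode_utf8_entity(data)
        return True
    except FatalEncodingError:
        return False
```

With this sketch, `b"\xc0\x80"` (an overlong encoding of U+0000) and `b"\xed\xa0\x80"` (an encoded surrogate) both fail, exactly as a stray `0xFF` byte would.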
> X) [I]t is a *fatal error* for an entity including an encoding
> declaration to be presented to the XML processor in an encoding
> other than that named in the declaration
Since this only applies in the absence of EI, it seems to me to be
a nullity. Under what circumstances could this error be detected?
> W') [E]ntities without an encoding declaration which are delivered in
> an encoding other than UTF-8 or UTF-16 *must* provide a charset
> parameter
>
> This is a sensible constraint on transcoders, including non-XML-aware
> transcoders -- if you transcode out of UTF-..., you *must* tell me to
> what.
+1
> X') [I]t is a *fatal error* for an entity including an encoding
> declaration to be presented to the XML processor with a charset
> parameter other than that named in the declaration
>
> The existing 3023bis draft includes something like this. I don't
> think it can be retained, because it conflicts with W' for
> non-XML-aware transcoders: doing the responsible thing would produce
> documents which conflict with X'.
In the alternative, we could just say that XML-unaware transcoding proxies
are no longer state of the art; it's the responsibility of something which
undertakes transcoding to make the necessary adjustments to the content.
I suspect that few transcoding proxies actually exist in the wild anyway.
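As a rough illustration of what "making the necessary adjustments" could mean, the sketch below transcodes an entity and rewrites its encoding declaration to match. The function name `transcode_xml` and the regular expression are assumptions made for this example, not part of any draft. It does not handle characters that the target encoding cannot represent, which a real transcoder would have to turn into character references.

```python
import re

# Matches the encoding pseudo-attribute inside an XML declaration.
# Group 1: everything up to and including '=', group 2: the quote character.
_DECL_ENCODING = re.compile(
    r'(<\?xml[^>]*?encoding\s*=\s*)(["\'])[A-Za-z][A-Za-z0-9._-]*\2'
)


def transcode_xml(data: bytes, src: str, dst: str) -> bytes:
    """Transcode an entity from src to dst, keeping the declaration in sync."""
    text = data.decode(src)
    # Rewrite only the first match, i.e. the XML (or text) declaration.
    text = _DECL_ENCODING.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{dst}{m.group(2)}",
        text,
        count=1,
    )
    # Raises UnicodeEncodeError if dst cannot represent some character.
    return text.encode(dst)
```

An entity with no encoding declaration passes through with only its bytes transcoded, which is the case W' is meant to cover.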
> XX') It is a *fatal error* for an XML entity beginning with a BOM to
> declare an encoding other than that implied by the BOM.
>
> That is, the BOM is authoritative regardless of whether EI is present
> or not.
If it's authoritative, then why worry about what the EI says? I see
no reason to bother making this an error.
> Y') [leads to a generalization of W'] Non-UTF-8-encoded entities
> without an encoding declaration *must* be delivered with a
> charset parameter and/or (in the case of UTF-16) a BOM
Isn't this equivalent to W'? If so, then I think W' is clearer and
should be used.
> So, my tentative conclusion is that I will add something like Y' to
> 3023bis, and also replace the existing language in 3023bis which
> amounts to X' with something that notes that it's not an error per the
> XML spec. if there's a conflict between the charset param and the
> encoding decl, and that the charset param takes precedence in such a
If we must, we must.
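For reference, the precedence that falls out of this thread (BOM first, then the charset parameter, then the encoding declaration, then the UTF-8 default) can be sketched as below. This is only one reading of the tentative conclusion, not text from 3023bis. The name `resolve_encoding` is hypothetical. The declaration sniffing assumes an ASCII-compatible encoding, so BOM-less UTF-16 and EBCDIC are out of scope, and UTF-32 BOMs are not distinguished.

```python
import codecs
import re

# BOMs considered by this sketch, paired with the encoding each implies.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)

# Encoding pseudo-attribute of a declaration at the very start of the entity.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def resolve_encoding(data: bytes, charset=None):
    """Return (encoding, source), where source names the winning rule."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, "bom"  # BOM is authoritative
    if charset:
        return charset.lower(), "charset"  # overrides the declaration, no error
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower(), "declaration"
    return "utf-8", "default"
```

Note that a conflict between the charset parameter and the declaration is silently resolved here rather than reported, which matches the proposal to drop X'.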
--
John Cowan <cowan@ccil.org> http://www.ccil.org/~cowan
"Make a case, man; you're full of naked assertions, just like Nietzsche."
"Oh, i suffer from that, too. But you know, naked assertions or GTFO."
--heard on #scheme, sorta
Received on Sunday, 10 November 2013 07:02:12 UTC