- From: Joseph Reagle <reagle@w3.org>
- Date: Tue, 25 Feb 2003 18:30:05 -0500
- To: "John Boyer" <JBoyer@PureEdge.com>, <w3c-ietf-xmldsig@w3.org>
- Cc: <FYergeau@alis.com>, <xml-editor@w3.org>, Martin Dürst <duerst@w3.org>
On Friday 21 February 2003 14:29, John Boyer wrote:
> So 'technically' C14N is OK because you seemingly can't create an XPath
> data model for the offending class of XML documents (those containing
> character references such as ). But I don't like 'technically'
> correct because I'm sure few people realize that there seemingly a class
> of XML documents for which there is no canonicalization because there is
> no XPath data model. Would you agree?

> 1) XML documents containing these character references are not supported,

After thinking about it further and speaking with Martin, I think this is the case. There is no canonicalization for an XML instance with a character such as , because there is no XPath node set for it, because no XPath processor would ever parse such an instance, because it's not well formed XML.

The expression "[^<&]* - ([^<&]* ']]>' [^<&]*)" is rather baroque on its face, and as John notes it's very weird that we have to read an augmented BNF grammar which itself references back to a production to understand the CharData production. So, granted, it is ugly and confusing, but is it causing sufficient problems that it merits an erratum? Given that Xerces was balking on those characters, it seems like it at least got it right. I suspect that XML 1.0 is so old now <smile/> that the XML authors feel most implementations have already stubbed their toe and it's best to look to the future...

Martin pointed out to me that XML 1.1 is supposed to ameliorate these problems with:

http://www.w3.org/TR/xml11/#sec4.1
  Change the Well-formedness constraint: Legal Character to read:
  Characters referred to using character references must either match
  the production for Char, or be one of the ISO control characters in
  the ranges [#x1-#x1F] or [#x7F-#x9F].
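
To make the difference concrete, here is a rough Python sketch of the two checks: the XML 1.0 Legal Character constraint (a character reference must match Char) and the relaxed wording quoted above. It assumes "the production for Char" in the quoted text means the XML 1.0 Char production; the function and constant names are mine and purely illustrative, not taken from any spec or parser.

```python
import re

# Char production from XML 1.0, section 2.2:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML10_CHAR_RANGES = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]

# Extra control-character ranges from the XML 1.1 wording quoted above.
XML11_EXTRA_RANGES = [(0x1, 0x1F), (0x7F, 0x9F)]

# Matches decimal (&#1;) and hexadecimal (&#x1;) character references.
CHARREF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def in_ranges(code_point, ranges):
    """Return True if code_point falls inside any (low, high) range."""
    return any(low <= code_point <= high for low, high in ranges)


def allowed_xml10(code_point):
    """XML 1.0: the referenced character must match Char."""
    return in_ranges(code_point, XML10_CHAR_RANGES)


def allowed_xml11_as_quoted(code_point):
    """Quoted XML 1.1 wording: Char (assumed to be the 1.0 production)
    or one of the listed control-character ranges."""
    return allowed_xml10(code_point) or in_ranges(code_point, XML11_EXTRA_RANGES)


def illegal_charrefs(text, allowed):
    """List the character references in text that the given rule rejects."""
    rejected = []
    for match in CHARREF.finditer(text):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not allowed(code_point):
            rejected.append(match.group(0))
    return rejected


sample = "<a>&#x1;&#x9;&#0;</a>"
print(illegal_charrefs(sample, allowed_xml10))
print(illegal_charrefs(sample, allowed_xml11_as_quoted))
```

Under the 1.0 rule a reference to #x1 is rejected, so the instance is not well formed and there is nothing to canonicalize; under the quoted 1.1 wording the same reference would pass, while a reference to #x0 is rejected either way.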
Received on Tuesday, 25 February 2003 18:30:44 UTC