- From: John Boyer <JBoyer@PureEdge.com>
- Date: Tue, 25 Feb 2003 15:57:34 -0800
- To: "Joseph Reagle" <reagle@w3.org>, <w3c-ietf-xmldsig@w3.org>
- Cc: <FYergeau@alis.com>, <xml-editor@w3.org>, Martin Dürst <duerst@w3.org>
Hi Joseph, Our implementation of C14N still runs into a bit of trouble because, although you can't get the Xerces parser to read the bytes, there is apparently nothing wrong with making a DOM call that sets the illegal bytes into the content. I already have to do other things to enforce the Xpath data model on the DOM tree, so this is something I can take up in implementation. I would also agree that an erratum is probably not needed given the change in XML 1.1 cited by Martin as well as his point that character references to the illegal characters also violate well-formedness. This last point also clears up my concerns about Xpath and C14N. Thanks for the clarifications! John Boyer -----Original Message----- From: Joseph Reagle [mailto:reagle@w3.org] Sent: Tuesday, February 25, 2003 3:30 PM To: John Boyer; w3c-ietf-xmldsig@w3.org Cc: FYergeau@alis.com; xml-editor@w3.org; Martin Dürst Subject: Re: Possible XML and C14N errata On Friday 21 February 2003 14:29, John Boyer wrote: > So 'technically' C14N is OK because you seemingly can't create an XPath > data model for the offending class of XML documents (those containing > character references such as ). But I don't like 'technically' > correct because I'm sure few people realize that there seemingly a class > of XML documents for which there is no canonicalization because there is > no XPath data model. Would you agree? > 1) XML documents containing these character references are not supported, After thinking about it further and speaking with Martin, I think this is the case. There is no canonicalization for an XML instance with a character such as , because there is no XPath node set for it, because no XPath processor would ever parse such an instance, because it's not well formed XML. The expression "[^<&]* - ([^<&]* ']]>' [^<&]*)" is rather baroque on its face, and as John notes it's very weird that we have to read an augmented BNF grammar which itself references back to a production to understand the CharData production. So, granted, it is ugly and confusing, but is it causing sufficient problems that it merits an erratum? Given that Xerces was balking on those characters, it seems like it at least got it right. I suspect that XML 1.0 is so old now <smile/> that the XML authors feel most implementations have already stubbed their toe and it's best to look to the future... Martin pointed out to me that XML 1.1 is supposed to ameliorate these problems with: http://www.w3.org/TR/xml11/#sec4.1 Change the Well-formedness constraint: Legal Character to read: Characters referred to using character references must either match the production for Char, or be one of the ISO control characters in the ranges [#x1-#x1F] or [#x7F-#x9F].
Received on Tuesday, 25 February 2003 18:58:39 UTC