- From: Dan Connolly <connolly@w3.org>
- Date: Sat, 08 Apr 2000 08:38:34 -0500
- To: "Martin J. Duerst" <duerst@w3.org>
- CC: Ed Simon <ed.simon@entrust.com>, "'w3c-xml-core-wg@w3.org'" <w3c-xml-core-wg@w3.org>, "'w3c-ietf-xmldsig@w3.org'" <w3c-ietf-xmldsig@w3.org>

"Martin J. Duerst" wrote:
>
> At 00/04/07 18:09 -0500, Dan Connolly wrote:
>
> >Perhaps. But perhaps the shortest path to the target is to cut
> >out the namespace stuff and character model stuff out of the
> >c14n algorithm. Rewriting namespace prefixes causes
> >all sorts of headaches:
> >
> >  "I hate to say that I told you so, but... -Tim"
> >  -- Tim Bray
> >  Re: c14n messes up qnames in attribute values
>
> Yes, but the real problem here is the spread of qnames
> all over the place, not the c14n algorithm. Using qnames
> instead of URIs replaces a universal identifier that can
> be treated independently anywhere by something that is
> very fragile because it depends on an indirection, on
> additional information, and on very complex rules for
> how to find the actual URI. Qnames are dangerous, and
> the longer we go, the more we will find out.

Huh? The use of name/value bindings in a language is fragile
and dangerous? These are bindings within one syntactic document.
I think you're just spreading fear, uncertainty and doubt.

There's nothing especially complex or fragile about the
algorithm for finding URIs from qnames: just find the
nearest-enclosing element with a matching declaration.

> So it's not rewriting namespaces that causes problems,
> it's the unrestricted use of qnames by people who don't
> understand the consequences that is the problem.

I can point to specific problems caused by rewriting namespace
prefixes: consider the following parts of a stylesheet
I'm working on:

<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform"
            version="1.0"
            xmlns:h="http://www.w3.org/1999/xhtml">
  <template match="h:table//h:tr/h:td[string-length(normalize-space(.))>1]">
  ...
</stylesheet>

The meaning of the qnames in the match attribute is clear
in the global context: h:tr is the table row element of XHTML,
template is per the W3C XSLT specification, and so on.
But if I canonicalize that document per the current c14n spec,
the qnames in the match attribute value will lose their bindings.

We might consider revising the c14n algorithm to say "rewrite
qnames in attribute values too" but how does one find
qnames in attribute values in general? e.g.

  <aDoc blort="foo:blort"/>

Did the author of that document intend foo:blort as a qname,
a URI using an as-yet-unregistered scheme, or just a string?
There's no way to know, in general.

On the other hand, it is clear how to tell producers of XML
documents to declare their namespaces in a normalized way;
if somebody told me, the guy who knows the semantics of my
stylesheet document, that it has to be in normal form,
I can rewrite the XPath expressions and such:

<n1:stylesheet xmlns:n1="http://www.w3.org/1999/XSL/Transform"
               version="1.0">
  <n1:template xmlns:n1="http://www.w3.org/1999/XSL/Transform"
               xmlns:n2="http://www.w3.org/1999/xhtml"
               match="n2:table//n2:tr/n2:td[string-length(normalize-space(.))>1]">
  ...
</n1:stylesheet>

So specifying a way to *test* whether an XML document's namespace
declarations are in normal form is straightforward, but specifying
a transformation from general form to normal form has specific,
identifiable, known problems: it silently changes the semantics
of documents that conform to W3C Recommendations.

OK, so I've given detailed explanation of the problems of
combining a (W3C Recommended) use of qnames with the (draft) c14n
spec. I don't think the use of qnames by itself has this
sort of problem. But if it does, please provide details.

> >  From: Tim Bray (tbray@textuality.com)
> >  Date: Mon, Mar 20 2000
> >http://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2000Mar/0004.html
> >
> >And I maintain that character normalization is orthogonal to
> >element-and-attribute c14n.
>
> As I have explained in a mail to the XML core WG, that's not
> exactly the case.
> But based on new insights, the I18N WG/IG
> has already made clear that in particular for digital signatures,
> xml canonicalization and character normalization should be
> considered separately.

I don't understand... you say it's "not exactly the case" that
"character normalization is orthogonal to element-and-attribute c14n"
and yet "xml canonicalization and character normalization should be
considered separately"; that's a direct contradiction, no?
Please elaborate with details.

> >It was suggested to me (by Noah Mendelsohn) that we could take
> >namespace prefix munging out of the c14n algorithm, but document
> >a "namespace normalized form" as an appendix or something; this
> >appendix wouldn't specify an algorithm with inputs and outputs,
> >but rather just a test/constraint on documents ala
> >
> >  A document is in namespace-normal form iff...
>
> Which way to specify (procedural or as conditions on the result)
> is rather independent of what to specify. The current canonicalization
> algorithm is already rather non-procedural.

As I detailed above, it's quite different to say

  your stylesheet is not in namespace-normal form

than to say

  here's the canonical version of your stylesheet; beware
  that we have silently changed it from a conforming
  XSLT stylesheet to something with broken XPaths.

Data integrity is job 1.

> >And the same goes for character normalization.
>
> Yes, having a name for the thing, and explaining why and
> where it may be important, is a good idea.
>
> >Perhaps DSig would require its input to be in character-normal
> >form to avoid the case of a user being unable to see
> >birthday-attack changes between o-umlaut-precomposed and
> >o-umlaut-decomposed.
>
> I don't understand what you mean by 'birthday-attack',

It relates to a sort of parlor game:

  A: I bet there are two people in the room with the same birthday.
  B: no way! what are the odds of that?!?!?! You're on!
  A: Attention everyone! How many birthdays in January?
     What dates? I see...
     and February? Dates? All different... March? Dates?
     What was that? Fred and Joan were both born 13 March? Thank you.
  A: in fact, if there are more than 30 people in the room,
     the odds are in my favor.

In cryptography, it relates to substituting a forgery that has
the same hash value as the original. In theory, an attacker might
find a forgery that is different only in (a) the precomposed
vs. decomposed characters and (b) the dollar amount on the check.

> but
> this is essentially what the I18N WG/IG is asking
> XMLDSIG to do.

You're asking now that XML DSig require input to digital signatures
to be in character-normal form, or you have asked previously?
I understood your previous communications to request that the DSig
WG require the signing algorithm to do character normalization
of its input, not to exclude unnormalized documents from its input.

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
pager (put return tel# in From or Subject field)
mailto:connolly.pager@w3.org
Received on Saturday, 8 April 2000 09:45:59 UTC