Re: XML Signature use of Canonical XML from Martin J. Duerst on 2000-04-10 (w3c-ietf-xmldsig@w3.org from April to June 2000)

From: Martin J. Duerst <duerst@w3.org>
Date: Mon, 10 Apr 2000 14:54:57 +0900
To: Dan Connolly <connolly@w3.org>
Cc: Ed Simon <ed.simon@entrust.com>, "'w3c-xml-core-wg@w3.org'" <w3c-xml-core-wg@w3.org>, "'w3c-ietf-xmldsig@w3.org'" <w3c-ietf-xmldsig@w3.org>
Message-Id: <4.2.0.58.J.20000410130601.033a7630@sh.w3.mag.keio.ac.jp>
At 00/04/08 08:38 -0500, Dan Connolly wrote:
>"Martin J. Duerst" wrote:
> >
> > At 00/04/07 18:09 -0500, Dan Connolly wrote:
> >
> > >Perhaps. But perhaps the shortest path to the target is to cut
> > >out the namespace stuff and character model stuff out of the
> > >c14n algorithm. Rewriting namespace prefixes causes
> > >all sorts of headaches:
> > >
> > >         "I hate to say that I told you so, but... -Tim"
> > >         -- Tim Bray
> > >         Re: c14n messes up qnames in attribute values
> >
> > Yes, but the real problem here is the spread of qnames
> > all over the place, not the c14n algorithm. Using qnames
> > instead of URIs replaces a universal identifier that can
> > be treated independently anywhere by something that is
> > very fragile because it depends on an indirection, on
> > additional information, and on very complex rules for
> > how to find the actual URI. Qnames are dangerous, and
> > the longer we go, the more we will find out.
>
>Huh? The use of name/value bindings in a language is fragile and
>dangerous? These are bindings whithin one syntactic
>document. I think you're just spreading fear, uncertainty
>and doubt. There's nothing especially complex or
>fragile about the algorithm for finding URIs
>from qnames: just find the nearest-enclosing
>element with a matching declaration.

It's not the algorithm for finding URIs for plain
Element or Attribute qnames that is difficult. In that
case, what's a qname is very well defined and very
easily identified, and how to find the corresponding URI
is equally well defined.

But that's not what I'm taking about. It's the use of
qnames in attribute values, in XPath, and so on, which
is frightening me. In these cases, what is a qname is
not at all globally defined, you need additional knowledge.
Also, how to find the relevant URI may depend on a lot of
things inside and outside the attribute. And almost every
spec that tries to use qnames invents another way to find
the URI. All of this is done with a side glance to the
Namespace spec, but is not at all based on the Namespace
spec. So my claim is that Qnames don't easily scale to wider
use than Element/Attribute names.


>We might consider revising the c14n algorithm to say
>"rewrite qnames in attribute values too" but how does
>one find qnames in attribute values in general? e.g.
>         <aDoc blort="foo:blort"/>
>Did the author of that document intend foo:blort as a qname,
>a URI using an as-yet-unregistered scheme, or just a string?
>There's no way to know, in general.

Yes, exactly.


>On the other hand, it is clear how to tell producers
>of XML documents to declare their namespaces in
>a normalized way; if somebody told me, the guy
>who knows the semantics of my stylesheet document,
>that it has to be in normal form, I can rewrite
>the XPath expressions and such:
>
><n1:stylesheet
>     xmlns:n1      ="http://www.w3.org/1999/XSL/Transform" version="1.0">
><n1:template
>     xmlns:n1      ="http://www.w3.org/1999/XSL/Transform"
>     xmlns:n2    ="http://www.w3.org/1999/xhtml"
>
>  match="n2:table//n2:tr/n2:td[string-length(normalize-space(.))&gt;1]">
>...
></n1:stylesheet>
>
>
>So specifying a way to *test* whether an XML document's namespace
>declarations are in normal form is straightforward,

I see your point.


>but specifying
>a transformation from general form to normal form has specific,
>identifyable, known problems: it silently changes the semantics
>of documents that conform to W3C Recommendations.

Yes, it does. Other transforms allowed by other W3C Recommendations
(in particular the Namespace Recommendation) already do the same.


>OK, so I've given detailed explanation of the problems of combining
>a (W3C Recommended) use of qnames with the (draft) c14n spec.
>I don't think the use of qnames by themselves have this sort
>of problem. But if they do, please provide details.

Well, c14 is not the only case where we expect rewriting to happen.

> > >         From: Tim Bray (tbray@textuality.com)
> > >         Date: Mon, Mar 20 2000
> > >http://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2 
> 000M
> > >ar/0004.html
> > >
> > >And I maintain that character normalization is orthogonal to
> > >element-and-attribute c14n.
> >
> > As I have explained in a mail to the XML core WG, that's not
> > exactly the case. But based on new insights, the I18N WG/IG
> > has already made clear that in particular for digital signatures,
> > xml canonicalization and character normalization should be
> > considered separately.
>
>I don't understand... you say it's "not exactly the case"
>that "character normalization is orthogonal to
>element-and-attribute c14n" and yet "xml canonicalization
>and character normalization should be considered
>separately"; that's a direct contradiction, no?
>Please elaborate with details.

I sent these details to the XML Core WG quite a while ago.
For those who can, please see
http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2000JanMar/0422.html.

For the others, here is the relevant excerpt:

1) In terms of implementation, XML Canonicalization and
    text normalization are not fully orthogonal:
    Attribute ordering is by codepoint order, and normalization
    may change that. So you have to use NFC before CXML.
    But on the other hand, your normalization software may not
    take care of numeric character references. In this case,
    you have to use CXML before NFC.
    So overall, you may have to use NFC twice.
    It may be worth pointing this out in a note in the
    Canonical XML spec.

[NFC stands for normalization form C]


So c14n and NFC are orthogonal if you look just at the result,
but not when you have to actually implement them.


>As I detailed above, it's quite different to say
>         your stylesheet is not in namespace-normal form
>than to say
>         here's the canonical version of your stylesheet; beware
>                 that we have silently changed it from a conforming
>                 XSLT stylesheet to somethig with broken XPaths.
>
>Data integrity is job 1.

Well, we could call that 'early [xml] canonicalization', then.
Given that i18n is working towards early [text] normalization,
that makes quite some sense to me. The main difference, however,
is that NFC was designed to be as close as possible to existing
practice, whereas for c14n, that wasn't attempted nor would that
be very easy, if possible at all.

So what it may turn out to in the end is to ask people to
write their documents in the lengthy way that namespace-normal
form requires, with two consequences:

- Nobody that ever sees it in source form will like it,
   and tools won't have an easy job to generate it (if they
   had, we wouldn't be at this place in the discussion).

- If everybody is going to use it, it distroys some of the
   properties of the original namespace/qname design, namely
   scoping and in particular shortness.


> > >And the same goes for character normalization.
> >
> > Yes, having a name for the thing, and explaining why and
> > where it may be important, is a good idea.
> >
> > >Perhaps DSig would require its input to be in character-normal
> > >form to avoid the case of a user being unable to see
> > >birthday-attack changes between o-umlaut-precomposed and
> > >o-umlaut-decomposed.
> >
> > I don't understand what you mean by 'birthday-attack',

>In cryptography, it relates to substituting a forgery
>that has the same hash value as the original. In theory,
>an attacker might find a forgery that is different only in
>(a) the the precomposed vs. decomposed characters
>and (b) the dollar amount on the check.

That's one thing we have discussed, but we didn't know
that it is called 'birthday-attack'.


> > but
> > this is essentially what the I18N WG/IG is asking
> > XMLDSIG to do.
>
>You're asking now that XML DSig require input to digital
>signatures to be in character-normal form, or you have
>asked previously?

That was what we asked for in the last call. See
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JanMar/0254.html,
and search for
Canonicalization and Text Normalization
---------------------------------------


>I understood your previous communications
>to request that the DSig WG require the signing algorithm to do
>character normalization of its input, not to exclude unnormalized
>documents from its input.

Well, we thought about asking for that, but various security
considerations documented in our last call comments led to
a change. Some of the considerations are much more serious
than birthday attacks.


Regards,    Martin.
Received on Monday, 10 April 2000 04:20:14 UTC