Re: XML Signature use of Canonical XML from Dan Connolly on 2000-04-08 (w3c-ietf-xmldsig@w3.org from April to June 2000)

From: Dan Connolly <connolly@w3.org>
Date: Sat, 08 Apr 2000 08:38:34 -0500
To: "Martin J. Duerst" <duerst@w3.org>
CC: Ed Simon <ed.simon@entrust.com>, "'w3c-xml-core-wg@w3.org'" <w3c-xml-core-wg@w3.org>, "'w3c-ietf-xmldsig@w3.org'" <w3c-ietf-xmldsig@w3.org>
Message-ID: <38EF365A.D4BD5264@w3.org>
"Martin J. Duerst" wrote:
> 
> At 00/04/07 18:09 -0500, Dan Connolly wrote:
> 
> >Perhaps. But perhaps the shortest path to the target is to cut
> >the namespace stuff and character model stuff out of the
> >c14n algorithm. Rewriting namespace prefixes causes
> >all sorts of headaches:
> >
> >         "I hate to say that I told you so, but... -Tim"
> >         -- Tim Bray
> >         Re: c14n messes up qnames in attribute values
> 
> Yes, but the real problem here is the spread of qnames
> all over the place, not the c14n algorithm. Using qnames
> instead of URIs replaces a universal identifier that can
> be treated independently anywhere by something that is
> very fragile because it depends on an indirection, on
> additional information, and on very complex rules for
> how to find the actual URI. Qnames are dangerous, and
> the longer we go, the more we will find out.

Huh? The use of name/value bindings in a language is fragile and
dangerous? These are bindings within one syntactic
document. I think you're just spreading fear, uncertainty
and doubt. There's nothing especially complex or
fragile about the algorithm for finding URIs
from qnames: just find the nearest-enclosing
element with a matching declaration.
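
A minimal sketch of that lookup, assuming a hypothetical in-memory Element type (name, declarations, parent pointer) rather than any particular XML API:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Element:
    """Hypothetical minimal element: a name, its xmlns declarations, a parent."""
    name: str
    ns_decls: Dict[str, str] = field(default_factory=dict)  # prefix -> URI
    parent: Optional["Element"] = None


def resolve_prefix(elem: Optional[Element], prefix: str) -> Optional[str]:
    """Walk outward from elem; the nearest declaration of the prefix wins."""
    while elem is not None:
        if prefix in elem.ns_decls:
            return elem.ns_decls[prefix]
        elem = elem.parent
    return None  # the prefix is not bound anywhere in scope


stylesheet = Element("stylesheet", {
    "": "http://www.w3.org/1999/XSL/Transform",
    "h": "http://www.w3.org/1999/xhtml",
})
template = Element("template", parent=stylesheet)

xhtml_uri = resolve_prefix(template, "h")
```

Here resolve_prefix(template, "h") finds no declaration on the template element itself and picks up the one on its parent.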

> So it's not rewriting namespaces that causes problems,
> it's the unrestricted use of qnames by people who don't
> understand the consequences that is the problem.

I can point to specific problems caused by rewriting
namespace prefixes: consider the following parts
of a stylesheet I'm working on:

<stylesheet
    xmlns      ="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:h    ="http://www.w3.org/1999/xhtml">
<template
match="h:table//h:tr/h:td[string-length(normalize-space(.))&gt;1]">
...
</stylesheet>

The meaning of the qnames in the match attribute is clear in
the global context: h:tr is the table row element of XHTML,
template is per the W3C XSLT specification, and so on.

But if I canonicalize that document per the current c14n spec,
the qnames in the match attribute value will lose their bindings.
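
To make the failure concrete, here is a rough sketch. The n1, n2 renaming scheme and both helper functions are assumptions for illustration, not the text of the c14n draft, and the regular expression is only adequate for this one XPath value; it is not a general qname detector.

```python
import re

# Hypothetical model of the stylesheet: its declarations plus one attribute value.
decls = {
    "": "http://www.w3.org/1999/XSL/Transform",
    "h": "http://www.w3.org/1999/xhtml",
}
match_attr = "h:table//h:tr/h:td"


def rewrite_prefixes(decls):
    """Rename every declared prefix to n1, n2, ... (an assumed scheme)."""
    return {f"n{i}": uri for i, uri in enumerate(decls.values(), start=1)}


def unbound_prefixes(attr_value, decls):
    """Prefixes used in attr_value that have no declaration in decls."""
    used = set(re.findall(r"([A-Za-z_][\w.-]*):[A-Za-z_]", attr_value))
    return used - set(decls)


# Before rewriting, "h" is declared, so nothing is unbound.
before = unbound_prefixes(match_attr, decls)
# After rewriting, only n1 and n2 are declared; the attribute still says "h".
after = unbound_prefixes(match_attr, rewrite_prefixes(decls))
```

The attribute value is untouched by the rewrite, so the prefix it uses no longer resolves to anything.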

We might consider revising the c14n algorithm to say
"rewrite qnames in attribute values too" but how does
one find qnames in attribute values in general? e.g.
	<aDoc blort="foo:blort"/>
Did the author of that document intend foo:blort as a qname,
a URI using an as-yet-unregistered scheme, or just a string?
There's no way to know, in general.

On the other hand, it is clear how to tell producers
of XML documents to declare their namespaces in
a normalized way; if somebody told me, the guy
who knows the semantics of my stylesheet document,
that it has to be in normal form, I can rewrite
the XPath expressions and such:

<n1:stylesheet
    xmlns:n1      ="http://www.w3.org/1999/XSL/Transform" version="1.0">
<n1:template
    xmlns:n1      ="http://www.w3.org/1999/XSL/Transform"
    xmlns:n2    ="http://www.w3.org/1999/xhtml"
    match="n2:table//n2:tr/n2:td[string-length(normalize-space(.))&gt;1]">
...
</n1:stylesheet>


So specifying a way to *test* whether an XML document's namespace
declarations are in normal form is straightforward, but specifying
a transformation from general form to normal form has specific,
identifiable, known problems: it silently changes the semantics
of documents that conform to W3C Recommendations.
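
A sketch of what such a test could look like. The rule itself is an assumption made up for illustration (every declared prefix is n1, n2, ... and each prefix is paired with one URI); nothing here is taken from the c14n draft. The point is only that the function inspects the declarations and reports a verdict without rewriting anything.

```python
import io
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: "normal form" means every declared prefix looks like n1, n2, ...
# and each prefix is always bound to the same URI (and vice versa).
NORMAL_PREFIX = re.compile(r"n[1-9][0-9]*\Z")


def is_namespace_normal(xml_text: str) -> bool:
    """Report whether the declarations satisfy the assumed rule; never rewrite."""
    prefix_to_uri = {}
    uri_to_prefix = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events carry (prefix, uri); the default namespace has prefix "".
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if not NORMAL_PREFIX.match(prefix):
            return False
        if prefix_to_uri.setdefault(prefix, uri) != uri:
            return False
        if uri_to_prefix.setdefault(uri, prefix) != prefix:
            return False
    return True


XSLT = "http://www.w3.org/1999/XSL/Transform"
XHTML = "http://www.w3.org/1999/xhtml"

general = f'<stylesheet xmlns="{XSLT}" xmlns:h="{XHTML}" version="1.0"/>'
normal = (
    f'<n1:stylesheet xmlns:n1="{XSLT}" version="1.0">'
    f'<n1:template xmlns:n1="{XSLT}" xmlns:n2="{XHTML}" match="n2:td"/>'
    f'</n1:stylesheet>'
)

general_ok = is_namespace_normal(general)
normal_ok = is_namespace_normal(normal)
```

The check says nothing about qnames inside attribute values; keeping those consistent with the declarations remains the job of whoever knows the document's semantics.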

OK, so I've given a detailed explanation of the problems of combining
a (W3C Recommended) use of qnames with the (draft) c14n spec.
I don't think the use of qnames by itself has this sort
of problem. But if it does, please provide details.


> >         From: Tim Bray (tbray@textuality.com)
> >         Date: Mon, Mar 20 2000
> >http://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2000Mar/0004.html
> >
> >And I maintain that character normalization is orthogonal to
> >element-and-attribute c14n.
> 
> As I have explained in a mail to the XML core WG, that's not
> exactly the case. But based on new insights, the I18N WG/IG
> has already made clear that in particular for digital signatures,
> xml canonicalization and character normalization should be
> considered separately.

I don't understand... you say it's "not exactly the case"
that "character normalization is orthogonal to
element-and-attribute c14n" and yet "xml canonicalization
and character normalization should be considered
separately"; that's a direct contradiction, no?
Please elaborate with details.

> >It was suggested to me (by Noah Mendelsohn) that we could take
> >namespace prefix munging out of the c14n algorithm, but document
> >a "namespace normalized form" as an appendix or something; this
> >appendix wouldn't specify an algorithm with inputs and outputs,
> >but rather just a test/constraint on documents ala
> >
> >         A document is in namespace-normal form iff...
> 
> Which way to specify (procedural or as conditions on the result)
> is rather independent of what to specify. The current canonicalization
> algorithm is already rather non-procedural.

As I detailed above, it's quite different to say
	your stylesheet is not in namespace-normal form
than to say
	here's the canonical version of your stylesheet; beware
		that we have silently changed it from a conforming
		XSLT stylesheet to something with broken XPaths.

Data integrity is job 1.

> >And the same goes for character normalization.
> 
> Yes, having a name for the thing, and explaining why and
> where it may be important, is a good idea.
> 
> >Perhaps DSig would require its input to be in character-normal
> >form to avoid the case of a user being unable to see
> >birthday-attack changes between o-umlaut-precomposed and
> >o-umlaut-decomposed.
> 
> I don't understand what you mean by 'birthday-attack',

It relates to a sort of parlor game:
	A: I bet there are two people in the room with the
	same birthday.

	B: no way! what are the odds of that?!?!?! You're on!

	A: Attention everyone! How many birthdays in January?
	What dates? I see... and February? Dates? All different...
	March? Dates? What was that? Fred and Joan
	were both born 13 March? Thank you.

	A: in fact, if there are more than 30 people in
	the room, the odds are in my favor.
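
The arithmetic behind A's bet can be sketched as follows, assuming birthdays are independent and uniformly spread over 365 days (no leap years); the function name is made up for this example.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday.

    Assumes birthdays are independent and uniform over `days`.
    """
    if people > days:
        return 1.0
    all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        all_distinct *= (days - i) / days
    return 1.0 - all_distinct


p23 = shared_birthday_probability(23)
p30 = shared_birthday_probability(30)
```

Under this model the probability already passes one half at 23 people, so with more than 30 in the room the bet is comfortably in A's favor.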

In cryptography, it relates to substituting a forgery
that has the same hash value as the original. In theory,
an attacker might find a forgery that is different only in
(a) the precomposed vs. decomposed characters
and (b) the dollar amount on the check.
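
A small illustration of item (a), with assumptions: the sample string is invented, and SHA-256 over UTF-8 is used only because it is handy, not because the DSig drafts specify it. The two spellings render identically but are different code point sequences, so they hash differently, which gives a forger variations the reader cannot see.

```python
import hashlib
import unicodedata

precomposed = "K\u00f6ln"   # o-umlaut as one code point, U+00F6
decomposed = "Ko\u0308ln"   # "o" followed by combining diaeresis, U+0308


def digest(text: str) -> str:
    """SHA-256 over the UTF-8 bytes; the hash choice is illustrative only."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The strings look alike on screen but compare and hash as different.
same_code_points = precomposed == decomposed
same_digest = digest(precomposed) == digest(decomposed)

# After normalizing to one form (NFC here), the difference disappears.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
```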

> but
> this is essentially what the I18N WG/IG is asking
> XMLDSIG to do.

Are you asking now that XML DSig require input to digital
signatures to be in character-normal form, or have you
asked that previously? I understood your previous communications
to request that the DSig WG require the signing algorithm to do
character normalization of its input, not to exclude unnormalized
documents from its input.
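
The two readings differ in who changes the data. A hedged sketch, assuming NFC as the normal form (this message does not name one) and using made-up function names:

```python
import unicodedata

FORM = "NFC"  # assumption: the message does not say which normalization form


def require_normalized(text: str) -> str:
    """Reading 1: refuse unnormalized input; the caller's data is never altered."""
    if unicodedata.normalize(FORM, text) != text:
        raise ValueError("input is not in character-normal form")
    return text


def normalize_for_signing(text: str) -> str:
    """Reading 2: the signing step normalizes whatever it is given."""
    return unicodedata.normalize(FORM, text)


decomposed = "o\u0308"  # "o" plus combining diaeresis
normalized = normalize_for_signing(decomposed)
```

With the first function, the decomposed input is rejected and the producer has to fix it; with the second, the signer silently signs something other than what it was handed.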

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
pager (put return tel# in From or Subject field)
mailto:connolly.pager@w3.org
Received on Saturday, 8 April 2000 09:45:59 UTC