Re: Schema Centric Canonicalization algorithm from Joseph Reagle on 2002-03-01 (w3c-ietf-xmldsig@w3.org from January to March 2002)

From: Joseph Reagle <reagle@w3.org>
Date: Fri, 1 Mar 2002 18:04:12 -0500
To: "Bob Atkinson" <bobatk@Exchange.Microsoft.com>, selim.aissi@intel.com, mhondo@us.ibm.com
Cc: "XML Signature" <w3c-ietf-xmldsig@w3.org>
Message-Id: <200203012304.SAA13642@tux.w3.org>
Thank you for your note on the availability of [1], linked from [a]. It's 
important that we review each other's work, ask questions, and point out 
errors -- when appropriate! <smile/> I say this because, while I haven't had 
a chance to review the whole document, there are some substantive points I 
can address immediately along those lines.

[a] http://www.uddi.org/bestpractices.html
[1] http://www.uddi.org/pubs/SchemaCentricCanonicalization-20020213.htm
Schema Centric XML Canonicalization Version 1.0 Working Draft.  13 February 
2002

>   1.1 Limitations of Existing Canonicalization Algorithms
>
>    The Exclusive XML Canonicalization suffers from the problem that no
>    means is provided therein by which the default XML namespace can be
>    listed in the InclusiveNamespacePrefix list parameter to the
>    algorithm, thus, inappropriately relegating that prefix to a
>    second-class status.

This error has since been remedied in the Candidate Recommendation:
   http://www.w3.org/TR/2002/CR-xml-exc-c14n-20020212
   This algorithm also
   takes an optional explicit parameter of an empty InclusiveNamespaces
   element with a PrefixList attribute. The value of this attribute,
   which may be null, is a whitespace delimited list of namespace
   prefixes, and where #default indicates the default namespace, to be
   handled as per [XML-C14N].
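
To make the quoted parameter concrete, here is a minimal sketch of reading a 
PrefixList value as the text above describes it: whitespace delimited, 
possibly null, with #default standing for the default namespace. The 
function name and return shape are my own assumptions for illustration, not 
anything defined by the specification.

```python
DEFAULT_NS_TOKEN = "#default"  # token the quoted text uses for the default namespace


def parse_prefix_list(value):
    """Split a PrefixList value into (prefixes, include_default).

    Hypothetical helper: `value` may be None or empty, which yields no
    prefixes and no default-namespace flag.
    """
    tokens = (value or "").split()  # split on runs of whitespace
    include_default = DEFAULT_NS_TOKEN in tokens
    prefixes = [token for token in tokens if token != DEFAULT_NS_TOKEN]
    return prefixes, include_default


print(parse_prefix_list("dsig soap #default"))
print(parse_prefix_list(None))
```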

I'd hope that folks feel comfortable sending an email to this list saying, 
"hey, why did you drop the treatment of the default ns" rather than having 
to create their own algorithm just on that note! <grin/>  But of course, 
you have some additional requirements.

>    Additionally, both of these algorithms (collectively "the existing
>    algorithms") share some characteristics which cause problems, some
>    considerable, to applications considering their use:
>     1. The presence of a DTD that validates the XML subdocument being
>        canonicalized is assumed. In particular, default attributes
>        specified in the DTD are included in the output of the
>        canonicalization process.
>        With the advent of XML Schema, it is in fact now increasingly rare
>        to find XML documents for which validation is accomplished using a
>        DTD, or, indeed, due to the weak expressiveness of DTDs, to find
>        XML documents for which a DTD which describes the content models
>        of the elements of the document (instead of merely defining
>        entities and the like) can in fact ever be constructed. Thus, the
>        existing algorithms are becoming less and less useful to practical
>        applications of XML.

Fair enough. I've anticipated this requirement (it's a no-brainer!) and 
I'm happy to see someone addressing it.

>     2. Contrary to the intent of the Namespaces in XML Recommendation,
>        XML documents are not canonicalized with respect to the XML
>        namespace prefixes they use. That is, XML documents that are
>        identical except for their choice of namespace prefixes
>        canonicalize to different results under the existing algorithms.
>        Since namespace declarations can appear on any element, the need
>        for their preservation can at times be a very significant
>        implementation burden.

There are two sorts of people in the world: those that consider the prefixes 
important, and those that don't! Just as we reversed from an earlier draft 
and decided not to rewrite prefixes [2] when we inherited the spec, I can 
see other people considering the requirement you express above as being 
more important than those expressed in [2].

[2] http://www.w3.org/TR/xml-c14n#NoNSPrefixRewriting
  For example, an XPath expression in an attribute value or
  element content can reference a namespace prefix. Thus, rewriting
  the namespace prefixes would damage such a document by changing
  its meaning (and it cannot be logically equivalent if its meaning
  has changed).
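
Both positions can be seen in a small sketch. It assumes Python 3.8 or 
later, whose standard library function xml.etree.ElementTree.canonicalize 
implements C14N 2.0 (not Canonical XML 1.0 or the Schema Centric algorithm), 
so treat it as an analogy only. The documents and the "select" attribute 
are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize

# Two documents that differ only in their choice of namespace prefix.
DOC_A = '<a:list xmlns:a="http://example.org/foo"><a:item a:price="5"/></a:list>'
DOC_B = '<b:list xmlns:b="http://example.org/foo"><b:item b:price="5"/></b:list>'

# A document whose attribute value mentions a prefix, as an XPath might.
DOC_XPATH = '<a:rule xmlns:a="http://example.org/foo" select="a:item"/>'

# Prefixes preserved: the two canonical forms differ.
plain_a = canonicalize(DOC_A)
plain_b = canonicalize(DOC_B)

# Prefixes rewritten: the two canonical forms coincide.
rewritten_a = canonicalize(DOC_A, rewrite_prefixes=True)
rewritten_b = canonicalize(DOC_B, rewrite_prefixes=True)

# Rewriting leaves the prefix inside the attribute value untouched, so the
# value now refers to a prefix that is no longer declared.
rewritten_xpath = canonicalize(DOC_XPATH, rewrite_prefixes=True)

print(plain_a == plain_b)
print(rewritten_a == rewritten_b)
print(rewritten_xpath)
```

The last output is the damage [2] warns about: unless the processor is told 
which values hold prefixed names, it has no way to know that the value 
needed rewriting too.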


>     3. Canonical XML contains a security hole having to do with how it
>        processes certain esoteric node-sets. Consider a node set which
>        consists of just a single attribute node, one that explicitly
>        references a namespace by use of a namespace prefix. While it is
>        true in Canonical XML that an element node that is not in the
>        node-set still has its namespace axis processed, the rule in
>        Canonical XML (see §2.3) for processing that namespace axis states
>        that only "namespace nodes in the axis and in the node-set"
>        (emphasis added) are in fact processed. Thus, the canonical
>        representation of our single-attribute-node node-set consists of
>        the processing of only the attribute node itself; no namespace
>        attributes are included. 

I personally haven't given a great deal of thought to 
canonicalizing/signing just an attribute. The spec certainly permits it, but 
I haven't played with it in implementations myself. Do your use cases 
anticipate this usage? Is my following understanding correct?

<foo:list xmlns:foo="http://example.org/foo">
  <foo:item foo:price="5"/>
</foo:list>

And you want to canonicalize the price attribute? And the result you want 
is akin to (avoiding your prefix rewriting for the moment):
  'foo:price="5" xmlns:foo="http://example.org/foo"' 
whereas Canonical XML yields:
 'foo:price="5"'

?

>       Thus, two such single-attribute node-sets
>        whose attributes are character-wise identical but use completely
>        different namespaces as the binding of their prefix will
>        canonicalize to the same result, and that presents a security
>        hole, particularly in applications to digital signatures.
>        Analogous security holes exist with similar node-sets. Whether the
>        same security hole exists in Exclusive XML Canonicalization is
>        likely the case but is not completely clear, since there are at
>        present ambiguities in the specification thereof of which the
>        resolution will likely bear on the matter.

Could you be more specific about the ambiguities? We still have an 
opportunity to correct exclusive c14n *if* that is what is needed.
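
For concreteness, the collision described in point 3 can be sketched as 
follows. This does not run any canonicalization algorithm; it only compares 
the attribute's lexical form with its namespace-expanded name. The two 
documents, the second namespace URI, and the "with declaration" strings are 
assumptions made up for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

# Same prefix and same attribute text, but the prefix is bound to different URIs.
DOC_1 = '<foo:list xmlns:foo="http://example.org/foo"><foo:item foo:price="5"/></foo:list>'
DOC_2 = '<foo:list xmlns:foo="http://example.org/other"><foo:item foo:price="5"/></foo:list>'

# What a serialization of the attribute alone would look like in both cases.
ATTRIBUTE_ONLY = 'foo:price="5"'


def expanded_attribute_names(document):
    """Return the namespace-expanded attribute names of the first child element."""
    root = ET.fromstring(document)
    item = root[0]
    return sorted(item.attrib)


def digest(text):
    """Hash a serialization, standing in for the digest step of a signature."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


names_1 = expanded_attribute_names(DOC_1)
names_2 = expanded_attribute_names(DOC_2)
appears_in_both = ATTRIBUTE_ONLY in DOC_1 and ATTRIBUTE_ONLY in DOC_2

# Carrying the namespace declaration along separates the two cases.
with_ns_1 = ATTRIBUTE_ONLY + ' xmlns:foo="http://example.org/foo"'
with_ns_2 = ATTRIBUTE_ONLY + ' xmlns:foo="http://example.org/other"'

print(names_1, names_2)
print(appears_in_both)
print(digest(with_ns_1) == digest(with_ns_2))
```

The attribute-only text is byte-for-byte the same for both documents, so 
any digest over it is the same as well, even though the expanded names 
differ.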

>     4. The goal of the existing canonicalization algorithms is to
>        canonicalize an XML subdocument with respect to the liberties of
>        its physical representation permitted within only the XML 1.0
>        Recommendation and the Namespaces in XML Recommendation.
>        The XML Schema Recommendation permits a considerable number of
>        additional liberties of representation, including (but not limited
>        to) the following:

Again, I agree, some folks do want these schema datatype normalizations.
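
As one small illustration of such a liberty (my own choice of example, since 
the list is cut off in the quote above): XML Schema's boolean datatype 
accepts "true", "false", "1" and "0", with "true" and "false" as the 
canonical forms, and surrounding whitespace is collapsed. A sketch of that 
one normalization, with a hypothetical function name:

```python
XML_WHITESPACE = " \t\r\n"  # the whitespace characters XML recognizes

# Lexical forms of xs:boolean mapped to their canonical representation.
BOOLEAN_CANONICAL = {"true": "true", "1": "true", "false": "false", "0": "false"}


def canonical_boolean(lexical):
    """Return the canonical form of an xs:boolean lexical value.

    Hypothetical helper; raises ValueError for anything outside the
    datatype's lexical space.
    """
    collapsed = lexical.strip(XML_WHITESPACE)
    try:
        return BOOLEAN_CANONICAL[collapsed]
    except KeyError:
        raise ValueError("not an xs:boolean value: %r" % lexical) from None


print(canonical_boolean(" 1 "))
print(canonical_boolean("false"))
```

Two documents that differ only in writing "1" versus "true" would then 
canonicalize to the same text under a schema-aware algorithm, which the 
existing algorithms cannot offer.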

-- 

Joseph Reagle Jr.                 http://www.w3.org/People/Reagle/
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   http://www.w3.org/Signature/
W3C XML Encryption Chair          http://www.w3.org/Encryption/2001/
Received on Friday, 1 March 2002 18:04:25 UTC