RE: History: Question on C14N list of nodes instead of subtrees from John Boyer on 2002-01-23 (w3c-ietf-xmldsig@w3.org from January to March 2002)

From: John Boyer <JBoyer@PureEdge.com>
Date: Wed, 23 Jan 2002 15:15:35 -0800
To: <reagle@w3.org>, "merlin" <merlin@baltimore.ie>
Cc: <w3c-ietf-xmldsig@w3.org>
Message-ID: <7874BFCCD289A645B5CE3935769F0B520C3505@tigger.PureEdge.com>
Hi all,

We 'flattened' the nodes because that is, in essence, what serialization
is doing.

The real question is why does every node have to be in a node-set in
order for it to appear in the serialization, i.e. why can't we just
refer to an entire subtree.

Before I answer, I would point out that XPointer is based on XPath, and
XPath has the same capability w.r.t. its possible use for indicating
subtrees rather than individual nodes (e.g. its use in XSLT, which does
not need all descendants because it can naturally recurse when
necessary).  In other words, XPath is a tool for indicating node-sets.
The node sets can be interpreted in any way necessary according to the
needs of the consuming application (they could be considered as the
roots of subtrees, individual nodes, or the heads of sublists of
siblings, ...)

Anyway, we selected to interpret the result of an XPath as indicating
individual nodes to be serialized.  This is legal under the
interpretation of the word "set".  Indeed, it is even possible to do
this with XPointer (location-set), and I advocated this at one point,
but the group decided against using XPointer only because we wanted a
shorter time to REC (though this was over a year ago!).

Aside from legality, the individual node interpretation of XPath results
was necesssary in order to implement the notion of signature filters,
which specify the portion of a document to be signed and/or the portion
to be omitted, the latter being inherently more secure.  I did an RSA
2000 presentation on this issue and numerous emails exist in the archive
about 'document closure'.  The basic idea is that if you specify the
portions of a document covered by a signature, then it is easy to add to
the document without breaking the signature.  It becomes very difficult
to assess whether the additions have an implicit impact on the signed
material.  By comparison, if a signature says "sign everything except X,
Y, Z" then additions other than X, Y and/or Z will break the signature.
It is comparatively easier to establish the security of a message if all
you have to prove is that X, Y and Z cannot cause a security problem
within the application domain (or that any such possibility is covered
by post-core-validation application-specific steps).  

In essence, we made easy things a little harder to say so that we could
make hard things possible to say.

Finally, as to the speed issue that the originator is experiencing, I'd
say he is suffering from a non-optimized implementation.  The reason is
that an O(n) traversal of the nodes of a parse tree is sufficient for
including all such nodes in a node-set.  Since all members of the
node-set will be rendered in a serialization, the cost is O(n).  Thus,
the cost of putting all nodes in a node-set should (or can at least) be
commensurate with serialization.  If it is taking progressively longer
to calculate the node-set as the XML document grows, then the XPath
implementation needs to be optimized.  Alternately, we only require
behavior equivalent to the evaluation of the (//. | //@* |
//namespace::*) expression.

John Boyer
PureEdge Solutions Inc.

-----Original Message-----
From: Joseph Reagle [mailto:reagle@w3.org]
Sent: Wednesday, January 23, 2002 12:28 PM
To: John Boyer; merlin
Cc: w3c-ietf-xmldsig@w3.org
Subject: History: Question on C14N list of nodes instead of subtrees


A bit of archeology ... I've been corresponding with someone about 
performance and dependence on XPtr recently and brought up message [1].
I 
characterized (perhaps mistakenly) the issue as follows, but this sort
of 
begs the question of why did we "burst/flatten" our nodelist in the
first 
place? Was it so we could easily create subsets as a list of nodes
(making 
the "node in subset test" easy?), so that we could easily remove
comments?

---- Summary Characterization ----

1. When the signature is in the document being signed (enveloped) it has
to 
except itself from the document: a signature can't contain itself
(paradox).
2. One approach is to use XPath to throw out all the dsig element from
the 
document and then sign it. BUT what if I'm an enveloped signature in a 
document that has *other* signatures that I want to sign. I need a way
to 
say "only throw out the Signatutre element containing *me*" There's no
easy 
way to do that so we invented the here() XPath function -- which is also
in 
XPtr now.
 http://www.w3.org/TR/xmldsig-core/#function-here
3. The example is someone noting that when someone takes the 
enveloped-signature-in-a-document and embeds *that* document into
another 
document, the here() expression and processing gets even heavier!
(4. Also note that in our use of XPath we have lists of XPath nodes 
(flattened/burst), not trees. So if we have an XPath element node
*that's* 
all it is. "because XPointer typically indicates a subtree of an XML 
document's parse tree using just the element node at the root of the 
subtree, whereas Canonical XML treats a node-set as a set of nodes in
which 
absence of descendant nodes results in absence of their representative
text 
from the canonical form." 
  http://www.w3.org/TR/xmldsig-core/#sec-Same-Document
)
5. The result is our expression gets really ugly (cause we need to
always 
ask for (//. | //@* | //namespace::*) and the processing from the XPath
is 
*very* slow: parsing the tree and doing counts. Merlin was indicating
that 
in XPointer, the expression would look  prettier (since it points to a 
subtree, not just a flat node list) and I'm wondering about the
performance 
of the XPointer here. Now that I think about it here() probably isn't 
something you'd want in a simple profile, but having it be there and
have 
it be implemented by fast implementations could've been of use to us --
or 
could in a future version of xmldsig....

----- Old Email ----

[1] 
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2001AprJun/0081.htm
l

Subject: Re: Problem: referring to a complete sub-tree using XPath
Date: Wed, 25 Apr 2001 09:48:53 +0100
From: merlin <merlin@baltimore.ie>
To: "Karl Scheibelhofer" <Karl.Scheibelhofer@iaik.at>
Cc: "XMLSigWG" <w3c-ietf-xmldsig@w3.org>

Hi Karl,

XPointer is the tool for this job, not XPath. It can do this
reference in a single path evaluation. The xmldsig standard
supports full same-document XPointer references; it is simply
an issue of implementing it ;}

Merlin

r/Karl.Scheibelhofer@iaik.at/2001.04.19/12:57:45

>Hi,
>
>i use XPath in a reference to select a element of the same document and
> all its descendants, attributes,... - simply the subtree with the
> specific element as its root.
>i already have a XPath that works. however, its awfully slow, because
its
>quite long for this simple task it perfoms.
>
>here a short example
>
><?xml version="1.0" encoding="UTF-8"?>
><aida:eDocument xmlns:aida="http://www.iaik.at/aida"
>xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
>xsi:schemaLocation="http://www.iaik.at/aida eDocument.xsd">
>  <aida:signedContent>
>    <personnel xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
>xsi:noNamespaceSchemaLocation="personal.xsd">
>      <person contr="false" id="Big.Boss">
>        <name>
>          <family>Boss</family>
>          <given>Big</given>
>        </name>
>        <email>chief@foo.com</email>
>        <link subordinates="one.worker two.worker three.worker
four.worker
>five.worker"/>
>      </person>
>       ... (omitted some data)
>    </personnel>
>  </aida:signedContent>
>  <dsig:Signature Id="eDocumentSignature-1"
>xmlns:dsig="http://www.w3.org/2000/09/xmldsig#">
>    <dsig:SignedInfo>
>      <dsig:CanonicalizationMethod
>Algorithm="http://www.w3.org/TR/2000/WD-xml-c14n-20000907"/>
>      <dsig:SignatureMethod
>Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/>
>      <dsig:Reference URI="">
>        <dsig:Transforms>
>          <dsig:Transform
>Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
>            <dsig:XPath xmlns:aida="http://www.iaik.at/aida"
>xmlns:dsig="http://www.w3.org/2000/09/xmldsig#">count((here()/ancestor:
:ai
>da
>
>:eDocument[1]/child::aida:signedContent[1]//. |
>
>here()/ancestor::aida:eDocument[1]/child::aida:signedContent[1]//@* |
>here()/ancestor::aida:eDocument[1]/child::aida:signedContent[1]//namesp
ace
>:: *) | self::node()) =
>count((here()/ancestor::aida:eDocument[1]/child::aida:signedContent[1]/
/.
> | here()/ancestor::aida:eDocument[1]/child::aida:signedContent[1]//@*
|
> here()/ancestor::aida:eDocument[1]/child::aida:signedContent[1]//names
pac
>e:: *))</dsig:XPath>
>          </dsig:Transform>
>        </dsig:Transforms>
>        <dsig:DigestMethod
>Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
>       
<dsig:DigestValue>ssbkbDM6VCUTYyzXMK06RKcbFHQ=</dsig:DigestValue>
>      </dsig:Reference>
>    </dsig:SignedInfo>
>
><dsig:SignatureValue>PFkUqjNCq9Ujyl/K/5c62vyEeExIVNFwrrH2ZuxLbmXjH9dQOF
rVw
>Po dMb1xUY1Y 
> 8iHpAcl8Z6xP3mMCK60ROtVCcDRS2v0ydULhJ+IZFjotIgwtGECy9lxZy4LDkeUJ
> RIvtzlDHBnp5jMb1+iLO1aTvkBzNLWbrAGo+rbqely4=</dsig:SignatureValue>
> <dsig:KeyInfo>
>      <dsig:X509Data>
>        <dsig:X509Certificate>MIIC .... (omitted some data)
></dsig:X509Certificate>
>      </dsig:X509Data>
>    </dsig:KeyInfo>
>  </dsig:Signature>
></aida:eDocument>
>
>i need the here() functionality to ensure that the signature even
> verifies, if i embed the whole document into another xml document.
>the long XPath the you see in the example just selects the
><aida:signedContent> element with everything contained within this
> element. does anyone know a simpler XPath that does the same job? the
> performance of this is unacceptable: up to some minutes if i have a
> medium XML document in the signed content running without JIT. (i use
> Xerces 1.3.0 [with some patches])
>i did not want to use IDs, to be able to arbitrary include signed
> documents into other documents.
>
>regards,
>
>  Karl Scheibelhofer
Received on Wednesday, 23 January 2002 18:16:12 UTC