Re: History: Question on C14N list of nodes instead of subtrees

I've just had a chance to run some quick xpath document subsetting
tests (using xerces and the xalan xpath processor in a fairly obvious
manner, but I could be doing something nonoptimally):

Using the xmldsig XPath transform I transform a whole document
(URI="") through the following filter:
  <Transform Algorithm="&xpath;">

I ran this on three documents; the first (~20KB, quite flat):
  <Foo> ... <FooBarList>...</FooBarList> ... </Foo>

The second (twice the size, 1 element deeper):
  <Bar> <Foo> above.. </Foo> <Foo> above.. </Foo> </Bar>

The third (twice the size again, another element deeper):
  <Baz> <Bar> above.. </Bar> <Bar> above.. </Bar> </Baz>

The timing (in arbitrary units on a loaded machine, so take salt):
  2250 6177 20976

Using alternative xpath#include and xpath#exclude transforms:
  <Transform Algorithm="&xpath;#include">
      (//FooBarList//. | //FooBarList//@* | //FooBarList//namespace::*)

Timing on the same documents:
  1053 1538 2469

XPointer would be faster still (the expression being just //FooBarList,
IIRC). A hybridized XPath option &xpath;#include-subtree and
&xpath;#exclude-subtree could take the same expression.

It is unfortunately a bit late in the day for me to try a
here()-relative subset, which would be more interesting, as would be
independent numbers.


>Hi Joseph,
>Aside from it having REC status, I don't think it is a mistake for C14N
>to look at individual nodes of an XPath node-set, so changing that had
>not crossed my mind.
>I'm not in on the full conversation that's going on regarding some of
>these issues, but it sounds like some are finding speed issues when
>using an XPath transforms (though I wish it had been noted sooner
>considering the number of implementations, but better late than never).
>At least part of the problem is the current way we interpret the
>expression (e.g. as a PredicateExpr that we automatically apply to every
>node).  This seems to make things easier because everyone got to avoid
>writing (//. | //@* | //namespace::*) into virtually every expression.
>I had no aversion to this, but many felt it was cryptic.  Anyway,
>eliminating it seems to have gotten rid of the most efficient method of
>indicating a subtree.
>It did not seem that it would be a terrible speed problem to me because
>of my perceptions about limited depth of XML in most practical
>circumstances.  Moreover, perhaps it was good in that it put inclusive
>logic (include a subtree in signature) and exclusive logic (exclude a
>subtree from signature) on the same footing.
>However, if there is indeed a speed problem, then the XPath transform
>itself needs to be fixed.  I don't think I agree with creating another
>'intersect' transform for two reasons.  First, XPath is recommended, so
>we will continue to 'recommend' something that is inherently unusable
>for its stated purposes.  Second, what we need is more of a complement
>or subtract than an intersect.
>Finally, the 'stability' of the spec/implementations seems orthogonal to
>the issue at hand.  We have something that is approaching REC and
>recommends the use of an essentially unusable feature (we can't hang up
>servers for the length of time it sounds like this takes).  The fact
>that the near-REC hasn't changed in a while has no bearing on this, and
>I think we're obliged to fix things that we notice (albeit belatedly)
>are broken using our implementations (otherwise why require
>Best regards,
>John Boyer
>PureEdge Solutions Inc.
>-----Original Message-----
>From: Joseph Reagle []
>Sent: Friday, January 25, 2002 10:17 AM
>To: John Boyer; merlin
>Subject: Re: History: Question on C14N list of nodes instead of subtrees
>On Friday 25 January 2002 11:44, John Boyer wrote:
>> Joseph, do you have any opinion on the ability to tolerate a change at
>> this point?
>The question is a change in what? Canonical XML is already a REC and
>be changed. xmldsig will be a REC in a couple of weeks but that's been 
>stable for so long it can't change either. My goal has been that
>be faster for subtree processing.
>As an aside, Don and I had a exc-c14n -> CR review meeting yesterday and
>still run into problems of people less familiar with the work assuming
>there is such a thing as a one-and-only-one form of Canonical XML -- 
>perhaps we should've call them serializations 1 2 and 3 <smile/>. 
>Regardless, my goal is for exc-c14n to be faster with some
>optimizations. (We've actually made this an exit criteria of the CR [1]
>and it reference Merlin's earliest speed tests on exc-c14n [2]).
>For example, originally the python c14n *only* worked over a DOM node so
>that was good for sub-tree processing and we then worked on extending it
>support XPath expression. If there is an XPath expression you have a hit
>performing the XPath expression and walking the whole dom tree, checking
>each node (and building ancestor context) against the subset to see if
>should be omitted. However, if you only passed it a single DOM element 
>node (representing a sub-tree) it serializes that subtree and walks up
>ancestors looking for xml:* attributes. My expectation/goal in exc-c14n
>that you don't have to even walk up your ancestors since you don't care 
>about that context. (Our implementation would fail at this -- I think --
>because we don't have a proper XPath nodeset with namespace axis with
>necessary ancestor context; they are treated as attributes. But an 
>implementation with a proper namespace axis node need only serialize the
>subtree (doesn't care about ancestor xml:*, and if the namespace is 
>utilized then the prefix/value is right there in that elements namespace
>axis node -- it need not look at ancestors at all!) So I think we're
>to end up with decent subtree c14n with exc-c14n.
>It doesn't solve any performance problems with respect to arbitrary
>filtering and I'm not sure if anything proposed in this thread would fix
>that ...? (I might've missed it, did you agree that Merlin's
>expression would be better?) So it's to say hard to say how that would 
>affect existing specs beyond, again, xmldsig is extensible so it could
>done if necessary.
>(see greyed out status)
>Joseph Reagle Jr.       
>W3C Policy Analyst      
>IETF/W3C XML-Signature Co-Chair
>W3C XML Encryption Chair

Baltimore Technologies plc will not be liable for direct,  special,  indirect 
or consequential  damages  arising  from  alteration of  the contents of this
message by a third party or as a result of any virus being passed on.

This footnote confirms that this email message has been swept by
Baltimore MIMEsweeper for Content Security threats, including
computer viruses.

Received on Thursday, 31 January 2002 04:40:18 UTC