RE: History: Question on C14N list of nodes instead of subtrees

Hi Merlin,

<merlin>
Right, I was talking in terms of a "mathematically average"
document, whatever that means. FWIW, a "simple" signature
has 21 nodes and depth 6.
</merlin>

<jb>
True, but if you look at the rest of the document over which the
signature is applied, the depth almost never increases much, no matter
how much bigger the document gets.  But see below...
</jb>

<merlin>
The problem is that XMLDSIG does not utilise XPath in this
manner; it uses the XPath expression as a boolean test for
each node in the input set, so the expression above will
evaluate to true for every node in the document (presuming
there exists an element of such ID).
</merlin>

<jb>
Yikes! Brain seizure.  I somehow managed to forget that we changed at
some point (long ago) from an XPath expression to something that worked
a little more like XSLT and was simpler (i.e. avoided the explicit
subexpression (//. | //@* | //namespace::*)).
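
Just to keep the two models straight, here is a rough sketch of the
difference, written against the JAXP javax.xml.xpath API purely for
illustration (the class and helper names are invented, and the real
transform and real implementations have more details than this):

import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XPathModels {

    // Model 1: evaluate the expression once against a context node and
    // take the resulting node-set.
    static List<Node> selectOnce(Node contextNode, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes =
            (NodeList) xpath.evaluate(expr, contextNode, XPathConstants.NODESET);
        List<Node> out = new ArrayList<Node>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i));
        }
        return out;
    }

    // Model 2 (roughly what the XMLDSIG XPath transform does): walk every
    // node in the input set and keep a node iff the expression, evaluated
    // with that node as the context node, converts to boolean true.
    static List<Node> filterPerNode(List<Node> inputSet, String expr) throws Exception {
        XPathExpression test = XPathFactory.newInstance().newXPath().compile(expr);
        List<Node> out = new ArrayList<Node>();
        for (Node node : inputSet) {
            if ((Boolean) test.evaluate(node, XPathConstants.BOOLEAN)) {
                out.add(node);
            }
        }
        return out;
    }
}

The point of the second model is that the expression gets evaluated once
per node of the input set rather than once for the whole document.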

On the bright side, I feel a little less wigged out about why I never
suggested Don's expression (which seems much more like something one
would naturally come up with than the act-of-desperation
ancestor-counting scheme).
</jb>

<merlin>
If we formulate a new XMLDSIG XPath intersection transform
then we can use these expressions to solve some problems
more efficiently.
</merlin>

<jb>
I'm not averse to more transforms; it would be useful to get an idea of
how much motivation there might be to make changes at this point.  My
position is that

	if the problem is important enough (e.g. good idea but too slow
in a frequently occurring scenario), and
	if it is easy for the implementations to be tweaked, 
	then let's fix it before REC.

So, before we stop the presses and re-architect anything, it would be
useful to find out exactly how important the problem is.  By this I mean
the mundane question of how frequently the scenario occurs, but also the
more interesting question of whether the slowness is an inherent
limitation of XPath or just an implementation limitation.

Whether or not you buy my arguments about element depth, both O(n) and
O(n log n) processes see their running time increase by a factor that is
'close to' 2 when you double the input size.  (For an O(n log n)
process, doubling n multiplies the running time by 2*log(2n)/log(n) =
2 + 2/log2(n), which approaches 2 as n grows.)  If you are indeed
experiencing this effect, and the result is too slow, then I would agree
that the speed is more of a problem than we anticipated.  But if you
double the document size a couple of times and see a three- or four-fold
increase in running time at each pass, then the running time is being
dominated by a suboptimal portion of the XPath implementation.  It would
be useful to demonstrate that iteratively doubling the input results in
roughly a two-fold increase in running time (e.g. certainly < 2.4 and
hopefully < 2.2 on large input).
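
In case it helps, here is the kind of throwaway harness I have in mind
for that measurement (plain Java; the step under suspicion is left as a
placeholder, and the test document generator and sizes are arbitrary):

public class DoublingTest {

    // Placeholder for whatever is actually being measured (the XPath
    // filtering transform, canonicalization, the whole verification, ...).
    interface Step {
        void run(byte[] input) throws Exception;
    }

    // Build a flat test document with the requested number of elements;
    // the depth stays constant while the size doubles.
    static byte[] makeDocument(int elements) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < elements; i++) {
            sb.append("<item n=\"").append(i).append("\">text</item>");
        }
        sb.append("</root>");
        return sb.toString().getBytes();
    }

    public static void main(String[] args) throws Exception {
        Step step = input -> { /* plug the step under test in here */ };
        long previous = -1;
        for (int n = 1000; n <= 64000; n *= 2) {
            byte[] doc = makeDocument(n);
            long start = System.nanoTime();
            step.run(doc);
            long elapsed = System.nanoTime() - start;
            if (previous > 0) {
                // For O(n) or O(n log n) behaviour this ratio should settle
                // near 2 on large inputs; ratios of 3 or 4 point at a
                // suboptimal implementation rather than an inherent limit.
                System.out.printf("n=%d  ratio=%.2f%n",
                                  n, (double) elapsed / previous);
            }
            previous = elapsed;
        }
    }
}

(In practice one would repeat each run a few times and ignore the first
pass or two, so that parsing overhead and VM warm-up don't distort the
ratios.)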

John Boyer
PureEdge Solutions Inc.

</jb>

Received on Thursday, 24 January 2002 19:12:54 UTC