RE: New Version of XPath Filter from John Boyer on 2002-04-08 (w3c-ietf-xmldsig@w3.org from April to June 2002)

From: John Boyer <JBoyer@PureEdge.com>
Date: Mon, 8 Apr 2002 16:36:34 -0700
To: <aleksey@aleksey.com>
Cc: "merlin" <merlin@baltimore.ie>, <reagle@w3.org>, <w3c-ietf-xmldsig@w3.org>
Message-ID: <7874BFCCD289A645B5CE3935769F0B5232854F@tigger.PureEdge.com>
Hi Aleksey,

<as>
	<snip/>I think I understand your position and why this S' scares me. 
	<snip/>However, you can describe the same operation in a different way:
		apply XPath expression to the input document and calculate the result nodes set and input nodes set
		intersection (of course, XPath expression is applied after adding "(//. | //@* | //namespace::*)[XXX]" )

	<snip/> I can explain the transform in one phrase (someone said that if you could not explain your theory using one
		phrase then probably you have to think more about it :) ).
</as>

<jb>
The phrase is actually an imperative sentence, so my use of a declarative sentence should not be too offensive.

<sentence>The XPath Filter v2.0 performs standard set operations on an input node-set and a node-set containing the nodes of all subtrees whose roots are identified by a given XPath expression.</sentence>
</jb>

<as>
I could not really evaluate the performance difference of adding this step. As you've noted the XPath expressions are different and implementations are different. For some implementations it could be faster to add all childs in the XPath expression itself, for some implementations it could be faster to add childs using this additional step. And real performance will depend on what will be most common situation: application wants to include/exclude only the node or the node plus all childs. 
</as>

<jb>
The essence of my argument is that you can indeed evaluate the performance difference and, while holding XPath implementation constant, it is easy to see that less work is being done in the scenarios that we feel are most likely to occur.

For each subtree root, the method we are describing does work that is a constant factor of the size of the subtree to expand the root node into the complete set of nodes in the subtree.  Nothing can be simpler than simple tree traversal.

As I also identified in my prior email, XPath has a bit of difficulty identifying subtrees of nodes efficiently (it depends on the host language's semantics for that type of operation, e.g. see XSLT).  The method you are describing would require an XPath expression that, for a given subtree, would have to run the ancestor chain of each subtree node to see whether the subtree root is among the ancestors.  For a subtree root at depth D, the number of times the expression that identifies the subtree root runs is bounded from below by D times the subtree size.
</jb>

<as>
From my point of view, in such complicated situations a nice old "Occam's Razor" rule should work: if the XPath expression evaluator can do the job why we need to add this functionality to XPath transform?
</as>

<jb>
By the same rule, why not just use XPath filter v1.0.  It does the same job.  Answer:  Not fast enough.  

In this case, we know a method that will encourage implementers to make faster implementations whilst simultaneously creating a notation that is appealing to those who are used to thinking about the XPaths that appear in XSLT.  In XSLT, the Xpath node-set is typically handled as if the nodes were subtree roots.

John Boyer
</jb>

Aleksey.



John Boyer wrote:

>Hi Aleksey,
>
>-----Original Message-----
>From: Aleksey Sanin [mailto:aleksey@aleksey.com]
>Sent: Monday, April 08, 2002 12:03 PM
>To: John Boyer
>Cc: merlin; reagle@w3.org; w3c-ietf-xmldsig@w3.org
>Subject: Re: New Version of XPath Filter
>
>
>I need to think about profiling this scenarios. Probably you are right 
>about performance.
>However, personally I don't think that it's a good idea to write general
>
>standard with
>performance as a top priority. My expirience shows that performance 
>improvements are
>usually very complicated, not simple to explain and very difficult to 
>maintain.
>
><jb>
>Actually, Xpath filter v1.0 is what resulted from trying to write the
>standard without concern for speed.  We tried it this way and found that
>the right idea has a performance profile that is quite unacceptable.
>So, we have no other goal than to achieve roughly the same functionality
>in a more efficient manner.  My experience also shows that performance
>improvements are complicated and difficult to maintain **in an
>implementation**.  XPath filter v1.0 could be optimized, possibly to an
>acceptable level of performance, but only by means that are very
>difficult to implement and therefore unlikely to be implemented
>uniformly.  Because the performance profile can become an
>interoperability issue in this case, it has become necessary to create
>an architecture in which the default non-optimized version achieves
>acceptable performance.
></jb>
>
>Calculating S' is one additional step and this step adds complexety to 
>the proposed scheme.
>
><jb>
>No, it is not an additional step in the sense of performance.  It only
>appears that way because we separate it out as an operation in the
>processing model.  However, any XPath evaluation that results in a
>node-set containing S' plus all descendants of nodes in S' has by
>definition performed at least as much work as it takes to compute S'.
></jb>
>
>I like the initial idea: describe XPath transform in terms of sets 
>operations. It really simplifies
>understanding and makes everything clear. I suggest you to take a look 
>at this from a "user"
>point of view: "I've constructed XPath expression but transform added 
>some nodes to it. Why???"
>And remember that users never read docs :)
>
><jb>
>We add nodes because that is the definition of what we do.  It is
>unreasonable to expect that a person will successfully construct a
>signature filter without knowing something as basic as what nodesets
>will result.  This is why we *require* familiarity with companion specs.
>This is also why every spec example shows only the subtree root being
>selected.
>
>For what it's worth, though, even a user like the one you describe would
>have difficulty messing up this transform.  Suppose the transform
>expression identifies all subtree nodes, not just the subtree roots.
>The behavior of the transform would not change.
>
>The user would have to write an XPath which tries to include some nodes
>then exclude some of their children in the same expression.  The idea
>that the user would try to do this in a text editor rather than a
>helpful design environment AND that the user would not read a single
>example-- even if it is after the first transform does not do what
>he/she wants-- seems to be creating quite the hypothetical user indeed.
></jb>
>
>And I think you agree with me that your solution for the second scenario
>
>I described (when only
>node "b" is required) has worse performance then a solution w/o S'.
>
><jb>
>I'm afraid I wouldn't agree with this either.  My current belief is that
>running one XPath of the type you describe will run far slower than the
>two transforms.  XPath has a problem:  it is easy to select nodes, it is
>hard to exclude nodes.  We are trying to overcome that problem by using
>Xpath's natural ability for identifying nodes to identify nodes we want
>to get rid of, then let the semantics of the host language (DSig in this
>case) do the exclusion because it can be done efficiently.
>
>Rather than considering the evaluation of every XPath expression as
>equal, consider the fact that some Xpath expressions can be far more
>than twice as costly as other ones.
>
>Cheers, 
>John Boyer
></jb>
>
Received on Monday, 8 April 2002 19:37:07 UTC