Re: New Version of XPath Filter from Aleksey Sanin on 2002-04-08 (w3c-ietf-xmldsig@w3.org from April to June 2002)

From: Aleksey Sanin <aleksey@aleksey.com>
Date: Mon, 08 Apr 2002 15:00:28 -0700
To: John Boyer <JBoyer@PureEdge.com>
CC: merlin <merlin@baltimore.ie>, reagle@w3.org, w3c-ietf-xmldsig@w3.org
Message-ID: <3CB212FC.1040009@aleksey.com>
John,

Thanks for your message. After reading it I think I understand your 
position and
why this S' scares me. You are absolutelly right that the current XPath 
1.0 transform
requires checking every node in the input nodes set and excluding a node 
if the XPath
expression is evaluated to "false". However, you can describe the same 
operation in
a different way:
    apply XPath expression to the input document and calculate the 
result nodes set and
    input nodes set intersection
(of course, XPath expression is applied after adding "(//. | //@* | 
//namespace::*)[XXX]" )
In this interpretation, the XPath 1.0 transform performance is *the 
same* as the
proposed XPath 2.0 transform performance and there is no complicated 
optimization!

Also if you think about the current XPath transform in this way then you 
can imagine
that the only thing you want to add to this transform is to make it 
general and have
"intersect" , "union" and "subtract" operations instead of "intersect" 
only. It seems natural
and simple to me and I can explain the transform in one phrase (someone 
said that if
you could not explain your theory using one phrase then probably you 
have to think
more about it :) ).

However, the proposed XPath 2.0 transform is different because of this 
additional step.
I could not really evaluate the performance difference of adding this 
step. As you've noted
the XPath expressions are different and implementations are different. 
For some implementations
it could be faster to add all childs in the XPath expression itself, for 
some implementations
it could be faster to add childs using this additional step. And real 
performance will depend on
what will be most common situation: application wants to include/exclude 
only the node or
the node plus all childs. From my point of view, in such complicated 
situations a nice old
"Occam's Razor" rule should work: if the XPath expression evaluator can 
do the job why
we need to add this functionality to XPath transform?


Aleksey.



John Boyer wrote:

>Hi Aleksey,
>
>-----Original Message-----
>From: Aleksey Sanin [mailto:aleksey@aleksey.com]
>Sent: Monday, April 08, 2002 12:03 PM
>To: John Boyer
>Cc: merlin; reagle@w3.org; w3c-ietf-xmldsig@w3.org
>Subject: Re: New Version of XPath Filter
>
>
>I need to think about profiling this scenarios. Probably you are right 
>about performance.
>However, personally I don't think that it's a good idea to write general
>
>standard with
>performance as a top priority. My expirience shows that performance 
>improvements are
>usually very complicated, not simple to explain and very difficult to 
>maintain.
>
><jb>
>Actually, Xpath filter v1.0 is what resulted from trying to write the
>standard without concern for speed.  We tried it this way and found that
>the right idea has a performance profile that is quite unacceptable.
>So, we have no other goal than to achieve roughly the same functionality
>in a more efficient manner.  My experience also shows that performance
>improvements are complicated and difficult to maintain **in an
>implementation**.  XPath filter v1.0 could be optimized, possibly to an
>acceptable level of performance, but only by means that are very
>difficult to implement and therefore unlikely to be implemented
>uniformly.  Because the performance profile can become an
>interoperability issue in this case, it has become necessary to create
>an architecture in which the default non-optimized version achieves
>acceptable performance.
></jb>
>
>Calculating S' is one additional step and this step adds complexety to 
>the proposed scheme.
>
><jb>
>No, it is not an additional step in the sense of performance.  It only
>appears that way because we separate it out as an operation in the
>processing model.  However, any XPath evaluation that results in a
>node-set containing S' plus all descendants of nodes in S' has by
>definition performed at least as much work as it takes to compute S'.
></jb>
>
>I like the initial idea: describe XPath transform in terms of sets 
>operations. It really simplifies
>understanding and makes everything clear. I suggest you to take a look 
>at this from a "user"
>point of view: "I've constructed XPath expression but transform added 
>some nodes to it. Why???"
>And remember that users never read docs :)
>
><jb>
>We add nodes because that is the definition of what we do.  It is
>unreasonable to expect that a person will successfully construct a
>signature filter without knowing something as basic as what nodesets
>will result.  This is why we *require* familiarity with companion specs.
>This is also why every spec example shows only the subtree root being
>selected.
>
>For what it's worth, though, even a user like the one you describe would
>have difficulty messing up this transform.  Suppose the transform
>expression identifies all subtree nodes, not just the subtree roots.
>The behavior of the transform would not change.
>
>The user would have to write an XPath which tries to include some nodes
>then exclude some of their children in the same expression.  The idea
>that the user would try to do this in a text editor rather than a
>helpful design environment AND that the user would not read a single
>example-- even if it is after the first transform does not do what
>he/she wants-- seems to be creating quite the hypothetical user indeed.
></jb>
>
>And I think you agree with me that your solution for the second scenario
>
>I described (when only
>node "b" is required) has worse performance then a solution w/o S'.
>
><jb>
>I'm afraid I wouldn't agree with this either.  My current belief is that
>running one XPath of the type you describe will run far slower than the
>two transforms.  XPath has a problem:  it is easy to select nodes, it is
>hard to exclude nodes.  We are trying to overcome that problem by using
>Xpath's natural ability for identifying nodes to identify nodes we want
>to get rid of, then let the semantics of the host language (DSig in this
>case) do the exclusion because it can be done efficiently.
>
>Rather than considering the evaluation of every XPath expression as
>equal, consider the fact that some Xpath expressions can be far more
>than twice as costly as other ones.
>
>Cheers, 
>John Boyer
></jb>
>
Received on Monday, 8 April 2002 18:01:28 UTC