RE: New Version of XPath Filter from John Boyer on 2002-04-08 (w3c-ietf-xmldsig@w3.org from April to June 2002)

From: John Boyer <JBoyer@PureEdge.com>
Date: Mon, 8 Apr 2002 14:33:52 -0700
To: <aleksey@aleksey.com>
Cc: "merlin" <merlin@baltimore.ie>, <reagle@w3.org>, <w3c-ietf-xmldsig@w3.org>
Message-ID: <7874BFCCD289A645B5CE3935769F0B52328549@tigger.PureEdge.com>
Hi Aleksey,

-----Original Message-----
From: Aleksey Sanin [mailto:aleksey@aleksey.com]
Sent: Monday, April 08, 2002 12:03 PM
To: John Boyer
Cc: merlin; reagle@w3.org; w3c-ietf-xmldsig@w3.org
Subject: Re: New Version of XPath Filter


I need to think about profiling this scenarios. Probably you are right 
about performance.
However, personally I don't think that it's a good idea to write general

standard with
performance as a top priority. My expirience shows that performance 
improvements are
usually very complicated, not simple to explain and very difficult to 
maintain.

<jb>
Actually, Xpath filter v1.0 is what resulted from trying to write the
standard without concern for speed.  We tried it this way and found that
the right idea has a performance profile that is quite unacceptable.
So, we have no other goal than to achieve roughly the same functionality
in a more efficient manner.  My experience also shows that performance
improvements are complicated and difficult to maintain **in an
implementation**.  XPath filter v1.0 could be optimized, possibly to an
acceptable level of performance, but only by means that are very
difficult to implement and therefore unlikely to be implemented
uniformly.  Because the performance profile can become an
interoperability issue in this case, it has become necessary to create
an architecture in which the default non-optimized version achieves
acceptable performance.
</jb>

Calculating S' is one additional step and this step adds complexety to 
the proposed scheme.

<jb>
No, it is not an additional step in the sense of performance.  It only
appears that way because we separate it out as an operation in the
processing model.  However, any XPath evaluation that results in a
node-set containing S' plus all descendants of nodes in S' has by
definition performed at least as much work as it takes to compute S'.
</jb>

I like the initial idea: describe XPath transform in terms of sets 
operations. It really simplifies
understanding and makes everything clear. I suggest you to take a look 
at this from a "user"
point of view: "I've constructed XPath expression but transform added 
some nodes to it. Why???"
And remember that users never read docs :)

<jb>
We add nodes because that is the definition of what we do.  It is
unreasonable to expect that a person will successfully construct a
signature filter without knowing something as basic as what nodesets
will result.  This is why we *require* familiarity with companion specs.
This is also why every spec example shows only the subtree root being
selected.

For what it's worth, though, even a user like the one you describe would
have difficulty messing up this transform.  Suppose the transform
expression identifies all subtree nodes, not just the subtree roots.
The behavior of the transform would not change.

The user would have to write an XPath which tries to include some nodes
then exclude some of their children in the same expression.  The idea
that the user would try to do this in a text editor rather than a
helpful design environment AND that the user would not read a single
example-- even if it is after the first transform does not do what
he/she wants-- seems to be creating quite the hypothetical user indeed.
</jb>

And I think you agree with me that your solution for the second scenario

I described (when only
node "b" is required) has worse performance then a solution w/o S'.

<jb>
I'm afraid I wouldn't agree with this either.  My current belief is that
running one XPath of the type you describe will run far slower than the
two transforms.  XPath has a problem:  it is easy to select nodes, it is
hard to exclude nodes.  We are trying to overcome that problem by using
Xpath's natural ability for identifying nodes to identify nodes we want
to get rid of, then let the semantics of the host language (DSig in this
case) do the exclusion because it can be done efficiently.

Rather than considering the evaluation of every XPath expression as
equal, consider the fact that some Xpath expressions can be far more
than twice as costly as other ones.

Cheers, 
John Boyer
</jb>
Received on Monday, 8 April 2002 17:34:31 UTC