Re: XPath filter 2.0

Hi Merlin,

First, I don't think that it's "wasteful" to propose yet another option. It 
shouldn't be too late for good ideas.

1: Current spec:
  - I agree that the three operations (union/intersect/subtract) in
    general are intuitive, but (for me), it's absolutely non-intuitive
    that I can use union to include nodes which have not been in the
    input node set (even if it's /possible/ from the XML Signature spec).
  - One thing that makes all this a little bit confusing is that e.g. a
    union(/) as last step destroys all the selection work from previous
    steps because it includes everything again.

2: My spec:
  - Well, of course, for me, it's intuitive ;-))
  - It has a nearly constant runtime

3: Merlins xfilter v2.1
  - If I understood it right, you select the same nodes from the input
    *document* as in v2.0, but then you do not output the result as
    result node but use it as intersection mask with the input node set.
    Multiple 'steps' can be combined in a single ds:Transform
  - That's good, because it does what a /filter/ does. It cannot re-include
    nodes which have already been excluded in a previous ds:Transform.

  - As a result: I REALLY like it -- but I do not understand the following:

>     * Implementations SHOULD note that an efficient realization
>       of this transform will not compute a node set at each
>       point in this transform, but instead make a list of
>       the operations and the XPath-selected nodes, and then
>       iterate through the input document once, in document
>       order, constructing a filtering node set N based upon
>       the operations and selected nodes.
>   .......
>   . Efficiency is as with the current spec. Basically this
>     fixes union.

Does this mean that based on the operations, you make some tree-labeling 
;-)) and then you make one tree-traversal to output the selected nodes? 
Sounds cool, in that case, the efficiency would be much better then the 
current results of v2.0. What kind of algorithm do you use for

 - make a list of operations and selected nodes,
 - decide based on this data which nodes are the result.


Christian




--On Donnerstag, 6. Juni 2002 01:09 +0100 merlin <merlin@baltimore.ie> 
wrote:

>
>
> Hi,
>
> Quick summary of options:
>
> 1. Current Spec
>   . This is intuitive (in my opinion) because it is based on a
>     linear sequence of set operations.
>   . Typical (IMHO) use cases require 2 XPath evaluations.
>     However, increasingly complex filtering requirements incur
>     increasing cost; an arbitrarily complex expression requires
>     an arbitrarily large number of simple XPath expressions.
>     However, the standard XPath filter may be more useful for
>     these anyway.
>   . Operation can, in most cases, be commingled with c14n for
>     efficiency, but:
>   . The union operator is really ugly and unintuitive.
>
> 2. Christian's Spec
>   . *I* do not believe this is as intuitive; it involves labeling
>     nodes and then traversing the document, proceeding based
>     on node labels (e.g., omit-but-traverse).
>   . Typical use cases require 2 XPath evaluations. Increasingly
>     complex filtering requirements can be solved in a fixed
>     number (2/3) of increasingly complex XPath expressions.
>   . Operation can be commingled with c14n for effiency.
>
> 3. Or, we can take a variant of the current spec. I won't
>    detail it horrendously, but basically:
>
>   . The XPath Filter 2.1 takes, as a parameter, a sequence
>     of operations, each of which is characterized as a
>     set operation (intersect, subtract, union) and an
>     XPath expression.
>   . Operation over an input node set is as follows:
>     * Construct a node set N consisting of all the
>       nodes in the input document.
>     * Iterate through each of the operations.
>       # Evaluate the XPath expression; the result is X.
>       # Expand all identified nodes to include their
>         subtrees; the result is Y.
>       # Assign N = N op Y
>     * Use the resulting node set N as a filter to select
>       which nodes from the input node set will remain in the
>       output node set, just as the XPath 1.0 filter. This is
>       tantamount to intersection with the input node set.
>     * Implementations SHOULD note that an efficient realization
>       of this transform will not compute a node set at each
>       point in this transform, but instead make a list of
>       the operations and the XPath-selected nodes, and then
>       iterate through the input document once, in document
>       order, constructing a filtering node set N based upon
>       the operations and selected nodes.
>     * Implementations SHOULD note that iterating through the
>       document and constructing a filtering node set N can
>       be efficiently commingled with the canonicalization
>       transform if canonicalization is performed immediately
>       after this transform.
>   . With this formulation, intersection and subtraction
>     are IDENTICAL to the existing spec, with the only
>     change being that you can put them in one transform
>     or many.
>   . Union is, however, much improved (in my opinion). You
>     can only use it to include nodes that would be
>     removed by a previous operation in the same transform.
>     As a result, the output node set will only include
>     nodes from the input node set.
>   . Efficiency is as with the current spec. Basically this
>     fixes union.
>
> I write this a while ago; thought I'd send it rather
> than delete it. It's probably wasteful to propose yet
> another option.
>
> Merlin

Received on Thursday, 6 June 2002 04:24:18 UTC