Re: POLL: Fwd: Re: XPath filter 2.0 from Christian Geuer-Pollmann on 2002-06-10 (w3c-ietf-xmldsig@w3.org from April to June 2002)

From: Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>
Date: Mon, 10 Jun 2002 23:46:35 +0200
To: reagle@w3.org, merlin <merlin@baltimore.ie>, John Boyer <JBoyer@PureEdge.com>
Cc: w3c-ietf-xmldsig@w3.org
Message-ID: <58682180.1023752795@pinkpanther>
Hi,

1: I would like to use the way Merlin described in his last mail (the XPath 
expressions select nodes in the input DOCUMENT, but we can only "unionize" 
nodes which have already been in the input NODESET). If this transform type 
comes as a first transform, not many things will change. I think it's a 
good way that a transform cannot re-include nodes which have been omitted 
by previous ones. So I vote: yes (option 2)

2: I think it's an absolute must-have that multiple 
union/subtract/intersect 'steps' can be combined in a single ds:Transform. 
If we force that only one single step can be placed into a single 
ds:Transform, it would make very ugly tweaks necessary in my software if I 
want to optimize multiple steps in a single transformation. This grouping 
should be possible regardless of how the working group descides on the 
above issue. So I vote: a strong yes

3: I did not really understood the algorithm Merlin described, but if he 
sees a way how implementors can make some "tree-labeling" and only a single 
traversal over the tree, this seems very cool.


Regards,
Christian


--On Montag, 10. Juni 2002 17:16 -0400 Joseph Reagle <reagle@w3.org> wrote:

>
> I'd like to move forward to a Last Call for this specification ASAP. Last
> week Merlin outlined some options [1], one of which being a new proposal
> that he gave some more details about [2] in response to Christian.
>
> So to conclude the thread I'd like to know what people prefer:
> 1. Move forward with what we have presently [3].
> 2. Specify Merlin's proposal (The timing of this will be
> dependent on properly representing the proposal (e.g., Merlin's time
> <smile/>) and then reviewing and iterating on it a few times to make sure
> we have it clear.)
>
> Please respond before Wednesday. I'm favoring option #1 unless there's a
> couple of folks advocating/willing-to-write #2 without a strong dissent.
>
> [1] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002AprJun/0288
> [2] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002AprJun/0291
> [3] http://www.w3.org/TR/2002/WD-xmldsig-filter2-20020425/
>
> ----------  Forwarded Message  ----------
>
> Subject: Re: XPath filter 2.0
> Date: Thu, 06 Jun 2002 12:22:47 +0100
> From: merlin <merlin@baltimore.ie>
> To: Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>
> Cc: w3c-ietf-xmldsig@w3.org
>
> r/geuer-pollmann@nue.et-inf.uni-siegen.de/2002.06.06/10:29:55
>
>>>     * Implementations SHOULD note that an efficient realization
>>>       of this transform will not compute a node set at each
>>>       point in this transform, but instead make a list of
>>>       the operations and the XPath-selected nodes, and then
>>>       iterate through the input document once, in document
>>>       order, constructing a filtering node set N based upon
>>>       the operations and selected nodes.
>>>   .......
>>>   . Efficiency is as with the current spec. Basically this
>>>     fixes union.
>>
>> Does this mean that based on the operations, you make some tree-labeling
>> ;-)) and then you make one tree-traversal to output the selected nodes?
>
> Yes, but this is the exact algorithm that implementations
> of the _current_ XPath Filter 2.0 transform _should_ use
> if a sequence of the current XPath Filter 2.0 transforms
> precedes c14n. This formulation of the filter simply makes
> it easier to express in terms of SHOULD language.
>
>> Sounds cool, in that case, the efficiency would be much better then the
>> current results of v2.0. What kind of algorithm do you use for
>>
>> - make a list of operations and selected nodes,
>> - decide based on this data which nodes are the result.
>
> If you're asking for a normative algorithm for the general
> case, then I would have to think for a while. Restricting
> myself somewhat:
>
> First, let's characterize a sequence of filters:
>   Filter ::= (INTERSECT | SUBTRACT | UNION)+
>
> You will observe that adjacent SUBTRACT and INTERSECT
> operations can be idempotently reordered, and that
> adjacent operations of the same type can be computationally
> merged. However, I would expect that type of thing to be
> done in the XPath expressions so I will ignore this.
>
> Restricting myself to the following production, which captures
> all _common_ (in my opinion) use cases:
>   SimpleFilter ::= A:UNION* (B:INTERSECT | C:SUBTRACT)* D:UNION*
>
> Iterate over the document.
>   (define (include-node) ;; whether or not to include a node
>     (cond ;; returns the first match
>       ((encountered-any D) t)
>         ;; if you've encountered a node in a trailing union
>       ((encountered-any C) nil)
>         ;; not if you've encountered a subtraction node
>       ((null B) t)
>         ;; if there are no intersect operations
>       ((encountered-all B) t)
>         ;; if you've encountered a node in each intersect node set
>       (t nil))) ;; not otherwise
>
> Note that encountered-any returns nil if its parameter is an
> empty list of node sets, but encountered-all is true in this
> case. We can therefore express this concisely:
>   (define (include-node)
>     (or (encountered-any D)
>       (and (not (encountered-any C)) (encountered-all B))))
>
> You can implement encountered-foo and therefore include-node
> strictly in terms of node labeling, a stack and iteration.
> Obviously you must also consider the input node set.
>
> Going from this to a fully general solution is fairly
> straightforward. Observe that UNIONs are subject to ALL
> subsequent INTERSECT and SUBTRACT operations, but no preceding
> ones, and that the entire filter is equivalent to:
>   UNION/ Filter
>
> I can then say that a node is included if I have encountered
> a node in ANY ( UNION operation AND NOT ANY SUBSEQUENT SUBTRACT
> operation AND ALL SUBSEQUENT INTERSECT operations ).
>
> Work done to compute this is proportional to the number of
> filters but only done at a labeled node.
>
> Merlin
>
>> Christian
>>
>>
>>
>>
>> --On Donnerstag, 6. Juni 2002 01:09 +0100 merlin <merlin@baltimore.ie>
>>
>> wrote:
>>> Hi,
>>>
>>> Quick summary of options:
>>>
>>> 1. Current Spec
>>>   . This is intuitive (in my opinion) because it is based on a
>>>     linear sequence of set operations.
>>>   . Typical (IMHO) use cases require 2 XPath evaluations.
>>>     However, increasingly complex filtering requirements incur
>>>     increasing cost; an arbitrarily complex expression requires
>>>     an arbitrarily large number of simple XPath expressions.
>>>     However, the standard XPath filter may be more useful for
>>>     these anyway.
>>>   . Operation can, in most cases, be commingled with c14n for
>>>     efficiency, but:
>>>   . The union operator is really ugly and unintuitive.
>>>
>>> 2. Christian's Spec
>>>   . *I* do not believe this is as intuitive; it involves labeling
>>>     nodes and then traversing the document, proceeding based
>>>     on node labels (e.g., omit-but-traverse).
>>>   . Typical use cases require 2 XPath evaluations. Increasingly
>>>     complex filtering requirements can be solved in a fixed
>>>     number (2/3) of increasingly complex XPath expressions.
>>>   . Operation can be commingled with c14n for effiency.
>>>
>>> 3. Or, we can take a variant of the current spec. I won't
>>>    detail it horrendously, but basically:
>>>
>>>   . The XPath Filter 2.1 takes, as a parameter, a sequence
>>>     of operations, each of which is characterized as a
>>>     set operation (intersect, subtract, union) and an
>>>     XPath expression.
>>>   . Operation over an input node set is as follows:
>>>     * Construct a node set N consisting of all the
>>>       nodes in the input document.
>>>     * Iterate through each of the operations.
>>>       # Evaluate the XPath expression; the result is X.
>>>       # Expand all identified nodes to include their
>>>         subtrees; the result is Y.
>>>       # Assign N = N op Y
>>>     * Use the resulting node set N as a filter to select
>>>       which nodes from the input node set will remain in the
>>>       output node set, just as the XPath 1.0 filter. This is
>>>       tantamount to intersection with the input node set.
>>>     * Implementations SHOULD note that an efficient realization
>>>       of this transform will not compute a node set at each
>>>       point in this transform, but instead make a list of
>>>       the operations and the XPath-selected nodes, and then
>>>       iterate through the input document once, in document
>>>       order, constructing a filtering node set N based upon
>>>       the operations and selected nodes.
>>>     * Implementations SHOULD note that iterating through the
>>>       document and constructing a filtering node set N can
>>>       be efficiently commingled with the canonicalization
>>>       transform if canonicalization is performed immediately
>>>       after this transform.
>>>   . With this formulation, intersection and subtraction
>>>     are IDENTICAL to the existing spec, with the only
>>>     change being that you can put them in one transform
>>>     or many.
>>>   . Union is, however, much improved (in my opinion). You
>>>     can only use it to include nodes that would be
>>>     removed by a previous operation in the same transform.
>>>     As a result, the output node set will only include
>>>     nodes from the input node set.
>>>   . Efficiency is as with the current spec. Basically this
>>>     fixes union.
>>>
>>> I write this a while ago; thought I'd send it rather
>>> than delete it. It's probably wasteful to propose yet
>>> another option.
>>>
>>> Merlin
>
> -------------------------------------------------------
>
> --
>
> Joseph Reagle Jr.                 http://www.w3.org/People/Reagle/
> W3C Policy Analyst                mailto:reagle@w3.org
> IETF/W3C XML-Signature Co-Chair   http://www.w3.org/Signature/
> W3C XML Encryption Chair          http://www.w3.org/Encryption/2001/
Received on Monday, 10 June 2002 17:46:44 UTC