- From: Joseph Reagle <reagle@w3.org>
- Date: Mon, 10 Jun 2002 17:16:48 -0400
- To: merlin <merlin@baltimore.ie>, Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>, "John Boyer" <JBoyer@PureEdge.com>
- Cc: <w3c-ietf-xmldsig@w3.org>
I'd like to move forward to a Last Call for this specification ASAP. Last
week Merlin outlined some options [1], one of which being a new proposal
that he gave some more details about [2] in response to Christian.
So to conclude the thread I'd like to know what people prefer:
1. Move forward with what we have presently [3].
2. Specify Merlin's proposal (The timing of this will be
dependent on properly representing the proposal (e.g., Merlin's time
<smile/>) and then reviewing and iterating on it a few times to make sure
we have it clear.)
Please respond before Wednesday. I'm favoring option #1 unless there's a
couple of folks advocating/willing-to-write #2 without a strong dissent.
[1] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002AprJun/0288
[2] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002AprJun/0291
[3] http://www.w3.org/TR/2002/WD-xmldsig-filter2-20020425/
---------- Forwarded Message ----------
Subject: Re: XPath filter 2.0
Date: Thu, 06 Jun 2002 12:22:47 +0100
From: merlin <merlin@baltimore.ie>
To: Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>
Cc: w3c-ietf-xmldsig@w3.org
r/geuer-pollmann@nue.et-inf.uni-siegen.de/2002.06.06/10:29:55
>> * Implementations SHOULD note that an efficient realization
>> of this transform will not compute a node set at each
>> point in this transform, but instead make a list of
>> the operations and the XPath-selected nodes, and then
>> iterate through the input document once, in document
>> order, constructing a filtering node set N based upon
>> the operations and selected nodes.
>> .......
>> . Efficiency is as with the current spec. Basically this
>> fixes union.
>
>Does this mean that based on the operations, you make some tree-labeling
>;-)) and then you make one tree-traversal to output the selected nodes?
Yes, but this is the exact algorithm that implementations
of the _current_ XPath Filter 2.0 transform _should_ use
if a sequence of the current XPath Filter 2.0 transforms
precedes c14n. This formulation of the filter simply makes
it easier to express in terms of SHOULD language.
>Sounds cool, in that case, the efficiency would be much better then the
>current results of v2.0. What kind of algorithm do you use for
>
> - make a list of operations and selected nodes,
> - decide based on this data which nodes are the result.
If you're asking for a normative algorithm for the general
case, then I would have to think for a while. Restricting
myself somewhat:
First, let's characterize a sequence of filters:
Filter ::= (INTERSECT | SUBTRACT | UNION)+
You will observe that adjacent SUBTRACT and INTERSECT
operations can be idempotently reordered, and that
adjacent operations of the same type can be computationally
merged. However, I would expect that type of thing to be
done in the XPath expressions so I will ignore this.
Restricting myself to the following production, which captures
all _common_ (in my opinion) use cases:
SimpleFilter ::= A:UNION* (B:INTERSECT | C:SUBTRACT)* D:UNION*
Iterate over the document.
(define (include-node) ;; whether or not to include a node
(cond ;; returns the first match
((encountered-any D) t)
;; if you've encountered a node in a trailing union
((encountered-any C) nil)
;; not if you've encountered a subtraction node
((null B) t)
;; if there are no intersect operations
((encountered-all B) t)
;; if you've encountered a node in each intersect node set
(t nil))) ;; not otherwise
Note that encountered-any returns nil if its parameter is an
empty list of node sets, but encountered-all is true in this
case. We can therefore express this concisely:
(define (include-node)
(or (encountered-any D)
(and (not (encountered-any C)) (encountered-all B))))
You can implement encountered-foo and therefore include-node
strictly in terms of node labeling, a stack and iteration.
Obviously you must also consider the input node set.
Going from this to a fully general solution is fairly
straightforward. Observe that UNIONs are subject to ALL
subsequent INTERSECT and SUBTRACT operations, but no preceding
ones, and that the entire filter is equivalent to:
UNION/ Filter
I can then say that a node is included if I have encountered
a node in ANY ( UNION operation AND NOT ANY SUBSEQUENT SUBTRACT
operation AND ALL SUBSEQUENT INTERSECT operations ).
Work done to compute this is proportional to the number of
filters but only done at a labeled node.
Merlin
>Christian
>
>
>
>
>--On Donnerstag, 6. Juni 2002 01:09 +0100 merlin <merlin@baltimore.ie>
>
>wrote:
>> Hi,
>>
>> Quick summary of options:
>>
>> 1. Current Spec
>> . This is intuitive (in my opinion) because it is based on a
>> linear sequence of set operations.
>> . Typical (IMHO) use cases require 2 XPath evaluations.
>> However, increasingly complex filtering requirements incur
>> increasing cost; an arbitrarily complex expression requires
>> an arbitrarily large number of simple XPath expressions.
>> However, the standard XPath filter may be more useful for
>> these anyway.
>> . Operation can, in most cases, be commingled with c14n for
>> efficiency, but:
>> . The union operator is really ugly and unintuitive.
>>
>> 2. Christian's Spec
>> . *I* do not believe this is as intuitive; it involves labeling
>> nodes and then traversing the document, proceeding based
>> on node labels (e.g., omit-but-traverse).
>> . Typical use cases require 2 XPath evaluations. Increasingly
>> complex filtering requirements can be solved in a fixed
>> number (2/3) of increasingly complex XPath expressions.
>> . Operation can be commingled with c14n for effiency.
>>
>> 3. Or, we can take a variant of the current spec. I won't
>> detail it horrendously, but basically:
>>
>> . The XPath Filter 2.1 takes, as a parameter, a sequence
>> of operations, each of which is characterized as a
>> set operation (intersect, subtract, union) and an
>> XPath expression.
>> . Operation over an input node set is as follows:
>> * Construct a node set N consisting of all the
>> nodes in the input document.
>> * Iterate through each of the operations.
>> # Evaluate the XPath expression; the result is X.
>> # Expand all identified nodes to include their
>> subtrees; the result is Y.
>> # Assign N = N op Y
>> * Use the resulting node set N as a filter to select
>> which nodes from the input node set will remain in the
>> output node set, just as the XPath 1.0 filter. This is
>> tantamount to intersection with the input node set.
>> * Implementations SHOULD note that an efficient realization
>> of this transform will not compute a node set at each
>> point in this transform, but instead make a list of
>> the operations and the XPath-selected nodes, and then
>> iterate through the input document once, in document
>> order, constructing a filtering node set N based upon
>> the operations and selected nodes.
>> * Implementations SHOULD note that iterating through the
>> document and constructing a filtering node set N can
>> be efficiently commingled with the canonicalization
>> transform if canonicalization is performed immediately
>> after this transform.
>> . With this formulation, intersection and subtraction
>> are IDENTICAL to the existing spec, with the only
>> change being that you can put them in one transform
>> or many.
>> . Union is, however, much improved (in my opinion). You
>> can only use it to include nodes that would be
>> removed by a previous operation in the same transform.
>> As a result, the output node set will only include
>> nodes from the input node set.
>> . Efficiency is as with the current spec. Basically this
>> fixes union.
>>
>> I write this a while ago; thought I'd send it rather
>> than delete it. It's probably wasteful to propose yet
>> another option.
>>
>> Merlin
-------------------------------------------------------
--
Joseph Reagle Jr. http://www.w3.org/People/Reagle/
W3C Policy Analyst mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair http://www.w3.org/Signature/
W3C XML Encryption Chair http://www.w3.org/Encryption/2001/
Received on Monday, 10 June 2002 17:16:50 UTC