- From: merlin <merlin@baltimore.ie>
- Date: Thu, 06 Jun 2002 12:22:47 +0100
- To: Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>
- Cc: w3c-ietf-xmldsig@w3.org
r/geuer-pollmann@nue.et-inf.uni-siegen.de/2002.06.06/10:29:55 >> * Implementations SHOULD note that an efficient realization >> of this transform will not compute a node set at each >> point in this transform, but instead make a list of >> the operations and the XPath-selected nodes, and then >> iterate through the input document once, in document >> order, constructing a filtering node set N based upon >> the operations and selected nodes. >> ....... >> . Efficiency is as with the current spec. Basically this >> fixes union. > >Does this mean that based on the operations, you make some tree-labeling >;-)) and then you make one tree-traversal to output the selected nodes? Yes, but this is the exact algorithm that implementations of the _current_ XPath Filter 2.0 transform _should_ use if a sequence of the current XPath Filter 2.0 transforms precedes c14n. This formulation of the filter simply makes it easier to express in terms of SHOULD language. >Sounds cool, in that case, the efficiency would be much better then the >current results of v2.0. What kind of algorithm do you use for > > - make a list of operations and selected nodes, > - decide based on this data which nodes are the result. If you're asking for a normative algorithm for the general case, then I would have to think for a while. Restricting myself somewhat: First, let's characterize a sequence of filters: Filter ::= (INTERSECT | SUBTRACT | UNION)+ You will observe that adjacent SUBTRACT and INTERSECT operations can be idempotently reordered, and that adjacent operations of the same type can be computationally merged. However, I would expect that type of thing to be done in the XPath expressions so I will ignore this. Restricting myself to the following production, which captures all _common_ (in my opinion) use cases: SimpleFilter ::= A:UNION* (B:INTERSECT | C:SUBTRACT)* D:UNION* Iterate over the document. (define (include-node) ;; whether or not to include a node (cond ;; returns the first match ((encountered-any D) t) ;; if you've encountered a node in a trailing union ((encountered-any C) nil) ;; not if you've encountered a subtraction node ((null B) t) ;; if there are no intersect operations ((encountered-all B) t) ;; if you've encountered a node in each intersect node set (t nil))) ;; not otherwise Note that encountered-any returns nil if its parameter is an empty list of node sets, but encountered-all is true in this case. We can therefore express this concisely: (define (include-node) (or (encountered-any D) (and (not (encountered-any C)) (encountered-all B)))) You can implement encountered-foo and therefore include-node strictly in terms of node labeling, a stack and iteration. Obviously you must also consider the input node set. Going from this to a fully general solution is fairly straightforward. Observe that UNIONs are subject to ALL subsequent INTERSECT and SUBTRACT operations, but no preceding ones, and that the entire filter is equivalent to: UNION/ Filter I can then say that a node is included if I have encountered a node in ANY ( UNION operation AND NOT ANY SUBSEQUENT SUBTRACT operation AND ALL SUBSEQUENT INTERSECT operations ). Work done to compute this is proportional to the number of filters but only done at a labeled node. Merlin >Christian > > > > >--On Donnerstag, 6. Juni 2002 01:09 +0100 merlin <merlin@baltimore.ie> >wrote: > >> >> >> Hi, >> >> Quick summary of options: >> >> 1. Current Spec >> . This is intuitive (in my opinion) because it is based on a >> linear sequence of set operations. >> . Typical (IMHO) use cases require 2 XPath evaluations. >> However, increasingly complex filtering requirements incur >> increasing cost; an arbitrarily complex expression requires >> an arbitrarily large number of simple XPath expressions. >> However, the standard XPath filter may be more useful for >> these anyway. >> . Operation can, in most cases, be commingled with c14n for >> efficiency, but: >> . The union operator is really ugly and unintuitive. >> >> 2. Christian's Spec >> . *I* do not believe this is as intuitive; it involves labeling >> nodes and then traversing the document, proceeding based >> on node labels (e.g., omit-but-traverse). >> . Typical use cases require 2 XPath evaluations. Increasingly >> complex filtering requirements can be solved in a fixed >> number (2/3) of increasingly complex XPath expressions. >> . Operation can be commingled with c14n for effiency. >> >> 3. Or, we can take a variant of the current spec. I won't >> detail it horrendously, but basically: >> >> . The XPath Filter 2.1 takes, as a parameter, a sequence >> of operations, each of which is characterized as a >> set operation (intersect, subtract, union) and an >> XPath expression. >> . Operation over an input node set is as follows: >> * Construct a node set N consisting of all the >> nodes in the input document. >> * Iterate through each of the operations. >> # Evaluate the XPath expression; the result is X. >> # Expand all identified nodes to include their >> subtrees; the result is Y. >> # Assign N = N op Y >> * Use the resulting node set N as a filter to select >> which nodes from the input node set will remain in the >> output node set, just as the XPath 1.0 filter. This is >> tantamount to intersection with the input node set. >> * Implementations SHOULD note that an efficient realization >> of this transform will not compute a node set at each >> point in this transform, but instead make a list of >> the operations and the XPath-selected nodes, and then >> iterate through the input document once, in document >> order, constructing a filtering node set N based upon >> the operations and selected nodes. >> * Implementations SHOULD note that iterating through the >> document and constructing a filtering node set N can >> be efficiently commingled with the canonicalization >> transform if canonicalization is performed immediately >> after this transform. >> . With this formulation, intersection and subtraction >> are IDENTICAL to the existing spec, with the only >> change being that you can put them in one transform >> or many. >> . Union is, however, much improved (in my opinion). You >> can only use it to include nodes that would be >> removed by a previous operation in the same transform. >> As a result, the output node set will only include >> nodes from the input node set. >> . Efficiency is as with the current spec. Basically this >> fixes union. >> >> I write this a while ago; thought I'd send it rather >> than delete it. It's probably wasteful to propose yet >> another option. >> >> Merlin >
Received on Thursday, 6 June 2002 07:23:22 UTC