XPath filter 2.0

Hi,

Quick summary of options:

1. Current Spec
  . This is intuitive (in my opinion) because it is based on a
    linear sequence of set operations.
  . Typical (IMHO) use cases require 2 XPath evaluations.
    Increasingly complex filtering requirements incur
    increasing cost, though: an arbitrarily complex expression
    requires an arbitrarily large number of simple XPath
    expressions. The standard XPath filter may be more useful
    for such cases anyway.
  . Operation can, in most cases, be commingled with c14n for
    efficiency, but:
  . The union operator is really ugly and unintuitive.

2. Christian's Spec
  . *I* do not believe this is as intuitive; it involves labeling
    nodes and then traversing the document, proceeding based
    on node labels (e.g., omit-but-traverse).
  . Typical use cases require 2 XPath evaluations. Increasingly
    complex filtering requirements can be solved in a fixed
    number (2/3) of increasingly complex XPath expressions.
  . Operation can be commingled with c14n for efficiency.

3. Or, we can take a variant of the current spec. I won't
   detail it horrendously, but basically:

  . The XPath Filter 2.1 takes, as a parameter, a sequence
    of operations, each of which is characterized as a
    set operation (intersect, subtract, union) and an
    XPath expression.
  . Operation over an input node set is as follows:
    * Construct a node set N consisting of all the
      nodes in the input document.
    * Iterate through each of the operations.
      # Evaluate the XPath expression; the result is X.
      # Expand all identified nodes to include their
        subtrees; the result is Y.
      # Assign N = N op Y
    * Use the resulting node set N as a filter to select
      which nodes from the input node set will remain in the
      output node set, just as in the XPath 1.0 filter. This is
      tantamount to intersection with the input node set.
    * Implementations SHOULD note that an efficient realization
      of this transform will not compute a node set at each
      point in this transform, but instead make a list of
      the operations and the XPath-selected nodes, and then
      iterate through the input document once, in document
      order, constructing a filtering node set N based upon
      the operations and selected nodes.
    * Implementations SHOULD note that iterating through the
      document and constructing a filtering node set N can
      be efficiently commingled with the canonicalization
      transform if canonicalization is performed immediately
      after this transform.
  . With this formulation, intersection and subtraction
    are IDENTICAL to the existing spec, with the only
    change being that you can put them in one transform
    or many.
  . Union is, however, much improved (in my opinion). You
    can only use it to include nodes that would be
    removed by a previous operation in the same transform.
    As a result, the output node set will only include
    nodes from the input node set.
  . Efficiency is as with the current spec. Basically this
    fixes union.
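
To make option 3 concrete, here is a rough Python sketch of the
naive (node-set-at-each-step) form of the processing described
above. It is an illustration only, not spec text, and it rests on
several assumptions: only element nodes are modeled (no text,
attribute or namespace nodes), expressions are limited to the XPath
subset that xml.etree.ElementTree supports, and the names
xpath_filter, subtree and the operation strings are made up for
this example.

```python
import xml.etree.ElementTree as ET


def subtree(elem):
    """Return elem plus all of its descendant elements."""
    return set(elem.iter())


def xpath_filter(root, input_nodes, operations):
    """Apply a sequence of (op, expr) pairs and filter input_nodes.

    op is one of "intersect", "subtract", "union" (hypothetical names).
    expr is evaluated with ElementTree's limited XPath support.
    """
    # N starts out as every (element) node in the input document.
    n = set(root.iter())
    for op, expr in operations:
        x = root.findall(expr)      # X: nodes the expression selects
        y = set()
        for elem in x:
            y |= subtree(elem)      # Y: X expanded to whole subtrees
        if op == "intersect":
            n &= y
        elif op == "subtract":
            n -= y
        elif op == "union":
            n |= y
        else:
            raise ValueError("unknown operation: %r" % (op,))
    # N acts as a filter on the input node set (an intersection),
    # reported here in document order.
    return [e for e in root.iter() if e in n and e in input_nodes]


root = ET.fromstring("<doc><a><b/><secret><c/></secret></a><d/></doc>")
everything = set(root.iter())

# Remove the <secret> subtree, then use union to put <c> back.
kept = xpath_filter(root, everything,
                    [("subtract", ".//secret"), ("union", ".//c")])
kept_tags = [e.tag for e in kept]

# Union cannot add nodes that are outside the input node set.
only_a = subtree(root.find("a"))
limited = xpath_filter(root, only_a, [("union", ".//d")])
limited_tags = [e.tag for e in limited]
```

The second call shows the point about union: N already holds every
node in the document, so the union is a no-op there, and the final
intersection keeps <d> out because it was never in the input node
set. The single-pass realization suggested above is not attempted
here.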

I wrote this a while ago; thought I'd send it rather
than delete it. It's probably wasteful to propose yet
another option.

Merlin

Received on Wednesday, 5 June 2002 20:10:18 UTC