RE: New XPath Filter Transform from Christian Geuer-Pollmann on 2002-03-18 (w3c-ietf-xmldsig@w3.org from January to March 2002)

From: Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>
Date: Mon, 18 Mar 2002 19:53:29 +0100
To: John Boyer <JBoyer@PureEdge.com>, Gregor Karlinger <gregor.karlinger@iaik.at>, merlin <merlin@baltimore.ie>
Cc: TAMURA Kent <kent@trl.ibm.co.jp>, w3c-ietf-xmldsig@w3.org, reagle@w3.org
Message-ID: <2868927389.1016481209@clouseau>
Hi John,

sorry that I used that short and inexact description of the processing model. Give me a second chance:

- The three different labels are 'attached' to the tree,
  e.g. by using a hash table with the Node as the key
  and the label value as the value.

- A Node can have a maximum of one label: It wouldn't
  make sense to attach a "include" and an "exclude"
  label to a node.

- You create an empty xpath node set (the result which is
  passed to the next transform).

- You create a boolean variable "include" which stores
  the default behaviour for non-labeled nodes.

- Now you make a tree-traverse on the full document according
  to the following procedure:

  1.) If your traversal approaches a "dont-include-
      any-descendant-or-self" label, you do not add this
      this node to the result node set and do not descend
      into this nodes childs.

  2.) If you hit an "include-descendant-or-self-and-search-forward"
      label, you set include=true, add the current node to the result
      xpath node set and visit the children of the current node.

  3.) If you hit a "dont-include-self-but-search-forward" label,
      you set include=false, do not add the current node to the
      result node set _but_ you must visit it's children; maybe
      you find one of the other labels.

  4.) You hit a node which has no label: You look inside the include variable
      whether you must include the node in the result node set or not.
      After including it or not, you visit the children.

- That's it. Now you have an XPath node set which is passed to c14n or whatever.

You do not have to evaluate an XPath against each node, you only have to check eventually set labels. You can include/exclude complete subtrees.

Don't nail me down on these long attribute names ;-)) They are just for illustration.

Let's give an example. The enveloped signature transform includes the document root and excludes the signature: A reformulation using my transform would look like this:

<Transform URI="http://www.w3.org/2002/03/xmldsig-the-next-alternative-xpath"
    ds="&dsig;">

<!-- We output the complete document -->
<XPath Filter="include-descendant-or-self-and-search-forward">
   /
</XPath>

<!-- don't output the signature and don't descend -
     you won't find subtrees to be outputted -->
<XPath Filter="dont-include-any-descendant-or-self">
  here()/ancestor-or-self::ds:Signature[1]
</XPath>
</Transform>

Another example could look like this: Given the document A, you want to write a Transform which results in document B (just elements, whitespace only added for readability, no Text nodes):

Document A: <A><B><C><D><E><F><G><H/></G></F></E></D></C></B></A>
Document B:    <B>   <D>   <F>           </F>    </D>    </B>

dont-include-self-but-search-forward
/  //C  //E

include-descendant-or-self-and-search-forward:
//B  //D  //F

dont-include-any-descendant-or-self
//G

You could also put the //G expression into the dont-include-but-search, but 
that would be waste of time...

I hope this description was a little bit clearer?

Christian












--On Montag, 18. März 2002 10:24 -0800 John Boyer <JBoyer@PureEdge.com> wrote:

> Hi Christian,
>
> My first read of this is that without a reissue of the XML DSig REC, we
> wouldn't be able to do it this way.
>
> Under this formulation, we leave the concept of a node-set travelling
> from transform to transform and instead have a 'labelled' node-set (or
> perhaps more desirably, a labelled parse tree).
>
> Also, at first glance I don't understand why set subtraction wouldn't be
> the better method.
>
> How many labels can a node have?  How do we know when to change the
> labelled node-set into something c14n can understand?
>
> Also I don't understand what the 'dont-include-self-but-search-forward'
> label does. Perhaps for each label you could describe exactly what it
> does rather than relying on the label name (just in case I'm missing
> something in each).
>
> Finally, note that we are currently discussing how to optimize what, in
> my opinion, are simpler cases involving only one transform. The method
> you are describing seems that it might only be better at optimizing
> sequences of multiple transforms.  Could you demonstrate how it could
> accommodate the include/exclude that are currently defined?
>
> Thanks,
> John Boyer
>
>
>
> -----Original Message-----
> From: Christian Geuer-Pollmann
> [mailto:geuer-pollmann@nue.et-inf.uni-siegen.de]
> Sent: Sunday, March 17, 2002 11:27 AM
> To: John Boyer; Gregor Karlinger; merlin
> Cc: TAMURA Kent; w3c-ietf-xmldsig@w3.org; reagle@w3.org
> Subject: RE: New XPath Filter Transform
>
>
> Hi John,
>
> from what I understood from your mails is that you need a transform which
> allows to easily iterate (tree-traversal) over a structure like a DOM
> tree  without evaluating an costly XPath expression against every node in
> the  XPath node set.
>
> First of all, if the document [1] should become stable in it's current
> processing model, I would add a comment that this transform only makes
> sense as a first Transform after dereferencing a nodeset or after
> transforms which produced an octet stream which is reparsed. Why? Because
> if I e.g. have an XMLDSIG 1.0 XPath transform prior the new XPath filter,
> the results of the original transform are discarded.
>
> But now to my approach: For me, it's not that clear how to use a sequence
> chain of multiple http://www.w3.org/2002/02/xmldsig-xpath-filter
> transforms  to select the correct subset. But anyhow, I recently solved a
> similar  problem and I thing that my solution would work in an
> alternative form of  your filter.
>
> Let's discuss the example in terms of DOM, even if the input is an XPath
> node set.:
>
> We 'label' the tree with three different labels:
>
> 1.) The "dont-include-any-descendant-or-self" label
> 2.) The "include-descendant-or-self-and-search-forward" label
> 3.) The "dont-include-self-but-search-forward" label
>
> Let's take Gregor's example to show how this transform works:
>
> The input document is:
>
> <root>
>   <child1>
>     <grandChild1/>
>     <grandChild2/>
>   </child1>
>   <child2>
>     <grandChild1/>
>     <grandChild2/>
>   </child2>
> </root>
>
> And I want to get as output of the transform
>
> <child1>
>     <grandChild1/>
>
>   </child1>
>
> Which means all descendant-or-selfs of child1 but not it's grandchild2:
>
> <Transform
> URI="http://www.w3.org/2002/03/xmldsig-the-next-alternative-xpath">
>
>  <!-- don't output / but descend -
>       you maybe find subtrees to be outputted -->
>  <XPath Filter="dont-include-self-but-search-forward">
>   /
>  </XPath>
>
>  <!-- We output the subtree under child1 but maybe
>       something inside this subtree is omitted -->
>  <XPath Filter="include-descendant-or-self-and-search-forward">
>   /root/child1[1]
>  </XPath>
>
>  <!-- don't output grandchild2 and don't descend -
>       you won't find subtrees to be outputted -->
>  <XPath Filter="dont-include-any-descendant-or-self">
>   /root/child1[1]/grandchild2
>  </XPath>
> </Transform>
>
> Does this makes sense to you? You can do the same optimizations: You only
> evaluate 3 XPath expressions, you make a simple tree traversal and you
> don't descend into subtrees which only contain unselected information.
>
> Regards,
> Christian
>
> [1] http://www.w3.org/Signature/Drafts/xmldsig-xpath
>






Mit freundlichen Grüßen,

Christian Geuer-Pollmann


--------------------------------------------------------------------------
Institute for Data Communications Systems             University of Siegen
Hoelderlinstrasse 3                 D-57068 Siegen                 Germany

mail:  mailto:geuer-pollmann@nue.et-inf.uni-siegen.de
web:  <http://www.nue.et-inf.uni-siegen.de/~geuer-pollmann/>
Received on Monday, 18 March 2002 13:58:39 UTC