Re: Alternative XPathFilter - a picture says more... from Christian Geuer-Pollmann on 2002-03-20 (w3c-ietf-xmldsig@w3.org from January to March 2002)

From: Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>
Date: Wed, 20 Mar 2002 12:41:21 +0100
To: merlin <merlin@baltimore.ie>
Cc: w3c-ietf-xmldsig@w3.org
Message-ID: <8223174.1016628081@pinkpanther>
Hi Merlin,

can you shortly give me a hint where I can find the "tree-based set 
intersect/subtract" model? I only worked with [1] in the past. Maybe I 
missed/overread an important part of the discussion. Is your proposal in 
[2]?

General question: From what I understood, the discussed transforms in [1] 
contain only one single XPath element which can either include or exclude? 
And only one of these Transform ops per Transforms sequence? Maybe that's 
my problem. I want to make include AND exclude operations on a given node 
set (include only the body but exclude the signatures in the body).

And yes, you're right. The F/G/H sample in my illustration is very 
obfuscated and not the intend of my transform. I just included it to show 
that even such esoteric things would be possible (but should not be used).

> Admittedly, it allows operations such as exclude an element in
> the middle of a tree, but include its children ("H" in your example),
> but I don't think that these features are compelling.

But excluding ancestors and excluding children (node "B" in the example) 
which including the middle is exactly the SOAP example: I want to omit the 
outer Envelope, exclude inner signatures inside the body but include the 
body and the transaction itself.

Christian


[1] http://www.w3.org/Signature/Drafts/xmldsig-xpath
[2] 
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002JanMar/0213.html



--On Dienstag, 19. März 2002 23:48 +0000 merlin <merlin@baltimore.ie> wrote:

> Hi Christian,
>
> I don't believe that your proposal does anything (matching the
> requirements of this particular document) that tree-based set
> intersect/subtract does not do, it is no faster (it is basically
> the same thing for the required operations), and it introduces new
> and (imho) confusing terminology (given that we're working with node
> "sets.") Admittedly, it allows operations such as exclude an element in
> the middle of a tree, but include its children ("H" in your example),
> but I don't think that these features are compelling. People can use
> the old XPath if they need an operation as esoteric as that.
>
> I would side with John that the operations we need are including subtrees
> and excluding subtrees, and that they need to be as fast as possible. I
> would disagree with John that "include from whole document" and "exclude
> from whole document" operations are the way to solve this. They are too
> limited and are, in (my) practice, no faster than the fully general set
> operational equivalents that I espouse (for the relevant cases).
>
> I believe that John's contention is that he cannot rely on all
> implementors to perform set operations with sufficient speed. I don't
> think that this is a reason to choose a technically inferior solution. In
> fact, I think that if we support both set operations and "whole document"
> operations then we are essentially giving implementors license to do the
> set operations poorly.
>
> I think that we should define only the set operations, and should
> clearly document the potential optimizations and that these operations
> SHOULD be optimized by an implementation (e.g., performance SHOULD
> be broadly linear in the size of the selected subtrees, not the input
> document.) Implementors must then provide explicit justification for
> NOT optimizing their code, so John will get a reasonable assurance
> of speed and I'll get my set ops!
>
> Apologies, John, if I misrepresent you.
>
> Yes Joseph, I think a teleconference might be interesting! Not Friday.
>
> Merlin
>
> r/geuer-pollmann@nue.et-inf.uni-siegen.de/2002.03.20/00:23:40
>> Hi Joseph,
>> ho John,
>>
>> maybe a picture says more than thousand words ;-)) (I renamed the
>> include  to include-but-search to indicate that it has to be traversed).
>>
>> Given the tree in the picture. I arbitrarily selected some nodes (single
>> nodes on the right side or contingous areas on the left) which are to be
>> selected by this xpath transform. Black nodes are to be included in the
>> output, white nodes are to be omitted.
>>
>> The root node A is labeled "exclude-but-search". This means that it is
>> not  to be outputted, but some descendants (probably) are.
>>
>> The black node B on the left side is labeled "include-but-search"
>> because  itself is to be outputted and it's descendants must be
>> traversed to copy  them to the result node set.
>>
>> The unselected child C is labeled "exclude" because itself and _all_
>> it's  descendants are omitted/not outputted. So we don't need the
>> -but-search  suffix because we won't find any selected nodes inside -
>> this is an  optimization.
>>
>> The unselected subtree D is in an area where no nodes are selected, but
>> if  we label it 'exclude', the algorithm won't traverse it and we can
>> save  time. But without that label, it'd work, though.
>>
>> The selected-subtree E in the middle is "include-but-search" because
>> itself  and all it's descendants are to be outputted.
>>
>> On the right side, we have this annoying exclude-but-search,
>> include-but-search, exclude-but-search alternation because here are no
>> nice  subtrees which are the main focus of this transform but a 'real'
>> nodeset  which could also be handled by our old XPath transform...
>>
>> We have 7 contingous areas or nodes and NEED 7 labels (XPath
>> evaluations)  for this, while we can optimize the speed with 1
>> additional label (the  "exclude" for node D).
>>
>> Is my idea now a little bit clearer now?
>>
>> Best regards,
>> Christian
Received on Wednesday, 20 March 2002 06:40:08 UTC