Re: Alternative XPathFilter - a picture says more...

Hi Merlin,

can you shortly give me a hint where I can find the "tree-based set 
intersect/subtract" model? I only worked with [1] in the past. Maybe I 
missed/overread an important part of the discussion. Is your proposal in 

General question: From what I understood, the discussed transforms in [1] 
contain only one single XPath element which can either include or exclude? 
And only one of these Transform ops per Transforms sequence? Maybe that's 
my problem. I want to make include AND exclude operations on a given node 
set (include only the body but exclude the signatures in the body).

And yes, you're right. The F/G/H sample in my illustration is very 
obfuscated and not the intend of my transform. I just included it to show 
that even such esoteric things would be possible (but should not be used).

> Admittedly, it allows operations such as exclude an element in
> the middle of a tree, but include its children ("H" in your example),
> but I don't think that these features are compelling.

But excluding ancestors and excluding children (node "B" in the example) 
which including the middle is exactly the SOAP example: I want to omit the 
outer Envelope, exclude inner signatures inside the body but include the 
body and the transaction itself.



--On Dienstag, 19. März 2002 23:48 +0000 merlin <> wrote:

> Hi Christian,
> I don't believe that your proposal does anything (matching the
> requirements of this particular document) that tree-based set
> intersect/subtract does not do, it is no faster (it is basically
> the same thing for the required operations), and it introduces new
> and (imho) confusing terminology (given that we're working with node
> "sets.") Admittedly, it allows operations such as exclude an element in
> the middle of a tree, but include its children ("H" in your example),
> but I don't think that these features are compelling. People can use
> the old XPath if they need an operation as esoteric as that.
> I would side with John that the operations we need are including subtrees
> and excluding subtrees, and that they need to be as fast as possible. I
> would disagree with John that "include from whole document" and "exclude
> from whole document" operations are the way to solve this. They are too
> limited and are, in (my) practice, no faster than the fully general set
> operational equivalents that I espouse (for the relevant cases).
> I believe that John's contention is that he cannot rely on all
> implementors to perform set operations with sufficient speed. I don't
> think that this is a reason to choose a technically inferior solution. In
> fact, I think that if we support both set operations and "whole document"
> operations then we are essentially giving implementors license to do the
> set operations poorly.
> I think that we should define only the set operations, and should
> clearly document the potential optimizations and that these operations
> SHOULD be optimized by an implementation (e.g., performance SHOULD
> be broadly linear in the size of the selected subtrees, not the input
> document.) Implementors must then provide explicit justification for
> NOT optimizing their code, so John will get a reasonable assurance
> of speed and I'll get my set ops!
> Apologies, John, if I misrepresent you.
> Yes Joseph, I think a teleconference might be interesting! Not Friday.
> Merlin
> r/
>> Hi Joseph,
>> ho John,
>> maybe a picture says more than thousand words ;-)) (I renamed the
>> include  to include-but-search to indicate that it has to be traversed).
>> Given the tree in the picture. I arbitrarily selected some nodes (single
>> nodes on the right side or contingous areas on the left) which are to be
>> selected by this xpath transform. Black nodes are to be included in the
>> output, white nodes are to be omitted.
>> The root node A is labeled "exclude-but-search". This means that it is
>> not  to be outputted, but some descendants (probably) are.
>> The black node B on the left side is labeled "include-but-search"
>> because  itself is to be outputted and it's descendants must be
>> traversed to copy  them to the result node set.
>> The unselected child C is labeled "exclude" because itself and _all_
>> it's  descendants are omitted/not outputted. So we don't need the
>> -but-search  suffix because we won't find any selected nodes inside -
>> this is an  optimization.
>> The unselected subtree D is in an area where no nodes are selected, but
>> if  we label it 'exclude', the algorithm won't traverse it and we can
>> save  time. But without that label, it'd work, though.
>> The selected-subtree E in the middle is "include-but-search" because
>> itself  and all it's descendants are to be outputted.
>> On the right side, we have this annoying exclude-but-search,
>> include-but-search, exclude-but-search alternation because here are no
>> nice  subtrees which are the main focus of this transform but a 'real'
>> nodeset  which could also be handled by our old XPath transform...
>> We have 7 contingous areas or nodes and NEED 7 labels (XPath
>> evaluations)  for this, while we can optimize the speed with 1
>> additional label (the  "exclude" for node D).
>> Is my idea now a little bit clearer now?
>> Best regards,
>> Christian

Received on Wednesday, 20 March 2002 06:40:08 UTC