Re: Alternative XPathFilter - a picture says more...

My 2 cents:

I do like the Merlin's proposal with intersect/subtract types of XPath. 
The current
specification is not clear and when I did implementation I made this 
mistake and realised
that I did it only after reading the long thread a week ago. I also have 
to say that
in my particular case (probably it's not only my case) the result of 
XPath expression
evaluation is actually a *set* of nodes and doing set operations is a 
natural and fast.


merlin wrote:

>Hi Christian,
>I don't believe that your proposal does anything (matching the
>requirements of this particular document) that tree-based set
>intersect/subtract does not do, it is no faster (it is basically
>the same thing for the required operations), and it introduces new
>and (imho) confusing terminology (given that we're working with node
>"sets.") Admittedly, it allows operations such as exclude an element in
>the middle of a tree, but include its children ("H" in your example),
>but I don't think that these features are compelling. People can use
>the old XPath if they need an operation as esoteric as that.
>I would side with John that the operations we need are including subtrees
>and excluding subtrees, and that they need to be as fast as possible. I
>would disagree with John that "include from whole document" and "exclude
>from whole document" operations are the way to solve this. They are too
>limited and are, in (my) practice, no faster than the fully general set
>operational equivalents that I espouse (for the relevant cases).
>I believe that John's contention is that he cannot rely on all implementors
>to perform set operations with sufficient speed. I don't think that this
>is a reason to choose a technically inferior solution. In fact, I think
>that if we support both set operations and "whole document" operations
>then we are essentially giving implementors license to do the set
>operations poorly.
>I think that we should define only the set operations, and should
>clearly document the potential optimizations and that these operations
>SHOULD be optimized by an implementation (e.g., performance SHOULD
>be broadly linear in the size of the selected subtrees, not the input
>document.) Implementors must then provide explicit justification for
>NOT optimizing their code, so John will get a reasonable assurance
>of speed and I'll get my set ops!
>Apologies, John, if I misrepresent you.
>Yes Joseph, I think a teleconference might be interesting! Not Friday.
>>Hi Joseph,
>>ho John,
>>maybe a picture says more than thousand words ;-)) (I renamed the include 
>>to include-but-search to indicate that it has to be traversed).
>>Given the tree in the picture. I arbitrarily selected some nodes (single 
>>nodes on the right side or contingous areas on the left) which are to be 
>>selected by this xpath transform. Black nodes are to be included in the 
>>output, white nodes are to be omitted.
>>The root node A is labeled "exclude-but-search". This means that it is not 
>>to be outputted, but some descendants (probably) are.
>>The black node B on the left side is labeled "include-but-search" because 
>>itself is to be outputted and it's descendants must be traversed to copy 
>>them to the result node set.
>>The unselected child C is labeled "exclude" because itself and _all_ it's 
>>descendants are omitted/not outputted. So we don't need the -but-search 
>>suffix because we won't find any selected nodes inside - this is an 
>>The unselected subtree D is in an area where no nodes are selected, but if 
>>we label it 'exclude', the algorithm won't traverse it and we can save 
>>time. But without that label, it'd work, though.
>>The selected-subtree E in the middle is "include-but-search" because itself 
>>and all it's descendants are to be outputted.
>>On the right side, we have this annoying exclude-but-search, 
>>include-but-search, exclude-but-search alternation because here are no nice 
>>subtrees which are the main focus of this transform but a 'real' nodeset 
>>which could also be handled by our old XPath transform...
>>We have 7 contingous areas or nodes and NEED 7 labels (XPath evaluations) 
>>for this, while we can optimize the speed with 1 additional label (the 
>>"exclude" for node D).
>>Is my idea now a little bit clearer now?
>>Best regards,
>Baltimore Technologies plc will not be liable for direct,  special,  indirect 
>or consequential  damages  arising  from  alteration of  the contents of this
>message by a third party or as a result of any virus being passed on.
>This footnote confirms that this email message has been swept by
>Baltimore MIMEsweeper for Content Security threats, including
>computer viruses.

Received on Tuesday, 19 March 2002 19:06:37 UTC