Re: Alternative XPathFilter - a picture says more...

Hi Christian,

I don't believe that your proposal does anything (matching the
requirements of this particular document) that tree-based set
intersect/subtract does not do, it is no faster (it is basically
the same thing for the required operations), and it introduces new
and (imho) confusing terminology (given that we're working with node
"sets.") Admittedly, it allows operations such as exclude an element in
the middle of a tree, but include its children ("H" in your example),
but I don't think that these features are compelling. People can use
the old XPath if they need an operation as esoteric as that.

I would side with John that the operations we need are including subtrees
and excluding subtrees, and that they need to be as fast as possible. I
would disagree with John that "include from whole document" and "exclude
from whole document" operations are the way to solve this. They are too
limited and are, in (my) practice, no faster than the fully general set
operational equivalents that I espouse (for the relevant cases).

I believe that John's contention is that he cannot rely on all implementors
to perform set operations with sufficient speed. I don't think that this
is a reason to choose a technically inferior solution. In fact, I think
that if we support both set operations and "whole document" operations
then we are essentially giving implementors license to do the set
operations poorly.

I think that we should define only the set operations, and should
clearly document the potential optimizations and that these operations
SHOULD be optimized by an implementation (e.g., performance SHOULD
be broadly linear in the size of the selected subtrees, not the input
document.) Implementors must then provide explicit justification for
NOT optimizing their code, so John will get a reasonable assurance
of speed and I'll get my set ops!

Apologies, John, if I misrepresent you.

Yes Joseph, I think a teleconference might be interesting! Not Friday.


>Hi Joseph,
>ho John,
>maybe a picture says more than thousand words ;-)) (I renamed the include 
>to include-but-search to indicate that it has to be traversed).
>Given the tree in the picture. I arbitrarily selected some nodes (single 
>nodes on the right side or contingous areas on the left) which are to be 
>selected by this xpath transform. Black nodes are to be included in the 
>output, white nodes are to be omitted.
>The root node A is labeled "exclude-but-search". This means that it is not 
>to be outputted, but some descendants (probably) are.
>The black node B on the left side is labeled "include-but-search" because 
>itself is to be outputted and it's descendants must be traversed to copy 
>them to the result node set.
>The unselected child C is labeled "exclude" because itself and _all_ it's 
>descendants are omitted/not outputted. So we don't need the -but-search 
>suffix because we won't find any selected nodes inside - this is an 
>The unselected subtree D is in an area where no nodes are selected, but if 
>we label it 'exclude', the algorithm won't traverse it and we can save 
>time. But without that label, it'd work, though.
>The selected-subtree E in the middle is "include-but-search" because itself 
>and all it's descendants are to be outputted.
>On the right side, we have this annoying exclude-but-search, 
>include-but-search, exclude-but-search alternation because here are no nice 
>subtrees which are the main focus of this transform but a 'real' nodeset 
>which could also be handled by our old XPath transform...
>We have 7 contingous areas or nodes and NEED 7 labels (XPath evaluations) 
>for this, while we can optimize the speed with 1 additional label (the 
>"exclude" for node D).
>Is my idea now a little bit clearer now?
>Best regards,

Baltimore Technologies plc will not be liable for direct,  special,  indirect 
or consequential  damages  arising  from  alteration of  the contents of this
message by a third party or as a result of any virus being passed on.

This footnote confirms that this email message has been swept by
Baltimore MIMEsweeper for Content Security threats, including
computer viruses.

Received on Tuesday, 19 March 2002 18:48:48 UTC