Re: New XPath Filter Transform from merlin on 2002-03-14 (w3c-ietf-xmldsig@w3.org from January to March 2002)

From: merlin <merlin@baltimore.ie>
Date: Thu, 14 Mar 2002 21:05:10 +0000
To: "John Boyer" <JBoyer@pureedge.com>
Cc: "TAMURA Kent" <kent@trl.ibm.co.jp>, w3c-ietf-xmldsig@w3.org, reagle@w3.org
Message-Id: <20020314210510.13F624422C@yog-sothoth.ie.baltimore.com>
r/JBoyer@pureedge.com/2002.03.14/11:07:42
>Hi Kent and Merlin,
>
>Generally, I like intersection and subtraction as features.  Three
>things bother me though:
>
>1) The first is naming.  An XPath 'include' filter should include the
>nodes specified by the XPath expression.  An XPath 'exclude' filter
>should exclude the nodes specified by the XPath expression.  I don't
>agree with the assertion tha tthis is non-intuitive.  
>
>If we had more appropriate names like 'intersect' and 'subtract',
>respectively, I think the resulting markup would more clearly
>communicate what happens, particularly with nodes selected by the XPath
>but not in the input.

I personally think that include and exclude are fine: "Include/exclude
only those nodes /from the input node set/ that match X." The difference
in our interpretations is that I interpret transforms as applying to the
input node set; the xpath filter transform doesn't really consider the
input node set at all; just the input document.

>2) More importantly, I think the problem is more complicated than what
>has been alluded to so far.  In the interest of speed, the XPath
>expressions in include and exclude filters specify subtree roots, not
>individual nodes.  The input is a set of nodes, not a set of subtree
>roots, so the XPath expression would have to be evaluated, then expanded
>to include all nodes in each subtree before the set intersection or
>subtraction could be performed against the input.
>
>To me, this is less intuitive than what is in the current specification
>because we are not intersecting and subtracting the input nodes with the
>nodes selected by the XPath.  We are intersecting and subtracting the
>input nodes with a nodes-set formed by subtree expansion of nodes
>selected by the XPath.

Again, I'd disagree on the intuitiveness; I certainly didn't consider
that that either include or exclude could be expressed as they
apparently are; subtree include/exclude against the input document,
regardless of the input node set. I personally consider this anathema
to the transform model.

>3) Exclude is not currently defined as set subtraction.  It is defined
>in exactly the same way that include is defined.  So, whether using
>include or exclude, I am not able to convince myself that your version
>of set replacement will be as fast as what is currently specified.  I
>absolutely need ways of making the features currently specified to be as
>fast as possible.

If I understand this correctly - exclude is the whole document less the
identified subtrees - can I use the transform to select the ToBeSigned
element, less any Signature children? If not, exclude seems of extremely
limited use.

In terms of performance, we can specify that this transform must behave
as if the subtrees are expanded into node sets, and node set intersection
or subtraction is performed. That leaves vendors wide open to implement
things as they wish, as we do for enveloped signature. Any implementation
that cares about performance will do things right.  Certainly, if the
input is a bizarre node set formed from a bizarre transform, then the
set operations may be slow. If, however, the input is a standard dsig
node set (, #x, #xpointer(...), octet stream) then it is trivial to
optimize subtree-based set operations. Remember, the output of this
transform must be a node set - that is what xmldsig mandates - so if
an implementation is going to have performance problems with nodes sets,
doing things differently inside this transform won't help matters.

More subjectively; when I implemented the transform under the false
assumption that it performed intersection or subtraction, I threw in
optimizations for typical node sets, such that these operations, used
in the base cases (e.g., the input is a whole document or element, with
or without comments) are no slower (give or take a few non-iterative
lines of code) than if they had been implemented as they apparently
should have been. But, I'm just a limited sample set.

>So, why don't we add 'intersect' and 'subtract' to the currently
>specified filter?

I strongly feel that the current specification is wrong. I don't think
that it will be materially faster, and I do think is is non-intuitive
and that it goes against the spirit of the transform model. Adding
more options that perform similar, but subtlely different operations
will only serve to confuse matters.

However, this is just my opinion; I'm open to input from others!

Merlin

>John Boyer
>
>
>-----Original Message-----
>From: TAMURA Kent [mailto:kent@trl.ibm.co.jp]
>Sent: Tuesday, March 12, 2002 6:05 PM
>To: John Boyer; w3c-ietf-xmldsig@w3.org
>Subject: Re: New XPath Filter Transform
>
>
>
>> I read the context part, and I think it is correct; I just
>misunderstood
>> the application of the resulting subtrees. I think that include should
>> be specified as set intersection, as exclude is set subtraction. Set
>> replacement would, I think, be non-intuitive and, in my opinion, bad.
>> We can get set replacement behaviour using set intersection and an
>input
>> nodeset from URI #xpointer(/).
>
>I agree with Merlin.  intersection and subtraction seems better.
>
>-- 
>TAMURA Kent @ Tokyo Research Laboratory, IBM
>


-----------------------------------------------------------------------------
Baltimore Technologies plc will not be liable for direct,  special,  indirect 
or consequential  damages  arising  from  alteration of  the contents of this
message by a third party or as a result of any virus being passed on.

This footnote confirms that this email message has been swept by
Baltimore MIMEsweeper for Content Security threats, including
computer viruses.
   http://www.baltimore.com
Received on Thursday, 14 March 2002 16:05:16 UTC