Re: Next steps for xmldsig from Aleksey Sanin on 2002-04-11 (w3c-ietf-xmldsig@w3.org from April to June 2002)

From: Aleksey Sanin <aleksey@aleksey.com>
Date: Wed, 10 Apr 2002 21:31:03 -0700
To: merlin <merlin@baltimore.ie>
Cc: reagle@w3.org, w3c-ietf-xmldsig@w3.org
Message-ID: <3CB51187.9010609@aleksey.com>
Hi, Merlin!

I have already agreed with you that the "subtrees" solution seems to work
faster. You are absolutely right that I did no optimization at all (by the
way, an interesting question is which is better to optimize: the XPath
Filter or the XPath engine used in the filter). In my tests, excluding
children in the "subtrees" case takes much more time than calculating a
single XPath expression (the XPath expressions were similar, and excluding
N children took N times longer than selecting the node). If you still have
the documents and expressions you used for your tests, I would like to
evaluate them using the test program I wrote. It will be interesting to
compare the results of the XML/XPath engine I use (LibXML2) with your
results.

As an example of when you do not want to sign subtrees, consider the
following element:

  <it:Item xmlns:it='http://example.org/it' Id='1231' price='12.23' >
	<it:Name>something</it:Name>
	<it:Description>Something really nice and useful</it:Description>
	<it:Color>Blue</it:Color>
	...
  </it:Item>

It's possible that the only information we really want to sign is the item
Id and price. In the "subtrees" case we need to select <it:Item /> first
and then exclude all of its children. In the "non-subtrees" case,
everything is done in one step (and it's clear that it will be faster).
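
To make the two modes concrete, here is a minimal sketch in Python (standard
library only). It is not an XPath Filter implementation: it models a node-set
as a plain set of element nodes and ignores text, attribute and namespace
nodes. The function names and the step counter are hypothetical, and the
elided children ("...") are simply left out. Whether a real filter needs one
subtract per child or a single expression covering all children depends on
how it is written; the loop only mirrors the step-by-step description above.

```python
import xml.etree.ElementTree as ET

# The element from the example above, with the elided children left out.
DOC = """<it:Item xmlns:it='http://example.org/it' Id='1231' price='12.23'>
  <it:Name>something</it:Name>
  <it:Description>Something really nice and useful</it:Description>
  <it:Color>Blue</it:Color>
</it:Item>"""


def subtree(elem):
    """Return the element nodes of the subtree rooted at elem (inclusive)."""
    return set(elem.iter())


def select_with_subtrees(item):
    """Subtree mode: select the whole item, then subtract each child subtree."""
    selected = subtree(item)
    steps = 1                      # one "intersect" step
    for child in list(item):
        selected -= subtree(child)
        steps += 1                 # one "subtract" step per child
    return selected, steps


def select_with_node_set(item):
    """Node-set mode: name the single wanted node directly."""
    return {item}, 1


item = ET.fromstring(DOC)
by_subtrees, subtree_steps = select_with_subtrees(item)
by_node_set, node_set_steps = select_with_node_set(item)

# Both modes end with the same selection: it:Item alone, carrying Id and price.
print(by_subtrees == by_node_set, subtree_steps, node_set_steps)
```

Both functions return the same selection; the only difference in this toy
model is the number of set operations, which says nothing definite about the
cost inside a real XPath engine.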

From my point of view, it's very difficult to prove one position or the
other using examples. I have a feeling that real performance will depend
significantly on a few factors we don't know about (for example, XPath
engine optimization). In my case, from the implementation point of view,
providing a "non-subtrees" option will cost nothing (assuming that the
"subtrees" option is implemented). On the other hand, it can give a simple
solution for cases where a "subtrees only" standard requires a
non-intuitive, tricky, multi-step hack.

Also, to prevent "poorly-performing signatures" we can make a clear
distinction by using different requirement levels (for example, MUST for
"subtrees" and MAY for "non-subtrees") and probably add a note about the
possible performance drawbacks of using the "non-subtrees" filter
(remember, we assumed that before using signatures John Smith will read
the standard :) ).


Aleksey Sanin


merlin wrote:

>Hi Aleksey,
>
>We could go with a solution supporting both modes of operation; I'd
>suggest, in that case, Filter=intersect and Filter=intersect-subtrees.
>However, before we go there, I would like to know what is the real-
>world use case that you are trying to solve.
>
>You seem to be suggesting that selecting just an element in isolation
>is a valid use case. Consider:
>
>  <cc:CreditCard xmlns:cc='http://example.org/cc' Type='amex'>
>    <cc:Number>...</cc:Number><cc:Expires>...</cc:Expires>
>    <cc:AuthCodeToBeFilledIn>...</cc:AuthCodeToBeFilledIn>
>  </cc:CreditCard>
>
>I can see that it might be useful to sign this credit card, less the
>AuthCodeToBeFilledIn element. I cannot see the use case for signing
>something like <cc:CreditCard />, which is the case that you have
>repeatedly brought up.
>
>I don't have much freedom to play with this (hence the current time),
>so you will have to forgive my arbitrary statistics, but I quickly
>ran a benchmark of using an intersect followed by a subtract filter
>to select an element, less certain children (e.g., the above), versus
>a pure XPath node-set intersect filter where I used a single XPath
>expression to achieve this same effect (using a predicate). The pair of
>subtree filters was, again, significantly (2x) faster than the node-set
>XPath option. Now, it would seem to me that the only real advantage
>of a node-set XPath option would be to perform this type of operation.
>Indeed, the very act of making it available would suggest that this is
>one of its purposes; but it is not the ideal choice for doing this. If,
>alternatively, the purpose of the node-set XPath is that people will be
>'familiar' with it, then their familiarity will breed poorly-performing
>signatures.
>
>So, as far as I can see (and it seems that this is confirmed by your
>tests), our subtree formulation speeds up all the meaningful use cases;
>and I'd imagine that you didn't even optimize your code for common subtree
>operations. If you can present real use cases where subtrees don't do
>the job, then I'll gladly advocate a more flexible/complex approach;
>I simply have yet to encounter the problem that you are solving.
>
>Merlin
>
>r/aleksey@aleksey.com/2002.04.10/15:24:23
>
>>There were no replies to my suggestion [1] about an option to process
>>nodes instead of subtrees in the XPath Filter 2.0 Transform, so I am
>>raising this question again. I understand the performance concerns (and
>>I do see a small performance decrease in my tests). However, I believe
>>this will add flexibility and provide a better way to select
>>"complicated" parts of an XML document.
>>
>>Aleksey Sanin.
>>
>>
>>[1] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002AprJun/0040.html
>>
>>Joseph Reagle wrote:
>>
>>>3. XPath Filter 2.0 Transform
>>>
>>>Thanks to this week's edits by Merlin (and discussion between Merlin, John, 
>>>Aleksey, and Christian) I think we've converged on a design and text that 
>>>we're happy with. If no one objects to the present text [2] (speak now!) 
>>>I'll stage it for publication as a W3C Last Call Working Draft (and first 
>>>"official" W3C WD) at the end of this month.
>>>
>>
>
Received on Thursday, 11 April 2002 00:31:58 UTC