Re: Performace data and comparison from Christian Geuer-Pollmann on 2002-05-15 (w3c-ietf-xmldsig@w3.org from April to June 2002)

From: Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>
Date: Wed, 15 May 2002 11:00:28 +0200
To: merlin <merlin@baltimore.ie>
cc: reagle@w3.org, John Boyer <JBoyer@PureEdge.com>, w3c-ietf-xmldsig@w3.org
Message-ID: <10286060.1021460428@crypto>
Hi Merlin,

I must disagree that the proposed XPath filter is THAT intuitive: my 
biggest problem was to call that transform a "filter". To me, a filter is a 
transform which subsequently "removes" something (the filtered-out 
material) from the to-be-filtered material. So the result is always 
"smaller" than the input. The xfilter2/union operation is very strange with 
respect to that model [1]. People must be VERY careful that the do not 
construct steps which re-include material the previously filtered out and 
which must not be signed. A e.g. union(/) would destroy the results of all 
previously done transforms. The same goes for comments. I use a URI="" but 
the output contains comments. This does not really add clarity.

But xfilter2 must have the union selection, otherwise it wouldn't be 
possible to select arbitrary content.

About grouping xfilter2 transforms: It's a nice to have, but I've seen that 
there is no way to use this for optimization. (OK, a stuupid union(/) at 
the end would allow 'optimization', but if the steps are written good, 
that's not possible).

My labeling proposal is NOT that complicated:

1: Clarity: I did not write a fully-flavoured spec for that idea like you 
did, so well, OK, it lacks clarity. Would clarity bee added if I write a 
full w3c-like document? I did not do till now because I've seen great 
resistance to even have a closer look at this babe.

2: Intuitiveness: The output result node set can only contain nodes which 
have also been in the input node set. That's what I understand a transform 
(and a filter) works like.

3: Simplicity: I directly included the processing model for the implementor 
in my proposal. Maybe that's too complicated for users. But the model 
itself is totally simple: For a given node C in the input node set, it must 
be checked whether he's in the output or not. Label the tree. Walk an the 
ancestor-or-self axis of the context node C. The first label decides: if 
it's exclude or excludeButSearch, omit that node; if it's include, include 
it. That's it. End of magic. Not simple?

4: *real-world* problems: Sort of killer argument. Personally, I'm at the 
university, I do research on that topic, I don't deploy business solutions 
-> I can't supply ANY real-world problems which you require.

5: Preformance: What costs time in my implementation is the number of XPath 
evaluations which have to be done. This number ranges (for my approach) 
between a minimum of one (1) and a maximum of three (3), while the 
third-one is a tradeoff between document size (time to traverse a 
fully-excluded subtree) and and the time required to evaluate one XPath. So 
REQUIRED are a maximum of two (2) XPath evaluations.

5a: The super-duper chance for optimization is in the combination with 
c14n: If there is c14n directly after my transform, both can be done in a 
single step, with the following timing: time(transform+c14n) = time(1--3 
xpath evals)+time(c14n). NO additional costs. Can xfilter2 do that?.

5b: The xfilter2 transform enforces that the number of XPath evaluations is 
the same as the number of transforms.

Regards,
Christian

[1] 
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002JanMar/0250.html

--On Montag, 13. Mai 2002 20:10 +0100 merlin <merlin@baltimore.ie> wrote:

> Hi Christian,
>
> The proposed XPath filter is clear and intuitive and it
> effectively and efficiently solves the problems that it sets
> out to address. I think it might be a good idea to allow
> multiple XPath children of the filter transform; that would
> allow a sequence of steps to be logically grouped, and might
> help implementers perform optimizations.
>
> In my opinion, your labeling proposal lacks the clarity and
> simplicity of the XPath filter and is not addressing a problem
> that needs solving, based on my understanding of the needs
> of current XML signature applications.
>
> If the purpose of this transform was to solve the types of
> problem that you are posing, then I would grant that a pure
> set-based solution is not as efficient as possible, and that
> your proposal would merit consideration.
>
> However, that is not the purpose of this transform. Your
> own data show that the current proposal is more efficient
> than yours for *real-world* problems, which are the types of
> problem that we need to solve.
>
> Even if the performance tipped slightly the other way, I
> think that set-based operations on subtrees are a much more
> intuitive and accessible solution than yours, and so I would
> still advocate them. If you can, as I have asked before,
> produce a real-world problem that is badly solved by our
> transform, then please present it. I do not count the
> example below (yes, I've looked at it) as a real-world
> problem. Otherwise, I think your own data show that we have
> a good solution.
>
> Merlin
>
> r/geuer-pollmann@nue.et-inf.uni-siegen.de/2002.05.13/16:04:48
>> --On Donnerstag, 9. Mai 2002 23:50 +0200 Christian Geuer-Pollmann
>> <geuer-pollmann@nue.et-inf.uni-siegen.de> wrote:
>>
>>> When you look at the results below, you see that each step in the
>>> xfilter2spec_xfilter2_(1/2/3) adds more processing time, while this is
>>> not the case for 'my' transform: I can't tell why they are as they are.
>>
>> I constructed an obfuscated example in which I select some subtrees for
>> inclusion/exclusion. To get the final result, I must perform 7 xfilter2
>> transforms. Each transform adds more overhead. To test how much each
>> transform requires, I ran the test suite 10 times, while the first test
>> only did the 1st transform, the second test included the 1st and the 2nd
>> transform while the 7th test included all seven steps. The results are
>> shown below
>>
>> 10 * apachesample_xfilter2_1     took  5,097 seconds
>> 10 * apachesample_xfilter2_2     took  5,879 seconds
>> 10 * apachesample_xfilter2_3     took  6,790 seconds
>> 10 * apachesample_xfilter2_4     took  7,290 seconds
>> 10 * apachesample_xfilter2_5     took  8,442 seconds
>> 10 * apachesample_xfilter2_6     took  9,503 seconds
>> 10 * apachesample_xfilter2_7     took 10,475 seconds
>>
>> You can see that the required time is strictly monotonic increasing for
>> each added transform step. To check that against my own algorithm, I
>> write  7 transforms which achieve the same results as the corresponding
>> xfilter2  ones:
>>
>> 10 * apachesample_apachefilter_1 took  5,207 seconds
>> 10 * apachesample_apachefilter_2 took  5,638 seconds
>> 10 * apachesample_apachefilter_3 took  5,408 seconds
>> 10 * apachesample_apachefilter_4 took  5,478 seconds
>> 10 * apachesample_apachefilter_5 took  5,528 seconds
>> 10 * apachesample_apachefilter_6 took  5,508 seconds
>> 10 * apachesample_apachefilter_7 took  5,358 seconds
>>
>> You can  see that the time is -- well, not constant, but does not show
>> this  heavy dependency to take longer for complexer selections.
>>
>> BTW, if you wanna check this yourself, the implementation of the
>> transform  [1] is in the Apache CVS, the test suite also [2].
>>
>> Kind regards,
>> Christian
>>
>>
>> [1]
>> <http://cvs.apache.org/viewcvs.cgi/xml-security/src/org/apache/xml/secur
>> ity
>> /transforms/implementations/TransformXPathFilterCHGP.java?rev=1.2&conten
>> t-t ype=text/vnd.viewcvs-markup>
>>
>> [2]
>> <http://cvs.apache.org/viewcvs.cgi/xml-security/src_samples/org/apache/x
>> ml/
>> security/samples/TransformPerformanceTester.java?rev=1.3&content-type=te
>> xt/ vnd.viewcvs-markup>
Received on Wednesday, 15 May 2002 10:48:21 UTC