- From: Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>
- Date: Wed, 15 May 2002 11:00:28 +0200
- To: merlin <merlin@baltimore.ie>
- cc: reagle@w3.org, John Boyer <JBoyer@PureEdge.com>, w3c-ietf-xmldsig@w3.org
Hi Merlin, I must disagree that the proposed XPath filter is THAT intuitive: my biggest problem was to call that transform a "filter". To me, a filter is a transform which subsequently "removes" something (the filtered-out material) from the to-be-filtered material. So the result is always "smaller" than the input. The xfilter2/union operation is very strange with respect to that model [1]. People must be VERY careful that the do not construct steps which re-include material the previously filtered out and which must not be signed. A e.g. union(/) would destroy the results of all previously done transforms. The same goes for comments. I use a URI="" but the output contains comments. This does not really add clarity. But xfilter2 must have the union selection, otherwise it wouldn't be possible to select arbitrary content. About grouping xfilter2 transforms: It's a nice to have, but I've seen that there is no way to use this for optimization. (OK, a stuupid union(/) at the end would allow 'optimization', but if the steps are written good, that's not possible). My labeling proposal is NOT that complicated: 1: Clarity: I did not write a fully-flavoured spec for that idea like you did, so well, OK, it lacks clarity. Would clarity bee added if I write a full w3c-like document? I did not do till now because I've seen great resistance to even have a closer look at this babe. 2: Intuitiveness: The output result node set can only contain nodes which have also been in the input node set. That's what I understand a transform (and a filter) works like. 3: Simplicity: I directly included the processing model for the implementor in my proposal. Maybe that's too complicated for users. But the model itself is totally simple: For a given node C in the input node set, it must be checked whether he's in the output or not. Label the tree. Walk an the ancestor-or-self axis of the context node C. The first label decides: if it's exclude or excludeButSearch, omit that node; if it's include, include it. That's it. End of magic. Not simple? 4: *real-world* problems: Sort of killer argument. Personally, I'm at the university, I do research on that topic, I don't deploy business solutions -> I can't supply ANY real-world problems which you require. 5: Preformance: What costs time in my implementation is the number of XPath evaluations which have to be done. This number ranges (for my approach) between a minimum of one (1) and a maximum of three (3), while the third-one is a tradeoff between document size (time to traverse a fully-excluded subtree) and and the time required to evaluate one XPath. So REQUIRED are a maximum of two (2) XPath evaluations. 5a: The super-duper chance for optimization is in the combination with c14n: If there is c14n directly after my transform, both can be done in a single step, with the following timing: time(transform+c14n) = time(1--3 xpath evals)+time(c14n). NO additional costs. Can xfilter2 do that?. 5b: The xfilter2 transform enforces that the number of XPath evaluations is the same as the number of transforms. Regards, Christian [1] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002JanMar/0250.html --On Montag, 13. Mai 2002 20:10 +0100 merlin <merlin@baltimore.ie> wrote: > Hi Christian, > > The proposed XPath filter is clear and intuitive and it > effectively and efficiently solves the problems that it sets > out to address. I think it might be a good idea to allow > multiple XPath children of the filter transform; that would > allow a sequence of steps to be logically grouped, and might > help implementers perform optimizations. > > In my opinion, your labeling proposal lacks the clarity and > simplicity of the XPath filter and is not addressing a problem > that needs solving, based on my understanding of the needs > of current XML signature applications. > > If the purpose of this transform was to solve the types of > problem that you are posing, then I would grant that a pure > set-based solution is not as efficient as possible, and that > your proposal would merit consideration. > > However, that is not the purpose of this transform. Your > own data show that the current proposal is more efficient > than yours for *real-world* problems, which are the types of > problem that we need to solve. > > Even if the performance tipped slightly the other way, I > think that set-based operations on subtrees are a much more > intuitive and accessible solution than yours, and so I would > still advocate them. If you can, as I have asked before, > produce a real-world problem that is badly solved by our > transform, then please present it. I do not count the > example below (yes, I've looked at it) as a real-world > problem. Otherwise, I think your own data show that we have > a good solution. > > Merlin > > r/geuer-pollmann@nue.et-inf.uni-siegen.de/2002.05.13/16:04:48 >> --On Donnerstag, 9. Mai 2002 23:50 +0200 Christian Geuer-Pollmann >> <geuer-pollmann@nue.et-inf.uni-siegen.de> wrote: >> >>> When you look at the results below, you see that each step in the >>> xfilter2spec_xfilter2_(1/2/3) adds more processing time, while this is >>> not the case for 'my' transform: I can't tell why they are as they are. >> >> I constructed an obfuscated example in which I select some subtrees for >> inclusion/exclusion. To get the final result, I must perform 7 xfilter2 >> transforms. Each transform adds more overhead. To test how much each >> transform requires, I ran the test suite 10 times, while the first test >> only did the 1st transform, the second test included the 1st and the 2nd >> transform while the 7th test included all seven steps. The results are >> shown below >> >> 10 * apachesample_xfilter2_1 took 5,097 seconds >> 10 * apachesample_xfilter2_2 took 5,879 seconds >> 10 * apachesample_xfilter2_3 took 6,790 seconds >> 10 * apachesample_xfilter2_4 took 7,290 seconds >> 10 * apachesample_xfilter2_5 took 8,442 seconds >> 10 * apachesample_xfilter2_6 took 9,503 seconds >> 10 * apachesample_xfilter2_7 took 10,475 seconds >> >> You can see that the required time is strictly monotonic increasing for >> each added transform step. To check that against my own algorithm, I >> write 7 transforms which achieve the same results as the corresponding >> xfilter2 ones: >> >> 10 * apachesample_apachefilter_1 took 5,207 seconds >> 10 * apachesample_apachefilter_2 took 5,638 seconds >> 10 * apachesample_apachefilter_3 took 5,408 seconds >> 10 * apachesample_apachefilter_4 took 5,478 seconds >> 10 * apachesample_apachefilter_5 took 5,528 seconds >> 10 * apachesample_apachefilter_6 took 5,508 seconds >> 10 * apachesample_apachefilter_7 took 5,358 seconds >> >> You can see that the time is -- well, not constant, but does not show >> this heavy dependency to take longer for complexer selections. >> >> BTW, if you wanna check this yourself, the implementation of the >> transform [1] is in the Apache CVS, the test suite also [2]. >> >> Kind regards, >> Christian >> >> >> [1] >> <http://cvs.apache.org/viewcvs.cgi/xml-security/src/org/apache/xml/secur >> ity >> /transforms/implementations/TransformXPathFilterCHGP.java?rev=1.2&conten >> t-t ype=text/vnd.viewcvs-markup> >> >> [2] >> <http://cvs.apache.org/viewcvs.cgi/xml-security/src_samples/org/apache/x >> ml/ >> security/samples/TransformPerformanceTester.java?rev=1.3&content-type=te >> xt/ vnd.viewcvs-markup>
Received on Wednesday, 15 May 2002 10:48:21 UTC