- From: John Boyer <JBoyer@PureEdge.com>
- Date: Tue, 21 May 2002 17:20:42 -0700
- To: "Christian Geuer-Pollmann" <geuer-pollmann@nue.et-inf.uni-siegen.de>, "merlin" <merlin@baltimore.ie>
- Cc: <reagle@w3.org>, <w3c-ietf-xmldsig@w3.org>
Hi Christian, I apologize that I could not get around to this sooner, though I can hardly believe the lucky set of circumstances that freed up time today. Out of respect for the amount of effort that you have put into proposing the alternative subtree-labelling scheme, it seemed only fair to have a closer look at it. In particular, I found compelling your statements about the opportunity to optimize. Firstly, I think there is an easily correctable and small error in the processing model you proposed some time ago. You proposed an 'include' variable that would track the default behavior to be used on unlabelled nodes. This needs to be more like a stack, or alternately the include variable needs to be reset at a node before visiting each of its children, not just before visiting the first child. The reason is that the default include setting for the nodes of a subtree is corrupted if one descends into a subsubtree that must use the include variable to set the default for its descendant nodes. Other than that, the summative thought that I have on your method is that most of the power to optimize seems to be if one is doing multiple transformations *under the assumption* of a processign data model that we unfortunately do not have. The problem is that the processing model allows for octet streams or node-sets to be passed from transform to transform. To take advantage of optimized tree labeling, one would have to augment the node-set with the labels. One alternative, which appears in your early proposals, is to include multiple XPath elements in the same transform. Within the transform, we can do whatever we like to augment the node-set. This is where I see your proposal deriving the most benefit, but I also believe that subtree set operations could be similarly optimized. In other words, the tree labeling concept seems to be a clever way to optimize multiple set operations performed in the same transform. Naturally, the processing details and the semantics of the labels would differ, particularly to account for the order in which the labels were affixed. For example, a union of a subtree appearing after a subtract means something different than a union before a subtract. Intersection is perhaps the most difficult, but offhand I don't see why the details cannot be properly worked out if each subtree label is augmented with an 'order' integer. This leaves us with an interesting thought. The DSig processing model states that the application must behave 'as if' node-sets are being passed from transform to transform, not that this is what must be done in fact. Thus, whether your suggested semantics are used or whether we stick to subtree set operations, it seems to me that an implementation can be created which is based solely on a labelling method, at least until a non-XPath transform is reached. So, what does your proposed method do that we cannot achieve? The answer appears to be that it is easier to work out the rules for a labelling optimization for multiple transformations because the semantics are simpler. However, with particular regard to the case of one or two xpath filter transforms, esp. if it is a subtract on the whole document, the optimization for set operations is not as complex. In turn, I think this answers the highly compelling question you raised, which is whether the existing set-based operations could be used to achieve time(transform+c14n) = time(1--3 xpath evals)+time(c14n) if there is c14n directly after the current xfilter transform. As far as I can tell, the most important transformation sequence to optimize is a subtract followed by a possible union indicating subsubtrees of the subtrees that were subtracted from a full document. In this case, the subtract nodes are labelled exclude-but-search and the union nodes are labelled 'include' (if there is no following union, the subtract nodes can simply be labelled 'exclude'). The second most important case is an intersect followed by a possible subtract. The intersect nodes are marked 'include-but-search' and the subtract nodes are marked 'exclude' (again, if there is no following subtract, then the intersect nodes can be marked as 'include'). In the first case, the default global 'include' is true and in the second case the global default is false. Regardless of the case, a c14n could utilize the labels in lieu of a node-set, with the result meeting the time bound you specified. Finally, due to the fact that we are still not getting quite the performance I had hoped for out of all implementations, I think that we should mandate some of these optimizations. Thanks, John Boyer -----Original Message----- From: Christian Geuer-Pollmann [mailto:geuer-pollmann@nue.et-inf.uni-siegen.de] Sent: Wednesday, May 15, 2002 2:00 AM To: merlin Cc: reagle@w3.org; John Boyer; w3c-ietf-xmldsig@w3.org Subject: Re: Performace data and comparison Hi Merlin, I must disagree that the proposed XPath filter is THAT intuitive: my biggest problem was to call that transform a "filter". To me, a filter is a transform which subsequently "removes" something (the filtered-out material) from the to-be-filtered material. So the result is always "smaller" than the input. The xfilter2/union operation is very strange with respect to that model [1]. People must be VERY careful that the do not construct steps which re-include material the previously filtered out and which must not be signed. A e.g. union(/) would destroy the results of all previously done transforms. The same goes for comments. I use a URI="" but the output contains comments. This does not really add clarity. But xfilter2 must have the union selection, otherwise it wouldn't be possible to select arbitrary content. About grouping xfilter2 transforms: It's a nice to have, but I've seen that there is no way to use this for optimization. (OK, a stuupid union(/) at the end would allow 'optimization', but if the steps are written good, that's not possible). My labeling proposal is NOT that complicated: 1: Clarity: I did not write a fully-flavoured spec for that idea like you did, so well, OK, it lacks clarity. Would clarity bee added if I write a full w3c-like document? I did not do till now because I've seen great resistance to even have a closer look at this babe. 2: Intuitiveness: The output result node set can only contain nodes which have also been in the input node set. That's what I understand a transform (and a filter) works like. 3: Simplicity: I directly included the processing model for the implementor in my proposal. Maybe that's too complicated for users. But the model itself is totally simple: For a given node C in the input node set, it must be checked whether he's in the output or not. Label the tree. Walk an the ancestor-or-self axis of the context node C. The first label decides: if it's exclude or excludeButSearch, omit that node; if it's include, include it. That's it. End of magic. Not simple? 4: *real-world* problems: Sort of killer argument. Personally, I'm at the university, I do research on that topic, I don't deploy business solutions -> I can't supply ANY real-world problems which you require. 5: Preformance: What costs time in my implementation is the number of XPath evaluations which have to be done. This number ranges (for my approach) between a minimum of one (1) and a maximum of three (3), while the third-one is a tradeoff between document size (time to traverse a fully-excluded subtree) and and the time required to evaluate one XPath. So REQUIRED are a maximum of two (2) XPath evaluations. 5a: The super-duper chance for optimization is in the combination with c14n: If there is c14n directly after my transform, both can be done in a single step, with the following timing: time(transform+c14n) = time(1--3 xpath evals)+time(c14n). NO additional costs. Can xfilter2 do that?. 5b: The xfilter2 transform enforces that the number of XPath evaluations is the same as the number of transforms. Regards, Christian [1] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002JanMar/0250.htm l --On Montag, 13. Mai 2002 20:10 +0100 merlin <merlin@baltimore.ie> wrote: > Hi Christian, > > The proposed XPath filter is clear and intuitive and it > effectively and efficiently solves the problems that it sets > out to address. I think it might be a good idea to allow > multiple XPath children of the filter transform; that would > allow a sequence of steps to be logically grouped, and might > help implementers perform optimizations. > > In my opinion, your labeling proposal lacks the clarity and > simplicity of the XPath filter and is not addressing a problem > that needs solving, based on my understanding of the needs > of current XML signature applications. > > If the purpose of this transform was to solve the types of > problem that you are posing, then I would grant that a pure > set-based solution is not as efficient as possible, and that > your proposal would merit consideration. > > However, that is not the purpose of this transform. Your > own data show that the current proposal is more efficient > than yours for *real-world* problems, which are the types of > problem that we need to solve. > > Even if the performance tipped slightly the other way, I > think that set-based operations on subtrees are a much more > intuitive and accessible solution than yours, and so I would > still advocate them. If you can, as I have asked before, > produce a real-world problem that is badly solved by our > transform, then please present it. I do not count the > example below (yes, I've looked at it) as a real-world > problem. Otherwise, I think your own data show that we have > a good solution. > > Merlin > > r/geuer-pollmann@nue.et-inf.uni-siegen.de/2002.05.13/16:04:48 >> --On Donnerstag, 9. Mai 2002 23:50 +0200 Christian Geuer-Pollmann >> <geuer-pollmann@nue.et-inf.uni-siegen.de> wrote: >> >>> When you look at the results below, you see that each step in the >>> xfilter2spec_xfilter2_(1/2/3) adds more processing time, while this is >>> not the case for 'my' transform: I can't tell why they are as they are. >> >> I constructed an obfuscated example in which I select some subtrees for >> inclusion/exclusion. To get the final result, I must perform 7 xfilter2 >> transforms. Each transform adds more overhead. To test how much each >> transform requires, I ran the test suite 10 times, while the first test >> only did the 1st transform, the second test included the 1st and the 2nd >> transform while the 7th test included all seven steps. The results are >> shown below >> >> 10 * apachesample_xfilter2_1 took 5,097 seconds >> 10 * apachesample_xfilter2_2 took 5,879 seconds >> 10 * apachesample_xfilter2_3 took 6,790 seconds >> 10 * apachesample_xfilter2_4 took 7,290 seconds >> 10 * apachesample_xfilter2_5 took 8,442 seconds >> 10 * apachesample_xfilter2_6 took 9,503 seconds >> 10 * apachesample_xfilter2_7 took 10,475 seconds >> >> You can see that the required time is strictly monotonic increasing for >> each added transform step. To check that against my own algorithm, I >> write 7 transforms which achieve the same results as the corresponding >> xfilter2 ones: >> >> 10 * apachesample_apachefilter_1 took 5,207 seconds >> 10 * apachesample_apachefilter_2 took 5,638 seconds >> 10 * apachesample_apachefilter_3 took 5,408 seconds >> 10 * apachesample_apachefilter_4 took 5,478 seconds >> 10 * apachesample_apachefilter_5 took 5,528 seconds >> 10 * apachesample_apachefilter_6 took 5,508 seconds >> 10 * apachesample_apachefilter_7 took 5,358 seconds >> >> You can see that the time is -- well, not constant, but does not show >> this heavy dependency to take longer for complexer selections. >> >> BTW, if you wanna check this yourself, the implementation of the >> transform [1] is in the Apache CVS, the test suite also [2]. >> >> Kind regards, >> Christian >> >> >> [1] >> <http://cvs.apache.org/viewcvs.cgi/xml-security/src/org/apache/xml/secur >> ity >> /transforms/implementations/TransformXPathFilterCHGP.java?rev=1.2&conten >> t-t ype=text/vnd.viewcvs-markup> >> >> [2] >> <http://cvs.apache.org/viewcvs.cgi/xml-security/src_samples/org/apache/x >> ml/ >> security/samples/TransformPerformanceTester.java?rev=1.3&content-type=te >> xt/ vnd.viewcvs-markup>
Received on Tuesday, 21 May 2002 20:21:29 UTC