- From: Meiko Jensen <Meiko.Jensen@ruhr-uni-bochum.de>
- Date: 11 Aug 2010 14:51:39 +0200
- To: "Pratik Datta" <pratik.datta@oracle.com>
- Cc: "Scott Cantor" <cantor.2@osu.edu>, public-xmlsec@w3.org
Pratik, Scott,

I'm still in favor of a one-pass solution, for the following reason: depending on the domain of application, one-pass is far more memory-efficient. As Pratik pointed out, the difference between an XML document's text representation and its DOM representation already offers enormous potential for memory optimization. In the one-pass scenario, however, you can even process the XML document's content concurrently (using an event pipeline or mediator technique), and thus do not have to store the full text representation either. For instance, SOAP messages can be read, signature-verified and application-processed in a single streaming parse (see our publications), so you never have to store the message in full at any time. If you default to having a second pass, you always need to store the full message in memory, hence losing that optimization potential. I'm not talking about a factor of 5 or 10 here, but about O(1) vs. O(n) memory usage in the best case. This is why I strongly recommend considering one-pass solutions.

However, Pratik is right that one-pass is not always achievable. For instance, on signature application you have to insert the hash values, Signature element etc. *after* processing the XML document, hence you'll need some kind of second pass, or at least partial caching techniques (see our SWS07 paper). Nevertheless, signature verification can be done in one pass, as long as no backward references are used. Hence I'd suggest trying to support this case as well as possible.

Minor note: when you can do two-pass, you can also resolve backward references in XPath. Not all cases, but a single "parent" axis is definitely achievable. In some cases, the parent axis can even be followed in one pass, as long as the remaining XPath does not point to a previous element afterwards. For instance, following "parent::soap:Header/parent::soap:Envelope/soap:Body" is resolvable, since the soap:Body element always follows the soap:Header and hence will be processed later on in the streaming parser approach. This might get interesting if we want to support relative XPaths.

In short, two-pass can make things easier for implementers, but gives up a lot of optimization potential.

best regards

Meiko

Pratik Datta wrote:
> Yes, my definition of streaming has always been 2-pass.
>
> When you load up an XML into a DOM it explodes in size at least 5x times maybe even 20x times. This memory increase limits scalability, and it also decreases performance, because DOM results in a lot of little objects and that makes the garbage collector kick in more often.
>
> Streaming, whether 1-pass or 2-pass, solves this problem. With the 2 pass approach you don't have to worry about forward references. The XPath should still need to be evaluatable in a 1 pass, but the <Signature> can be analyzed or updated in a 2nd pass.
>
> Pratik
>
> -----Original Message-----
> From: Scott Cantor [mailto:cantor.2@osu.edu]
> Sent: Tuesday, August 10, 2010 10:03 AM
> To: public-xmlsec@w3.org
> Subject: 1 pass vs 2 pass
>
> I didn't get a clear sense of what the WG consensus is on this, but I raised
> this question on the call at the end because it seems like it's pretty
> critical in order to evaluate the proposals on XPath.
>
> I think I heard Pratik indicate his working definition for streaming is
> 2-pass, and I think I understood Meiko's working assumption to be 1-pass. So
> shouldn't we agree on one definition?
>
> -- Scott

--
Dipl.-Inf. Meiko Jensen
Chair for Network and Data Security
Horst Görtz Institute for IT-Security
Ruhr University Bochum, Germany
_____________________________
Universitätsstr. 150, Geb. IC 4/150
D-44780 Bochum, Germany
Phone: +49 (0) 234 / 32-26796
Telefax: +49 (0) 234 / 32-14347
http://www.nds.rub.de
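The O(1) vs. O(n) memory argument in the message above can be illustrated with a minimal Python sketch. It is not taken from the thread or from any XML Signature implementation: the function names `one_pass_verify` and `two_pass_verify` are hypothetical, the input is modelled as a plain iterator of byte chunks, and canonicalization, transforms and reference resolution are ignored entirely. It also assumes the expected digest is known before the signed content arrives, which corresponds to the "no backward references" condition.

```python
import hashlib
from typing import Callable, Iterable


def one_pass_verify(chunks: Iterable[bytes],
                    expected_digest: str,
                    consume: Callable[[bytes], None]) -> bool:
    """Hash each chunk and hand it to the application in a single pass.

    Only the fixed-size hash state is kept, so memory use does not grow
    with the message size (assuming `consume` does not buffer either).
    """
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)  # incremental digest, constant-size state
        consume(chunk)        # application sees the chunk immediately
    return hasher.hexdigest() == expected_digest


def two_pass_verify(chunks: Iterable[bytes],
                    expected_digest: str,
                    consume: Callable[[bytes], None]) -> bool:
    """Buffer the whole message first, then verify, then process it."""
    buffered = b"".join(chunks)  # O(n) memory: full message is stored
    ok = hashlib.sha256(buffered).hexdigest() == expected_digest
    consume(buffered)
    return ok
```

One consequence of the one-pass variant is that the application has already seen the data by the time the verification result is available, so a consumer would have to defer committing any effects until the function returns.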
Received on Wednesday, 11 August 2010 12:52:14 UTC