Re: 1 pass vs 2 pass

Pratik, Scott,

I'm still in favor of a one-pass solution, for the following reason:

Depending on the application domain, one-pass is far more
memory-efficient. As Pratik pointed out, the gap between an XML
document's text representation and its DOM representation already offers
enormous potential for memory optimization. In the one-pass scenario,
though, you can process the document's content concurrently while
parsing (using an event pipeline or mediator technique), so you do not
even have to store the full text representation. For instance, SOAP
messages can be read, signature-verified and application-processed in a
single streaming parse (see our publications), so the message never has
to be stored in full. If a second pass is the default, you always need
to keep the full message in memory, and that optimization potential is
lost. I'm not talking about a factor of 5 or 10 here, but about O(1)
vs. O(n) memory usage in the best case. This is why I strongly recommend
aiming for one-pass solutions.
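
To illustrate the memory argument, here is a minimal Python sketch of
such an event pipeline. It is not our implementation: the names
process_stream and ElementCounter are made up for this example, the
SHA-256 over the raw bytes merely stands in for a real (canonicalized)
signature digest, and counting elements stands in for application
processing. The point is only that each chunk is hashed and parsed as it
arrives and is then discarded.

```python
import hashlib
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Stand-in for application processing: counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def process_stream(stream, chunk_size=4096):
    """Hash and parse a byte stream chunk by chunk in a single pass.

    Only one chunk is held by this function at any time; the full
    message is never stored.
    """
    digest = hashlib.sha256()          # assumption: raw-byte digest, no C14N
    handler = ElementCounter()
    parser = xml.sax.make_parser()     # expat-based incremental parser
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)           # digest stage of the pipeline
        parser.feed(chunk)             # application stage of the pipeline
    parser.close()
    return digest.hexdigest(), handler.count
```

Calling process_stream(io.BytesIO(data), chunk_size=5) on a small
document yields the same digest as hashing the whole document at once,
although no more than 5 bytes are read at a time.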

However, Pratik is right that one-pass is not always achievable. For
instance, when applying a signature you have to insert the hash values,
the Signature element etc. *after* processing the XML document, so you
need some kind of second pass, or at least partial caching techniques
(see our SWS07 paper). Nevertheless, signature verification can be done
in one pass, as long as no backward references are used. Hence I'd
suggest trying to support this case as well as possible.

Minor note: when you can do two-pass, you can also resolve backward
references in XPath. Not in all cases, but a single "parent" axis is
definitely achievable. In some cases, the parent axis can even be
followed in one pass, as long as the remaining XPath does not point back
to an earlier element. For instance,
"parent::soap:Header/parent::soap:Envelope/soap:Body" is resolvable,
since the soap:Body element always follows the soap:Header and hence
will be processed later by the streaming parser. This might become
interesting if we want to support relative XPaths.
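
A rough Python sketch of how that particular XPath could be followed in
one pass with nothing more than the stack of open elements. The
assumptions are mine and only for illustration: the context node is
taken to be a ds:Signature element inside the header, the SOAP 1.1
namespace is used, and ParentAxisResolver and resolve are made-up names.
Both parent steps are answered from the stack at the moment the context
node is seen; the final soap:Body step is answered when that element
shows up later in the stream.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 (assumption)
DSIG = "http://www.w3.org/2000/09/xmldsig#"


class ParentAxisResolver(ContentHandler):
    """Follows parent::soap:Header/parent::soap:Envelope/soap:Body."""

    def __init__(self):
        super().__init__()
        self.stack = []          # (element id, (namespace, local name))
        self.next_id = 0
        self.envelope_id = None  # Envelope reached via the two parent steps
        self.matched = []        # local names of elements the XPath selects

    def startElementNS(self, name, qname, attrs):
        parent = self.stack[-1] if self.stack else None
        # Last step: a soap:Body child of the remembered Envelope.
        if (name == (SOAP, "Body") and parent is not None
                and parent[0] == self.envelope_id):
            self.matched.append(name[1])
        # Context node: both parent steps are read off the open-element stack.
        if name == (DSIG, "Signature") and len(self.stack) >= 2:
            header, envelope = self.stack[-1], self.stack[-2]
            if (header[1] == (SOAP, "Header")
                    and envelope[1] == (SOAP, "Envelope")):
                self.envelope_id = envelope[0]
        self.stack.append((self.next_id, name))
        self.next_id += 1

    def endElementNS(self, name, qname):
        self.stack.pop()


def resolve(xml_bytes):
    """Return the elements selected by the XPath in a single parse."""
    handler = ParentAxisResolver()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.matched
```

If the soap:Body were to appear before the soap:Header, the same handler
would find nothing, which is exactly the backward-reference case that
needs a second pass or caching.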

In short, two-pass can make things easier for implementers, but gives
up a lot of optimization potential.

best regards

Meiko

Pratik Datta wrote:
> Yes, my definition of streaming has always been 2-pass.
>
> When you load an XML document into a DOM, it explodes in size by at least 5x, maybe even 20x.  This memory increase limits scalability, and it also decreases performance, because DOM results in a lot of little objects and that makes the garbage collector kick in more often.
>
> Streaming, whether 1-pass or 2-pass, solves this problem. With the 2-pass approach you don't have to worry about forward references.  The XPath would still need to be evaluable in 1 pass, but the <Signature> can be analyzed or updated in a 2nd pass.
>
> Pratik
>
> -----Original Message-----
> From: Scott Cantor [mailto:cantor.2@osu.edu] 
> Sent: Tuesday, August 10, 2010 10:03 AM
> To: public-xmlsec@w3.org
> Subject: 1 pass vs 2 pass
>
> I didn't get a clear sense of what the WG consensus is on this, but I raised
> this question on the call at the end because it seems like it's pretty
> critical in order to evaluate the proposals on XPath.
>
> I think I heard Pratik indicate his working definition for streaming is
> 2-pass, and I think I understood Meiko's working assumption to be 1-pass. So
> shouldn't we agree on one definition?
>
> -- Scott

-- 
Dipl.-Inf. Meiko Jensen
Chair for Network and Data Security 
Horst Görtz Institute for IT-Security 
Ruhr University Bochum, Germany
_____________________________
Universitätsstr. 150, Geb. IC 4/150
D-44780 Bochum, Germany
Phone: +49 (0) 234 / 32-26796
Telefax: +49 (0) 234 / 32-14347
http://www.nds.rub.de

Received on Wednesday, 11 August 2010 12:52:14 UTC