Re: Strawman proposal for modified Transform processing for Streamability from Sean Mullan on 2008-09-03 (public-xmlsec@w3.org from September 2008)

From: Sean Mullan <Sean.Mullan@Sun.COM>
Date: Wed, 03 Sep 2008 16:59:18 -0400
To: Pratik Datta <pratik.datta@oracle.com>
Cc: public-xmlsec@w3.org
Message-id: <48BEFAA6.9000308@sun.com>
Hi Pratik,

Nice writeup, thanks.

One question I would pose is why do we necessarily have to use XPath and try to 
work around it with all sorts of restrictions? Why not just come up with 
something completely different that does what we want, and only what we want, 
for example, the NodeSelection Transform.

 From an implementor's point of view, I think that there won't be any/many 
available XPath library that I will be able to use that supports streaming or if 
there is it will not be a good fit for what I need, as you mention below with 
the ones that you have studied. That concerns me and I think would affect 
deployment for other implementations as well. We have to try to make it easier 
to implement XML Signature. So I would ask if it is essential to use XPath? What 
benefits are we getting by sticking to XPath and not just coming up with a new 
and simpler Transform that cuts out all the complexity? Maybe for the XPath 
expression syntax (although personally I always found XPath expressions 
difficult to decipher without diving into the specification)? Or is it mainly to 
try to be compatible with existing implementations? And is that a realistic 
requirement?

Thanks,
Sean


Pratik Datta wrote:
> 
> 
> This proposal is modifies the Transform to address the following 
> requirements:
> 
> Requirements
> ------------
> 
> 1) Check what is signed:
> Looking at a signature, it should be possible to find out what was 
> signed. This is one of the best practices for verification. A receiver 
> must not blindly verify a signature without at first checking if what 
> was supposed to have been included in the signature is really signed.
> 
> 2) Support Streaming
> Currently many transforms are based on nodesets, and nodesets imply DOM, 
> and DOM requires the whole document to be loaded in memory which is bad 
> for performance
> 
> 
> 
> Change 1: Distinguish between selection and canonicalization transforms
> -----------------------------------------------------------------------
> To support the "check what is signed" requirement, we need to 
> distinguish between transforms that select data to be signed, and 
> transforms that convert that data to bytes.
> 
> Selection Transforms: XPath Filter, XPath Filter 2.0, Enveloped 
> Signature, Decrypt Transform  Canonicalization Transforms: C14n, 
> exc-C14N, base 64
> 
> XSLT transform can be used for anything, e.g. there could be a XSLT 
> transform to remove white spaces, then this particular XSLT transform 
> would fall in the canonicalization bucket.
> 
> The WS-Security STR Transform does both Selection and Canonicalization.  
> WS Security SWA attachment transforms do selection.
> 
> 
> Change 2 : Limit transformation sequence to selection first, 
> canonicalization second
> ------------------------------------------------------------------------------------ 
> 
> Currently there is no limitation on the ordering of transforms, so 
> somebody could create a signature with
> 
> c14n, xpath
> 
> According to the processing rules, this means that reference URI is 
> resolved and canonicalized into a octet stream, which is then reparsed 
> into a xml, and then xpath is applied to select the nodes, after that 
> another implicit c14n is performed to covert it into a octet stream.
> 
> This is completely meaningless, and besides XML parsing is an expensive 
> operation. So we would like to define a strict rules on the sequence of 
> transforms
> 
> * There can be no transforms after c14n (or after WS Security 
> STRTransform which includes c14n transform)
> * No transforms after base64 because it produces a octet stream, which 
> is to be directly digested
> * Other transforms that emit octet stream (like the WS Security SWA 
> Attachment transforms) should also be the last one
> * XSLT also produces an Octet stream, but that needs to be dealt 
> differently because it is not canonicalized and cannot be digested 
> directly - actually I would vote for removing XSLT transform completely, 
> because first of all it is insecure - very easy to have DoS attacks, 
> secondly it is completely unstreamable (unless we have a very restricted 
> XSLT), thirdly it loses the original nodeset so makes it impossible to 
> determine what was really signed.
> * XPath Filter or XPath Filter 2.0 should be the first transform, and 
> there should only one XPath transform.
> * There can be only one enveloped signature transform
> * Only one Decrypt transform
> * Base64 transform should only take a single text node or an element 
> with a single text node child as input.  (This restriction is to 
> eliminate dependency on the Xpath text() function, which is not 
> streamable as it needs to select any number of text nodes and 
> concatenate them)
> 
> 
> These rules eliminate XML Parsing during transform processing, and also 
> make it possible to determine what is signed.
> 
> 
> Change 3: Use simple XPaths in XPath Transform
> ----------------------------------------------
> XPath poses a lot of problems - first of all it is insecure - DoS 
> attacks are possible, secondly XPath inherently requires a DOM, there is 
> a only a limited set of XPath that can be streamed, thirdly XPath make 
> is very hard to know what is signed, fourthly XPath Filter 1.0 are 
> inside out and very difficult to write and understand (although this is 
> fixed in XPath Filter 2.0)
> 
> XPaths can also be specified in an XPointer URI, but since XPointers 
> were marked OPTIONAL, but XPath Transform were marked RECOMMENDED, 
> XPointers have never really been used. I propose that we just 
> drop/deprecate them.
> 
> To solve these XPath problems, I propose a new mechanism to to specify 
> the XPath transform, which is essentially a restricted form of the XPath 
> Filter 2.0. It has
> * an included Xpath  - identifies subtrees that need to be signed 
> (optional - an URI an can be used instead of this)
> * an excluded Xpath  - (optional) identifies subtrees or attributes need 
> to be excluded
> 
> The included XPath is similar to the "intersect" and the excluded XPath 
> is similar to the "subtract" of the XPath Filter 2.0.
> 
> Restrictions
> * As mentioned above, if Xpath is used, it should be the first 
> transform, (there can be only one Xpath transform in the transform list),
> * If included is used, the reference URI should be "", i.e. refer to the 
> complete document
> * The XPath expression itself is very restricted as mentioned below
> * Unlike XPath Filter 2.0, there is only included XPath and one excluded 
> XPath, and the excluded overrides included.
> 
> 
> I am open to the syntax, as long as we can have this included and 
> excluded XPaths. One idea is to preserve backwards compatibility, and 
> just add two attributes "included" and "excluded" to the existing XPath 
> transform, like this:
> 
> <Transform Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
> <XPath included="..." excluded="...">
> ...
> </XPath>
> </Transform>
> 
> So an older implementation will execute the XPath Filter 1.0 Transform, 
> whereas a newer implementation will just process the included and 
> excluded XPaths.
> 
> This proposal also makes it easy to determine what is signed. There is 
> only Xpath transform, and this Xpath has only one included XPath, so it 
> is easy to to do static analysis of the signature to determine what 
> elements were signed.
> 
> 
> Streaming XPath
> ---------------
> There are many streaming Xpath implementations, and they impose 
> different kinds of constraints on the XPath.
> I looked at XSQ implementation which Thomas had pointed out 
> http://www.cs.umd.edu/projects/xsq/.
> 
> and some others
> http://www.idealliance.org/papers/xml2001/papers/pdf/05-01-01.pdf
> http://cs.nyu.edu/~deepak/publications/icde.pdf
> http://www.stanford.edu/class/cs276b/handouts/presentations/joshislezberg.ppt 
> 
> http://www.idealliance.org/proceedings/xml04/papers/299/RandomAccessXML.pdf
> 
> They have varying constrains some common ones are
> * Only forward axes - like child, descendant, forward-sibling     
> (reverse axes are very difficult)
> * lot of limitations on predicates
>  ** no location paths in predicates
>  ** no nested predicates
>  ** functions on nodesets are not allowed e.g count(), last() etc
>  ** conversion of subtrees to strings e.g. the text() functions
> 
> 
> Even with these restrictions, the implementations are very complex and 
> require state engines and static optimization
> 
> I would like to propose an smaller subset of the XPath, that has even 
> lesser requirements. For this imagine a streaming XML Parser that is 
> walking through the XML tree, and any point it has in memory
> * the current element,
> * all the attributes of the current element,
> * all and ancestor elements
> We assume that this parser maintains a namespace definitions and also do 
> xml:base combinations as it walks down the tree.
> 
> 
> Node Text nodes can be extremely long (especially for long base64 
> encoded string, e.. MTOM attachments), so it is possible that a text 
> node is split up, and not loaded up all in memory.
> 
> 
> With this model, we impose the following restrictions
> * Only elements can be selected.  (I.e. the location path must resolve 
> to one or more elements. not attributes or text nodes)
> * Only descendant and child axes can be used
> * predicates can only have relational expressions involving attributes. 
> The predicate can only be at the last location step, and it cannot use 
> any functions.
> 
> So only simple expressions like this are allowed
> /soap:Envelope/soap:Header[@actor = "ac"]
> 
> 
> This restrictions are such that the XPath expression can be evaluated 
> with only the element, it attributes and its ancestor elements. So as a 
> streaming parser is walking down the document, it can evaluate the 
> included and excluded XPath expression for every node, and determine 
> whether a node is to be included or not.
> 
> 
> 
> Reference Processing
> ====================
> These proposed changes allow the signature to be statically analyzed 
> without running through the transforms.  A signature processing 
> API/Library should provide a method to statically analyze the reference 
> and return what was signed. After that the caller of this library, can 
> determine if it wants to go ahead with signature verification.
> 
> 
> Streaming verification
> ----------------------
> These changes also allow signatures to be processed in a streaming 
> manner. Let us assume that we have already done an initial pass over the 
> document to get the signature, keys, tokens etc. (In WSSecurity use 
> case, all of these are present in the SOAP header, so this first pass is 
> just going only over a small fraction of the document, not the entire 
> document).
> 
> Now we set a "canonicalization and digesting engine" for each reference. 
> This engine expects streaming xml events, and canonicalizes and digests 
> them to maintain a running digest. Then we do one pass over the whole 
> document, and for each node, evalulate all the XPaths/URIs for each 
> references. If the node is part of a reference we pass that event to the 
> corresponding canonicalization and digesting engine.
> 
> After this pass, we retrieve the digests from each engine, and check if 
> the digests match.
> 
> 
> 
> Summary
> -------
> The proposal puts in a lot of restrictions to the Transforms, to make it 
> possible to check what was signed, and to perform signing/verification 
> operations in a stream.
> 
> 
> 
> 
> Pratik
> 
> 
> 
> 
>
Received on Wednesday, 3 September 2008 21:00:13 UTC