Re: Strawman proposal for modified Transform processing for Streamability from Frederick Hirsch on 2008-09-03 (public-xmlsec@w3.org from September 2008)

From: Frederick Hirsch <frederick.hirsch@nokia.com>
Date: Wed, 3 Sep 2008 17:13:45 -0400
To: "ext Sean Mullan" <Sean.Mullan@Sun.COM>
Cc: Frederick Hirsch <frederick.hirsch@nokia.com>, Pratik Datta <pratik.datta@oracle.com>, public-xmlsec@w3.org
Message-Id: <7880B7DB-EC16-44AC-A9C3-39B3114070C9@nokia.com>
+1

might be less confusing, possibly simpler.

regards, Frederick

Frederick Hirsch
Nokia

(not as chair)
On Sep 3, 2008, at 4:59 PM, ext Sean Mullan wrote:

>
> Hi Pratik,
>
> Nice writeup, thanks.
>
> One question I would pose is why do we necessarily have to use  
> XPath and try to work around it with all sorts of restrictions? Why  
> not just come up with something completely different that does what  
> we want, and only what we want, for example, the NodeSelection  
> Transform?
>
> From an implementor's point of view, I think that there won't be
> many (if any) available XPath libraries that support streaming, or
> if there are, they will not be a good fit for what I need, as you
> mention below with the ones that you have studied. That concerns
> me and I think would affect deployment for
> other implementations as well. We have to try to make it easier to  
> implement XML Signature. So I would ask if it is essential to use  
> XPath? What benefits are we getting by sticking to XPath and not  
> just coming up with a new and simpler Transform that cuts out all  
> the complexity? Maybe for the XPath expression syntax (although  
> personally I always found XPath expressions difficult to decipher  
> without diving into the specification)? Or is it mainly to try to  
> be compatible with existing implementations? And is that a  
> realistic requirement?
>
> Thanks,
> Sean
>
>
> Pratik Datta wrote:
>> This proposal modifies the Transform to address the following
>> requirements:
>> Requirements
>> ------------
>> 1) Check what is signed:
>> Looking at a signature, it should be possible to find out what was  
>> signed. This is one of the best practices for verification. A  
>> receiver must not blindly verify a signature without first
>> checking that what was supposed to have been included in the
>> signature is really signed.
>> 2) Support Streaming
>> Currently many transforms are based on nodesets, and nodesets
>> imply DOM, and DOM requires the whole document to be loaded in
>> memory, which is bad for performance.
>> Change 1: Distinguish between selection and canonicalization transforms
>> -----------------------------------------------------------------------
>> To support the "check what is signed" requirement, we need to  
>> distinguish between transforms that select data to be signed, and  
>> transforms that convert that data to bytes.
>> Selection Transforms: XPath Filter, XPath Filter 2.0, Enveloped
>> Signature, Decrypt Transform
>> Canonicalization Transforms: C14N, exc-C14N, base64
>> An XSLT transform can be used for anything, e.g. there could be an
>> XSLT transform to remove white space; that particular XSLT
>> transform would fall in the canonicalization bucket.
>> The WS-Security STR Transform does both Selection and  
>> Canonicalization.  WS Security SWA attachment transforms do  
>> selection.
>> Change 2: Limit transformation sequence to selection first, canonicalization second
>> -----------------------------------------------------------------------------------
>> Currently there is no limitation on the ordering of transforms, so
>> somebody could create a signature with
>> c14n, xpath
>> According to the processing rules, this means that the reference
>> URI is resolved and canonicalized into an octet stream, which is
>> then reparsed into XML, and then XPath is applied to select the
>> nodes; after that another implicit c14n is performed to convert it
>> into an octet stream.
>> This is completely meaningless, and besides, XML parsing is an
>> expensive operation. So we would like to define strict rules on
>> the sequence of transforms:
>> * There can be no transforms after c14n (or after the WS Security
>> STRTransform, which includes a c14n transform)
>> * No transforms after base64, because it produces an octet stream
>> which is to be directly digested
>> * Other transforms that emit an octet stream (like the WS Security
>> SWA Attachment transforms) should also be the last one
>> * XSLT also produces an octet stream, but that needs to be dealt
>> with differently because it is not canonicalized and cannot be
>> digested directly - actually I would vote for removing the XSLT
>> transform completely, because first of all it is insecure - very
>> easy to have DoS attacks, secondly it is completely unstreamable
>> (unless we have a very restricted XSLT), thirdly it loses the
>> original nodeset, which makes it impossible to determine what was
>> really signed.
>> * XPath Filter or XPath Filter 2.0 should be the first transform,
>> and there should be only one XPath transform.
>> * There can be only one enveloped signature transform
>> * Only one Decrypt transform
>> * Base64 transform should only take a single text node or an
>> element with a single text node child as input.  (This restriction
>> is to eliminate the dependency on the XPath text() function, which
>> is not streamable as it needs to select any number of text nodes
>> and concatenate them)
>> These rules eliminate XML Parsing during transform processing, and  
>> also make it possible to determine what is signed.
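The ordering rules above could be checked mechanically before any transform is run. The following is only a minimal sketch under stated assumptions: the short transform names (`xpath`, `exc-c14n`, and so on) are hypothetical stand-ins for the real algorithm URIs, and XSLT is simply rejected, following the vote above to remove it.

```python
# Hypothetical short names; real signatures identify transforms by algorithm URI.
XPATH = {"xpath", "xpath2"}
SELECTION = XPATH | {"enveloped-signature", "decrypt"}
# Transforms that emit an octet stream and therefore must come last.
TERMINAL = {"c14n", "exc-c14n", "base64", "str-transform", "swa-attachment"}


def check_transform_order(transforms):
    """Return a list of rule violations for a sequence of transform names."""
    problems = []
    last = len(transforms) - 1
    for i, name in enumerate(transforms):
        if name not in SELECTION and name not in TERMINAL:
            # Covers XSLT as well, which this sketch treats as disallowed.
            problems.append(f"unknown or disallowed transform: {name}")
        if name in TERMINAL and i != last:
            problems.append(f"{name} must be the last transform")
        if name in XPATH and i != 0:
            problems.append(f"{name} must be the first transform")
    if sum(1 for t in transforms if t in XPATH) > 1:
        problems.append("only one XPath transform is allowed")
    for single in ("enveloped-signature", "decrypt"):
        if transforms.count(single) > 1:
            problems.append(f"only one {single} transform is allowed")
    return problems
```

With these assumed names, `check_transform_order(["c14n", "xpath"])` reports both that c14n is not last and that xpath is not first, which is the meaningless sequence described earlier.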
>> Change 3: Use simple XPaths in XPath Transform
>> ----------------------------------------------
>> XPath poses a lot of problems - first of all it is insecure - DoS
>> attacks are possible, secondly XPath inherently requires a DOM,
>> there is only a limited set of XPath that can be streamed,
>> thirdly XPath makes it very hard to know what is signed, fourthly
>> XPath Filter 1.0 expressions are inside out and very difficult to
>> write and understand (although this is fixed in XPath Filter 2.0)
>> XPaths can also be specified in an XPointer URI, but since
>> XPointers were marked OPTIONAL while the XPath Transform was marked
>> RECOMMENDED, XPointers have never really been used. I propose that
>> we just drop/deprecate them.
>> To solve these XPath problems, I propose a new mechanism to
>> specify the XPath transform, which is essentially a restricted
>> form of the XPath Filter 2.0. It has
>> * an included XPath  - identifies subtrees that need to be signed
>> (optional - a URI can be used instead of this)
>> * an excluded XPath  - (optional) identifies subtrees or
>> attributes that need to be excluded
>> The included XPath is similar to the "intersect" and the excluded  
>> XPath is similar to the "subtract" of the XPath Filter 2.0.
>> Restrictions
>> * As mentioned above, if Xpath is used, it should be the first  
>> transform, (there can be only one Xpath transform in the transform  
>> list),
>> * If included is used, the reference URI should be "", i.e. refer  
>> to the complete document
>> * The XPath expression itself is very restricted as mentioned below
>> * Unlike XPath Filter 2.0, there is only one included XPath and
>> one excluded XPath, and the excluded overrides the included.
>> I am open to the syntax, as long as we can have this included and  
>> excluded XPaths. One idea is to preserve backwards compatibility,  
>> and just add two attributes "included" and "excluded" to the  
>> existing XPath transform, like this:
>> <Transform Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
>> <XPath included="..." excluded="...">
>> ...
>> </XPath>
>> </Transform>
>> So an older implementation will execute the XPath Filter 1.0  
>> Transform, whereas a newer implementation will just process the  
>> included and excluded XPaths.
>> This proposal also makes it easy to determine what is signed.
>> There is only one XPath transform, and this XPath has only one
>> included XPath, so it is easy to do static analysis of the
>> signature to determine what elements were signed.
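A static analysis along these lines could be as simple as reading the two attributes off the Reference without running any transform. This is a sketch only: the `included` and `excluded` attributes are the syntax floated in this proposal, not anything in the current specification, the function name `what_is_signed` is hypothetical, and the sketch assumes the Reference is in the XML Signature namespace.

```python
import xml.etree.ElementTree as ET

DS = "http://www.w3.org/2000/09/xmldsig#"
XPATH_ALG = "http://www.w3.org/TR/1999/REC-xpath-19991116"


def what_is_signed(reference_xml):
    """Report the URI and the proposed included/excluded XPaths of a Reference.

    Nothing is evaluated; the values are returned as plain strings so a caller
    can decide whether the signature covers what it expects.
    """
    ref = ET.fromstring(reference_xml)
    result = {"uri": ref.get("URI"), "included": None, "excluded": None}
    for transform in ref.iter(f"{{{DS}}}Transform"):
        if transform.get("Algorithm") != XPATH_ALG:
            continue
        xpath = transform.find(f"{{{DS}}}XPath")
        if xpath is not None:
            # Attribute names taken from the proposal above (assumed syntax).
            result["included"] = xpath.get("included")
            result["excluded"] = xpath.get("excluded")
        break  # the proposal allows only one XPath transform
    return result


SAMPLE = """<Reference xmlns="http://www.w3.org/2000/09/xmldsig#" URI="">
  <Transforms>
    <Transform Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
      <XPath included="/soap:Envelope/soap:Body" excluded="//soap:Header">true()</XPath>
    </Transform>
  </Transforms>
</Reference>"""
```

Calling `what_is_signed(SAMPLE)` returns the two XPath strings together with the empty reference URI; the sample values are made up for illustration.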
>> Streaming XPath
>> ---------------
>> There are many streaming Xpath implementations, and they impose  
>> different kinds of constraints on the XPath.
>> I looked at the XSQ implementation which Thomas had pointed out,
>> http://www.cs.umd.edu/projects/xsq/
>> and some others:
>> http://www.idealliance.org/papers/xml2001/papers/pdf/05-01-01.pdf
>> http://cs.nyu.edu/~deepak/publications/icde.pdf
>> http://www.stanford.edu/class/cs276b/handouts/presentations/joshislezberg.ppt
>> http://www.idealliance.org/proceedings/xml04/papers/299/RandomAccessXML.pdf
>> They have varying constraints; some common ones are
>> * Only forward axes - like child, descendant, following-sibling
>> (reverse axes are very difficult)
>> * a lot of limitations on predicates
>>  ** no location paths in predicates
>>  ** no nested predicates
>>  ** no functions on nodesets, e.g. count(), last() etc
>>  ** no conversion of subtrees to strings, e.g. via text()
>> Even with these restrictions, the implementations are very complex
>> and require state engines and static optimization.
>> I would like to propose a smaller subset of XPath that has even
>> fewer requirements. For this, imagine a streaming XML parser
>> that is walking through the XML tree, and at any point has in memory
>> * the current element,
>> * all the attributes of the current element,
>> * all ancestor elements
>> We assume that this parser maintains the namespace definitions and
>> also does xml:base combinations as it walks down the tree.
>> Note: Text nodes can be extremely long (especially for long base64
>> encoded strings, e.g. MTOM attachments), so it is possible that a
>> text node is split up and not loaded into memory all at once.
>> With this model, we impose the following restrictions
>> * Only elements can be selected.  (I.e. the location path must  
>> resolve to one or more elements. not attributes or text nodes)
>> * Only descendant and child axes can be used
>> * predicates can only have relational expressions involving  
>> attributes. The predicate can only be at the last location step,  
>> and it cannot use any functions.
>> So only simple expressions like this are allowed
>> /soap:Envelope/soap:Header[@actor = "ac"]
>> These restrictions are such that the XPath expression can be
>> evaluated with only the element, its attributes and its ancestor
>> elements. So as a streaming parser is walking down the document,
>> it can evaluate the included and excluded XPath expressions for
>> every node, and determine whether a node is to be included or not.
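To make the restricted subset concrete, here is a rough sketch of an evaluator that needs only the names of the ancestor elements and the current element's attributes. It is an illustration under assumptions, not a proposed implementation: it supports only `/` and `//` steps with an optional attribute-equality predicate on the last step, it compares prefixed names textually instead of resolving namespaces, and the names `parse`, `path_matches` and `selects` are made up.

```python
import re

STEP = re.compile(r'(//|/)([\w:.-]+)')
# Only an equality test on one attribute, and only on the last step.
PRED = re.compile(r'\[@([\w:.-]+)\s*=\s*"([^"]*)"\]$')


def parse(expr):
    """Split a restricted XPath into (steps, predicate) or raise ValueError."""
    pred = None
    m = PRED.search(expr)
    if m:
        pred = (m.group(1), m.group(2))
        expr = expr[:m.start()]
    steps = STEP.findall(expr)
    if not steps or "".join(sep + name for sep, name in steps) != expr:
        raise ValueError("expression is outside the restricted subset")
    return steps, pred


def path_matches(steps, path):
    """path is the list of element names from the root to the current element."""
    def match(si, pi):
        if si == len(steps):
            return pi == len(path)
        sep, name = steps[si]
        if sep == "/":  # child axis: must match exactly at this depth
            return pi < len(path) and path[pi] == name and match(si + 1, pi + 1)
        # "//" (descendant): the name may match at this depth or any deeper one
        return any(path[k] == name and match(si + 1, k + 1)
                   for k in range(pi, len(path)))
    return match(0, 0)


def selects(expr, path, attrs):
    """True if expr selects the current element, given its path and attributes."""
    steps, pred = parse(expr)
    if not path_matches(steps, path):
        return False
    return pred is None or attrs.get(pred[0]) == pred[1]
```

A streaming parser would call `selects` at each start-element event with its current ancestor stack; an element then belongs to a reference if it, or one of its ancestors, is selected by the included XPath and none is selected by the excluded one.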
>> Reference Processing
>> ====================
>> These proposed changes allow the signature to be statically  
>> analyzed without running through the transforms.  A signature  
>> processing API/Library should provide a method to statically  
>> analyze the reference and return what was signed. After that, the
>> caller of this library can determine whether it wants to go ahead
>> with signature verification.
>> Streaming verification
>> ----------------------
>> These changes also allow signatures to be processed in a streaming  
>> manner. Let us assume that we have already done an initial pass  
>> over the document to get the signature, keys, tokens etc. (In the
>> WS-Security use case, all of these are present in the SOAP header,
>> so this first pass goes over only a small fraction of the
>> document, not the entire document).
>> Now we set up a "canonicalization and digesting engine" for each
>> reference. This engine expects streaming XML events, and
>> canonicalizes and digests them to maintain a running digest. Then
>> we do one pass over the whole document, and for each node,
>> evaluate all the XPaths/URIs for each reference. If the node is
>> part of a reference we pass that event to the corresponding
>> canonicalization and digesting engine.
>> After this pass, we retrieve the digests from each engine, and  
>> check if the digests match.
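The single-pass scheme described above might look roughly like the following, using Python's SAX parser as the event source. Several things here are assumptions for illustration: the "canonicalization" is a naive re-serialization and NOT real C14N, each reference is selected simply by an element name rather than by an XPath or URI, and the class and function names are hypothetical.

```python
import hashlib
import xml.sax


class DigestEngine:
    """Stand-in for one reference's canonicalization-and-digest engine."""

    def __init__(self, root_name):
        self.root_name = root_name  # hypothetical selector: subtree root name
        self.depth = 0              # > 0 while inside the selected subtree
        self.hash = hashlib.sha256()

    def start(self, name, attrs):
        if self.depth or name == self.root_name:
            self.depth += 1
            # Naive serialization with sorted attributes; not real C14N.
            tag = "<" + name + "".join(
                f' {k}="{v}"' for k, v in sorted(attrs.items())) + ">"
            self.hash.update(tag.encode("utf-8"))

    def chars(self, data):
        if self.depth:
            # Text may arrive in several chunks; the running digest absorbs each.
            self.hash.update(data.encode("utf-8"))

    def end(self, name):
        if self.depth:
            self.hash.update(f"</{name}>".encode("utf-8"))
            self.depth -= 1


class Dispatcher(xml.sax.ContentHandler):
    """Forwards every parser event to all engines in one pass."""

    def __init__(self, engines):
        super().__init__()
        self.engines = engines

    def startElement(self, name, attrs):
        for engine in self.engines:
            engine.start(name, dict(attrs.items()))

    def characters(self, content):
        for engine in self.engines:
            engine.chars(content)

    def endElement(self, name):
        for engine in self.engines:
            engine.end(name)


def digest_references(document_bytes, root_names):
    """Return {root_name: hex digest} after one pass over the document."""
    engines = [DigestEngine(name) for name in root_names]
    xml.sax.parseString(document_bytes, Dispatcher(engines))
    return {engine.root_name: engine.hash.hexdigest() for engine in engines}
```

After the pass, each returned digest would be compared with the DigestValue of the corresponding Reference; a real engine would replace the naive serialization with an actual canonicalization algorithm.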
>> Summary
>> -------
>> The proposal places many restrictions on the Transforms, to
>> make it possible to check what was signed, and to perform
>> signing/verification operations in a stream.
>> Pratik
>
>
Received on Wednesday, 3 September 2008 21:14:44 UTC