- From: Pratik Datta <pratik.datta@oracle.com>
- Date: Mon, 01 Sep 2008 10:21:03 -0700
- To: public-xmlsec@w3.org
This proposal modifies the Transform mechanism to address the following requirements:

Requirements
------------
1) Check what is signed: Looking at a signature, it should be possible to find out what was signed. This is one of the best practices for verification: a receiver must not blindly verify a signature without first checking that what was supposed to be covered by the signature really is signed.

2) Support streaming: Currently many transforms are based on nodesets, nodesets imply DOM, and DOM requires the whole document to be loaded in memory, which is bad for performance.

Change 1: Distinguish between selection and canonicalization transforms
-----------------------------------------------------------------------
To support the "check what is signed" requirement, we need to distinguish between transforms that select the data to be signed and transforms that convert that data to bytes.

Selection transforms: XPath Filter, XPath Filter 2.0, Enveloped Signature, Decrypt Transform
Canonicalization transforms: C14N, exc-C14N, Base64

The XSLT transform can be used for either purpose; e.g. an XSLT transform that only removes white space would fall in the canonicalization bucket. The WS-Security STR-Transform does both selection and canonicalization. The WS-Security SwA attachment transforms do selection.

Change 2: Limit transform sequences to selection first, canonicalization second
-------------------------------------------------------------------------------
Currently there is no restriction on the ordering of transforms, so somebody could create a signature with the sequence c14n, XPath. According to the processing rules, this means that the reference URI is resolved and canonicalized into an octet stream, which is then reparsed into XML; XPath is then applied to select the nodes, after which another implicit c14n is performed to convert them back into an octet stream. This is completely meaningless, and besides, XML parsing is an expensive operation. So we would like to define strict rules on the sequence of transforms:

* There can be no transforms after c14n (or after the WS-Security STR-Transform, which includes a c14n step).
* No transforms after Base64, because it produces an octet stream that is to be digested directly.
* Other transforms that emit an octet stream (like the WS-Security SwA attachment transforms) must likewise be the last one.
* XSLT also produces an octet stream, but it needs to be treated differently because its output is not canonicalized and cannot be digested directly. Actually, I would vote for removing the XSLT transform completely: first, it is insecure (DoS attacks are very easy); second, it is completely unstreamable (unless we define a very restricted XSLT); third, it loses the original nodeset, making it impossible to determine what was really signed.
* XPath Filter or XPath Filter 2.0 must be the first transform, and there may be only one XPath transform.
* There can be only one enveloped-signature transform.
* Only one Decrypt transform.
* The Base64 transform may only take a single text node, or an element with a single text-node child, as input. (This restriction eliminates the dependency on the XPath text() function, which is not streamable, as it may need to select any number of text nodes and concatenate them.)

These rules eliminate XML parsing during transform processing and also make it possible to determine what is signed. A sketch of a checker that enforces these rules is below.
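To make the ordering rules concrete, here is a minimal, non-normative Java sketch of such a checker. The Kind classification is an assumption for illustration; a real implementation would map each Transform's Algorithm URI into one of these buckets.

    import java.util.List;

    final class TransformSequenceChecker {

        // Hypothetical buckets; C14N also stands in for STR-Transform
        // (which embeds c14n), and OTHER_OCTET for e.g. the SwA
        // attachment transforms.
        enum Kind { XPATH, ENVELOPED, DECRYPT, C14N, BASE64, OTHER_OCTET }

        static void check(List<Kind> chain) {
            int enveloped = 0, decrypt = 0;
            boolean octetsEmitted = false; // a byte-producing transform was seen

            for (int i = 0; i < chain.size(); i++) {
                if (octetsEmitted)
                    throw new IllegalArgumentException(
                            "no transform may follow one that emits an octet stream");
                switch (chain.get(i)) {
                    case XPATH:
                        if (i != 0)  // also rejects a second XPath transform
                            throw new IllegalArgumentException(
                                    "the XPath transform must be first and unique");
                        break;
                    case ENVELOPED:
                        if (++enveloped > 1)
                            throw new IllegalArgumentException(
                                    "only one enveloped-signature transform");
                        break;
                    case DECRYPT:
                        if (++decrypt > 1)
                            throw new IllegalArgumentException(
                                    "only one decrypt transform");
                        break;
                    default: // C14N, BASE64, OTHER_OCTET: must be last
                        octetsEmitted = true;
                }
            }
        }
    }

For example, a chain of XPATH, ENVELOPED, C14N passes, while C14N followed by XPATH is rejected.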
Change 3: Use simple XPaths in the XPath transform
--------------------------------------------------
XPath poses a lot of problems. First, it is insecure: DoS attacks are possible. Second, XPath inherently requires a DOM; only a limited subset of XPath can be streamed. Third, XPath makes it very hard to know what is signed. Fourth, XPath Filter 1.0 expressions are inside out and very difficult to write and understand (although this is fixed in XPath Filter 2.0).

XPaths can also be specified in an XPointer URI, but since XPointer was marked OPTIONAL while the XPath transform was marked RECOMMENDED, XPointers have never really been used. I propose that we just drop/deprecate them.

To solve these XPath problems, I propose a new mechanism to specify the XPath transform, which is essentially a restricted form of XPath Filter 2.0. It has:

* an included XPath - identifies the subtrees that need to be signed (optional - a URI can be used instead), and
* an excluded XPath - (optional) identifies subtrees or attributes that need to be excluded.

The included XPath is similar to the "intersect" of XPath Filter 2.0, and the excluded XPath is similar to its "subtract".

Restrictions:

* As mentioned above, if XPath is used it must be the first transform, and there can be only one XPath transform in the transform list.
* If an included XPath is used, the reference URI must be "", i.e. refer to the complete document.
* The XPath expression itself is very restricted, as described under "Streaming XPath" below.
* Unlike XPath Filter 2.0, there is only one included XPath and one excluded XPath, and the excluded XPath overrides the included one.

I am open to the syntax, as long as we can have these included and excluded XPaths. One idea is to preserve backwards compatibility and just add two attributes, "included" and "excluded", to the existing XPath transform, like this:

<Transform Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
  <XPath included="..." excluded="..."> ... </XPath>
</Transform>

An older implementation will execute the XPath Filter 1.0 transform, whereas a newer implementation will just process the included and excluded XPaths.

This proposal also makes it easy to determine what is signed: there is only one XPath transform, and it has only one included XPath, so it is easy to do static analysis of the signature to determine which elements were signed. A sketch of such an analysis is below.
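As an illustration of that static analysis, here is a rough Java sketch that walks the ds:Reference elements of a namespace-aware DOM parse of the signature and reports each URI plus the proposed included/excluded attributes, without executing any transform. The "included" and "excluded" attribute names follow the syntax idea above and are not part of any existing spec.

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    final class SignedDataReporter {

        static final String DSIG = "http://www.w3.org/2000/09/xmldsig#";

        // Report what each Reference claims to sign, without running
        // its transforms.
        static void report(Document doc) {
            NodeList refs = doc.getElementsByTagNameNS(DSIG, "Reference");
            for (int i = 0; i < refs.getLength(); i++) {
                Element ref = (Element) refs.item(i);
                System.out.println("Reference URI=\"" + ref.getAttribute("URI") + "\"");
                // The XPath child of the Transform element; "included" and
                // "excluded" are the proposed (hypothetical) attributes.
                NodeList xpaths = ref.getElementsByTagNameNS(DSIG, "XPath");
                for (int j = 0; j < xpaths.getLength(); j++) {
                    Element xp = (Element) xpaths.item(j);
                    System.out.println("  included: " + xp.getAttribute("included"));
                    System.out.println("  excluded: " + xp.getAttribute("excluded"));
                }
            }
        }
    }

The caller would first check (per Change 2) that each reference has at most one XPath transform, then use this report to decide whether to proceed with verification.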
Streaming XPath
---------------
There are many streaming XPath implementations, and they impose different kinds of constraints on the XPath. I looked at the XSQ implementation which Thomas had pointed out, http://www.cs.umd.edu/projects/xsq/, and some others:

http://www.idealliance.org/papers/xml2001/papers/pdf/05-01-01.pdf
http://cs.nyu.edu/~deepak/publications/icde.pdf
http://www.stanford.edu/class/cs276b/handouts/presentations/joshislezberg.ppt
http://www.idealliance.org/proceedings/xml04/papers/299/RandomAccessXML.pdf

They have varying constraints; some common ones are:

* only forward axes, like child, descendant and following-sibling (reverse axes are very difficult)
* many limitations on predicates:
** no location paths in predicates
** no nested predicates
** no functions on nodesets, e.g. count(), last()
** no conversion of subtrees to strings, e.g. the text() function

Even with these restrictions, the implementations are very complex and require state machines and static optimization. I would like to propose a smaller subset of XPath that has even fewer requirements.

For this, imagine a streaming XML parser that is walking through the XML tree; at any point it has in memory:

* the current element,
* all the attributes of the current element, and
* all the ancestor elements.

We assume that this parser maintains the in-scope namespace definitions and also does xml:base combination as it walks down the tree. Note that text nodes can be extremely long (especially long base64-encoded strings, e.g. MTOM attachments), so a text node may be split up rather than loaded into memory all at once.

With this model, we impose the following restrictions:

* Only elements can be selected, i.e. the location path must resolve to one or more elements, not attributes or text nodes.
* Only the descendant and child axes can be used.
* Predicates can only contain relational expressions involving attributes. A predicate may appear only in the last location step, and it cannot use any functions.

So only simple expressions like this are allowed:

/soap:Envelope/soap:Header[@actor = "ac"]

These restrictions ensure that the XPath expression can be evaluated using only the current element, its attributes, and its ancestor elements. So as a streaming parser walks down the document, it can evaluate the included and excluded XPath expressions for every node and determine whether that node is to be included or not.

Reference Processing
====================
These proposed changes allow a signature to be statically analyzed without running the transforms. A signature processing API/library should provide a method to statically analyze the references and return what was signed. The caller of this library can then decide whether it wants to go ahead with signature verification.

Streaming verification
----------------------
These changes also allow signatures to be processed in a streaming manner. Assume we have already done an initial pass over the document to get the signature, keys, tokens, etc. (In the WS-Security use case all of these are present in the SOAP header, so this first pass goes over only a small fraction of the document, not the whole of it.)

Now we set up a "canonicalization and digesting engine" for each reference. This engine consumes streaming XML events, canonicalizing and digesting them to maintain a running digest. Then we do one pass over the whole document, and for each node evaluate the XPaths/URIs of every reference. If the node is part of a reference, we pass that event to the corresponding canonicalization and digesting engine. After this pass, we retrieve the digest from each engine and check whether the digests match. A sketch of this one-pass loop is below.
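Here is a simplified StAX sketch of that loop for a single reference. It is an illustration under heavy assumptions - local names only, an included path of plain child steps, no predicates, no excluded path, and a naive serialization standing in for real streaming canonicalization - not an implementation of any spec.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    final class StreamingDigester {

        // includedSteps models a path like /Envelope/Body as
        // {"Envelope", "Body"}; everything inside the matched element
        // is digested as it streams past.
        static byte[] digest(XMLStreamReader r, String[] includedSteps,
                             String digestAlgorithm) throws Exception {
            MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
            List<String> ancestors = new ArrayList<>(); // root ... current
            int selectedDepth = -1;                     // depth where the path matched

            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        ancestors.add(r.getLocalName());
                        if (selectedDepth < 0
                                && ancestors.equals(Arrays.asList(includedSteps)))
                            selectedDepth = ancestors.size();
                        if (selectedDepth >= 0) md.update(startTag(r));
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Text may arrive in chunks, so a huge text node is
                        // digested piecewise and never held in memory whole.
                        if (selectedDepth >= 0)
                            md.update(r.getText().getBytes(StandardCharsets.UTF_8));
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (selectedDepth >= 0) md.update(endTag(r));
                        if (ancestors.size() == selectedDepth) selectedDepth = -1;
                        ancestors.remove(ancestors.size() - 1);
                        break;
                }
            }
            return md.digest();
        }

        // Naive serialization; a real engine would emit exclusive-C14N
        // output (namespace handling, attribute ordering, escaping).
        private static byte[] startTag(XMLStreamReader r) {
            StringBuilder sb = new StringBuilder("<").append(r.getLocalName());
            for (int i = 0; i < r.getAttributeCount(); i++)
                sb.append(' ').append(r.getAttributeLocalName(i))
                  .append("=\"").append(r.getAttributeValue(i)).append('"');
            return sb.append('>').toString().getBytes(StandardCharsets.UTF_8);
        }

        private static byte[] endTag(XMLStreamReader r) {
            return ("</" + r.getLocalName() + ">").getBytes(StandardCharsets.UTF_8);
        }
    }

Multiple references would keep one such engine each, with every event fanned out to the engines whose XPath currently matches; the reader itself comes from XMLInputFactory.newInstance().createXMLStreamReader(in), so only the ancestor stack is ever held in memory.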
Summary
-------
The proposal places a lot of restrictions on Transforms, to make it possible to check what was signed and to perform signing/verification operations in a streaming fashion.

Pratik

Received on Monday, 1 September 2008 17:18:28 UTC