- From: Pratik Datta <pratik.datta@oracle.com>
- Date: Wed, 03 Sep 2008 14:50:52 -0700
- To: Frederick Hirsch <frederick.hirsch@nokia.com>
- CC: ext Sean Mullan <Sean.Mullan@sun.com>, public-xmlsec@w3.org
XPath in its basic form is very easy to understand because it follows the filesystem directory notation, e.g. /soap:Envelope/soap:Body. An XML document is a tree, and XPath is an intuitive way to navigate to nodes in the tree.

You may argue that we can add an Id to the element, but some people don't like that because it breaks their schema. I have identified three concrete specifications that definitely need a non-Id way of identifying what is to be signed:

1) ebXML Messaging: Section 4.1.3 of http://www.oasis-open.org/committees/download.php/272/ebMS_v2_0.pdf signs the whole SOAP envelope, but excludes SOAP headers that have an actor of nextMSH.

2) UK government specification: http://www.hmrc.gov.uk/ebu/responsemessages.pdf signs a <GovTalkMessage> but excludes GovTalkMessage/IRenvelope/IRheader/IRmark.

3) WS-SecurityPolicy: http://docs.oasis-open.org/ws-sx/ws-securitypolicy/200702/ws-securitypolicy-1.2-spec-os.pdf has a SignedElements assertion which specifies XPaths that need to be signed.

If we do not use IDs and do not do XPath, then we need some alternative notation to represent elements, and I feel any alternative notation will somehow end up being similar to XPath. Unfortunately XPath has become very complicated, and XPath 2.0 is even more complicated. What we need is something simple - basically I am trying to define an "XPath 0.5", where we can use the simple slash notation of XPath and not have to support all the complex features. Also, XPath Filter 1.0 requires you to specify the XPath "inside out", which makes it very hard to understand. In my proposal I have gone with regular XPath expressions, as used in XPath Filter 2.0.

I understand the concern about streaming XPath parsers, but the XPath subset that I am defining is simple enough that an implementor can build a streaming XPath engine with less complexity than a canonicalization engine. Also, we are not preventing people from using a regular DOM-based XPath engine, which is very readily available.

By the way, what is the NodeSelection transform?

Pratik
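To make the "XPath 0.5" claim concrete, here is a minimal sketch (Java with the standard StAX API; the class name, the hard-coded path, and the sample document are purely illustrative) of matching the simple slash notation /soap:Envelope/soap:Body using nothing more than the stack of open elements that a streaming parser already maintains:

import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class SimplePathMatcher {

    // True if the open-element stack (root first) matches the (namespace, local-name) steps.
    static boolean matches(Deque<String[]> openElements, String[][] path) {
        if (openElements.size() != path.length) return false;
        int i = 0;
        for (String[] step : openElements) {
            if (!step[0].equals(path[i][0]) || !step[1].equals(path[i][1])) return false;
            i++;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        String soapNs = "http://schemas.xmlsoap.org/soap/envelope/";
        // /soap:Envelope/soap:Body, expressed as (namespace, local-name) steps
        // so that prefix bindings in the document do not matter.
        String[][] path = { { soapNs, "Envelope" }, { soapNs, "Body" } };

        String xml = "<soap:Envelope xmlns:soap='" + soapNs + "'>"
                   + "<soap:Header/><soap:Body><order/></soap:Body></soap:Envelope>";

        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Deque<String[]> openElements = new ArrayDeque<>();   // root element at the head
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String ns = r.getNamespaceURI() == null ? "" : r.getNamespaceURI();
                openElements.addLast(new String[] { ns, r.getLocalName() });
                if (matches(openElements, path)) {
                    System.out.println("selected: " + r.getLocalName());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                openElements.removeLast();
            }
        }
    }
}

The point of the sketch is only that the simple notation needs no DOM and no general XPath engine; a real implementation would of course parse the expression instead of hard-coding it.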
Frederick Hirsch wrote:
>
> +1
>
> might be less confusing, possibly simpler.
>
> regards, Frederick
>
> Frederick Hirsch
> Nokia
>
> (not as chair)
>
> On Sep 3, 2008, at 4:59 PM, ext Sean Mullan wrote:
>
>> Hi Pratik,
>>
>> Nice writeup, thanks.
>>
>> One question I would pose is why do we necessarily have to use XPath and try to work around it with all sorts of restrictions? Why not just come up with something completely different that does what we want, and only what we want, for example, the NodeSelection Transform.
>>
>> From an implementor's point of view, I think that there won't be any/many available XPath libraries that I will be able to use that support streaming, or if there are, they will not be a good fit for what I need, as you mention below with the ones that you have studied. That concerns me, and I think it would affect deployment for other implementations as well. We have to try to make it easier to implement XML Signature. So I would ask: is it essential to use XPath? What benefits are we getting by sticking to XPath rather than coming up with a new and simpler Transform that cuts out all the complexity? Maybe the XPath expression syntax (although personally I always found XPath expressions difficult to decipher without diving into the specification)? Or is it mainly to try to be compatible with existing implementations? And is that a realistic requirement?
>>
>> Thanks,
>> Sean
>>
>> Pratik Datta wrote:
>>> This proposal modifies the Transform to address the following requirements:
>>>
>>> Requirements
>>> ------------
>>>
>>> 1) Check what is signed:
>>> Looking at a signature, it should be possible to find out what was signed. This is one of the best practices for verification. A receiver must not blindly verify a signature without first checking whether what was supposed to have been included in the signature really is signed.
>>>
>>> 2) Support streaming:
>>> Currently many transforms are based on nodesets, nodesets imply DOM, and DOM requires the whole document to be loaded in memory, which is bad for performance.
>>>
>>> Change 1: Distinguish between selection and canonicalization transforms
>>> ------------------------------------------------------------------------
>>>
>>> To support the "check what is signed" requirement, we need to distinguish between transforms that select data to be signed and transforms that convert that data to bytes.
>>>
>>> Selection transforms: XPath Filter, XPath Filter 2.0, Enveloped Signature, Decrypt Transform.
>>> Canonicalization transforms: C14N, exc-C14N, base64.
>>>
>>> The XSLT transform can be used for anything; e.g. there could be an XSLT transform that removes whitespace, and that particular XSLT transform would fall into the canonicalization bucket. The WS-Security STR Transform does both selection and canonicalization. The WS-Security SwA attachment transforms do selection.
>>>
>>> Change 2: Limit the transform sequence to selection first, canonicalization second
>>> -----------------------------------------------------------------------------------
>>>
>>> Currently there is no limitation on the ordering of transforms, so somebody could create a signature with
>>>
>>>   c14n, xpath
>>>
>>> According to the processing rules, this means that the reference URI is resolved and canonicalized into an octet stream, which is then reparsed into XML; the XPath is then applied to select the nodes, after which another implicit c14n is performed to convert the result into an octet stream. This is completely meaningless, and besides, XML parsing is an expensive operation. So we would like to define strict rules on the sequence of transforms:
>>>
>>> * There can be no transforms after c14n (or after the WS-Security STRTransform, which includes a c14n transform).
>>> * No transforms after base64, because it produces an octet stream which is to be digested directly.
>>> * Other transforms that emit an octet stream (like the WS-Security SwA attachment transforms) should also be the last one.
>>> * XSLT also produces an octet stream, but it needs to be dealt with differently because its output is not canonicalized and cannot be digested directly. Actually I would vote for removing the XSLT transform completely: first of all it is insecure (DoS attacks are very easy), secondly it is completely unstreamable (unless we have a very restricted XSLT), and thirdly it loses the original nodeset, which makes it impossible to determine what was really signed.
>>> * XPath Filter or XPath Filter 2.0 should be the first transform, and there should be only one XPath transform.
>>> * There can be only one enveloped signature transform.
>>> * Only one Decrypt transform.
>>> * The Base64 transform should only take a single text node, or an element with a single text node child, as input. (This restriction is to eliminate the dependency on the XPath text() function, which is not streamable, as it needs to select any number of text nodes and concatenate them.)
>>>
>>> These rules eliminate XML parsing during transform processing, and also make it possible to determine what is signed.
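A rough sketch of how a verifier might enforce these ordering rules, assuming the transforms of a Reference have already been classified; the Kind enum, the classification, and the error messages are hypothetical illustrations, not part of the proposal:

import java.util.List;

public class TransformOrderCheck {

    enum Kind { XPATH, ENVELOPED_SIGNATURE, DECRYPT, C14N, BASE64, OTHER_OCTET_EMITTING }

    static void check(List<Kind> transforms) {
        int enveloped = 0, decrypt = 0;
        for (int i = 0; i < transforms.size(); i++) {
            Kind k = transforms.get(i);
            boolean last = (i == transforms.size() - 1);
            switch (k) {
                case XPATH:
                    if (i != 0) throw new IllegalArgumentException(
                            "the (single) XPath transform must come first");
                    break;
                case ENVELOPED_SIGNATURE:
                    if (++enveloped > 1) throw new IllegalArgumentException(
                            "at most one enveloped signature transform");
                    break;
                case DECRYPT:
                    if (++decrypt > 1) throw new IllegalArgumentException(
                            "at most one Decrypt transform");
                    break;
                default: // c14n, base64 and other octet-emitting transforms end the chain
                    if (!last) throw new IllegalArgumentException(
                            k + " must be the last transform");
            }
        }
    }

    public static void main(String[] args) {
        check(List.of(Kind.XPATH, Kind.ENVELOPED_SIGNATURE, Kind.C14N));  // accepted
        try {
            check(List.of(Kind.C14N, Kind.XPATH));                        // transform after c14n
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}

Running main accepts the first chain and rejects the second, because a transform follows c14n.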
>>> Change 3: Use simple XPaths in the XPath Transform
>>> ---------------------------------------------------
>>>
>>> XPath poses a lot of problems: first of all it is insecure - DoS attacks are possible; secondly XPath inherently requires a DOM, and only a limited set of XPath can be streamed; thirdly XPath makes it very hard to know what is signed; fourthly XPath Filter 1.0 expressions are inside out and very difficult to write and understand (although this is fixed in XPath Filter 2.0).
>>>
>>> XPaths can also be specified in an XPointer URI, but since XPointers were marked OPTIONAL while the XPath Transform was marked RECOMMENDED, XPointers have never really been used. I propose that we just drop/deprecate them.
>>>
>>> To solve these XPath problems, I propose a new mechanism to specify the XPath transform, which is essentially a restricted form of XPath Filter 2.0. It has
>>>
>>> * an included XPath - identifies subtrees that need to be signed (optional - a URI can be used instead of this)
>>> * an excluded XPath - (optional) identifies subtrees or attributes that need to be excluded
>>>
>>> The included XPath is similar to the "intersect" and the excluded XPath is similar to the "subtract" of XPath Filter 2.0.
>>>
>>> Restrictions:
>>>
>>> * As mentioned above, if XPath is used, it should be the first transform (and there can be only one XPath transform in the transform list).
>>> * If included is used, the reference URI should be "", i.e. refer to the complete document.
>>> * The XPath expression itself is very restricted, as mentioned below.
>>> * Unlike XPath Filter 2.0, there is only one included XPath and one excluded XPath, and the excluded overrides the included.
>>>
>>> I am open to the syntax, as long as we can have these included and excluded XPaths. One idea is to preserve backwards compatibility and just add two attributes, "included" and "excluded", to the existing XPath transform, like this:
>>>
>>> <Transform Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
>>>   <XPath included="..." excluded="...">
>>>     ...
>>>   </XPath>
>>> </Transform>
>>>
>>> So an older implementation will execute the XPath Filter 1.0 Transform, whereas a newer implementation will just process the included and excluded XPaths.
>>>
>>> This proposal also makes it easy to determine what is signed. There is only one XPath transform, and it has only one included XPath, so it is easy to do static analysis of the signature to determine what elements were signed.
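As a sketch of what the backward-compatible syntax above might look like when produced with plain DOM, here is the ebXML-style case from the top of this message (sign the envelope, exclude headers addressed to the next MSH); the included/excluded expressions and the actor value are illustrative placeholders, not taken from the ebMS specification:

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class IncludedExcludedTransform {

    static final String DSIG_NS = "http://www.w3.org/2000/09/xmldsig#";

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        Element transform = doc.createElementNS(DSIG_NS, "ds:Transform");
        transform.setAttribute("Algorithm", "http://www.w3.org/TR/1999/REC-xpath-19991116");

        Element xpath = doc.createElementNS(DSIG_NS, "ds:XPath");
        // The two new attributes proposed above; an old implementation would
        // ignore them and evaluate the XPath Filter 1.0 expression that would
        // normally appear as the element's text content.
        xpath.setAttribute("included", "/soap:Envelope");
        xpath.setAttribute("excluded", "/soap:Envelope/soap:Header[@actor = \"nextMSH\"]");
        transform.appendChild(xpath);
        doc.appendChild(transform);

        // "Check what is signed": static analysis only needs to read the attributes back.
        System.out.println("signed subtree:   " + xpath.getAttribute("included"));
        System.out.println("excluded subtree: " + xpath.getAttribute("excluded"));
    }
}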
>>> Streaming XPath
>>> ---------------
>>>
>>> There are many streaming XPath implementations, and they impose different kinds of constraints on the XPath. I looked at the XSQ implementation which Thomas had pointed out, http://www.cs.umd.edu/projects/xsq/, and some others:
>>>
>>> http://www.idealliance.org/papers/xml2001/papers/pdf/05-01-01.pdf
>>> http://cs.nyu.edu/~deepak/publications/icde.pdf
>>> http://www.stanford.edu/class/cs276b/handouts/presentations/joshislezberg.ppt
>>> http://www.idealliance.org/proceedings/xml04/papers/299/RandomAccessXML.pdf
>>>
>>> They have varying constraints; some common ones are:
>>>
>>> * Only forward axes, like child, descendant, and following-sibling (reverse axes are very difficult)
>>> * Lots of limitations on predicates:
>>> ** no location paths in predicates
>>> ** no nested predicates
>>> ** no functions on nodesets, e.g. count(), last(), etc.
>>> ** no conversion of subtrees to strings, e.g. the text() function
>>>
>>> Even with these restrictions, the implementations are very complex and require state engines and static optimization.
>>>
>>> I would like to propose a smaller subset of XPath that has even fewer requirements. For this, imagine a streaming XML parser that is walking through the XML tree; at any point it has in memory:
>>>
>>> * the current element,
>>> * all the attributes of the current element,
>>> * all ancestor elements.
>>>
>>> We assume that this parser maintains namespace definitions and also does xml:base combination as it walks down the tree. Note that text nodes can be extremely long (especially long base64 encoded strings, e.g. MTOM attachments), so it is possible that a text node is split up and not loaded entirely in memory.
>>>
>>> With this model, we impose the following restrictions:
>>>
>>> * Only elements can be selected (i.e. the location path must resolve to one or more elements, not attributes or text nodes).
>>> * Only descendant and child axes can be used.
>>> * Predicates can only have relational expressions involving attributes. The predicate can only be at the last location step, and it cannot use any functions.
>>>
>>> So only simple expressions like this are allowed:
>>>
>>> /soap:Envelope/soap:Header[@actor = "ac"]
>>>
>>> These restrictions are such that the XPath expression can be evaluated with only the element, its attributes, and its ancestor elements. So as a streaming parser is walking down the document, it can evaluate the included and excluded XPath expressions for every node and determine whether the node is to be included or not.
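Here is a minimal sketch of that node-by-node include/exclude decision, assuming a StAX parser and expressions of the restricted form (child steps plus a single attribute test on the last step). The included and excluded paths are hard-coded, only local names are compared, and the document is inlined, purely for illustration:

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class IncludeExcludeStream {

    public static void main(String[] args) throws Exception {
        // included: /soap:Envelope          (sign the whole envelope)
        // excluded: /soap:Envelope/soap:Header[@actor = "ac"]
        List<String> includedSteps = List.of("Envelope");
        List<String> excludedSteps = List.of("Envelope", "Header");
        String excludedAttr = "actor", excludedValue = "ac";

        String soapNs = "http://schemas.xmlsoap.org/soap/envelope/";
        String xml = "<soap:Envelope xmlns:soap='" + soapNs + "'>"
                   + "<soap:Header actor='ac'><skipped/></soap:Header>"
                   + "<soap:Header actor='other'/>"
                   + "<soap:Body><order/></soap:Body></soap:Envelope>";

        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));

        List<String> openElements = new ArrayList<>();  // ancestors + current element (local names)
        int includeDepth = -1;  // depth where the included subtree started, -1 if not inside one
        int excludeDepth = -1;

        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                openElements.add(r.getLocalName());
                int depth = openElements.size();
                if (includeDepth < 0 && openElements.equals(includedSteps)) includeDepth = depth;
                if (excludeDepth < 0 && openElements.equals(excludedSteps)
                        && excludedValue.equals(r.getAttributeValue(null, excludedAttr)))
                    excludeDepth = depth;
                boolean signed = includeDepth >= 0 && excludeDepth < 0;
                System.out.println((signed ? "sign  " : "skip  ") + "<" + r.getLocalName() + ">");
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                int depth = openElements.size();
                if (depth == includeDepth) includeDepth = -1;
                if (depth == excludeDepth) excludeDepth = -1;
                openElements.remove(openElements.size() - 1);
            }
        }
    }
}

With the sample document, the envelope, the second header, and the body are reported as signed, while the header addressed to actor "ac" and its descendants are skipped - the decision a streaming signer or verifier would feed into canonicalization.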
>>> Reference Processing
>>> ====================
>>>
>>> These proposed changes allow the signature to be statically analyzed without running through the transforms. A signature processing API/library should provide a method to statically analyze the reference and return what was signed. After that, the caller of this library can determine whether it wants to go ahead with signature verification.
>>>
>>> Streaming verification
>>> ----------------------
>>>
>>> These changes also allow signatures to be processed in a streaming manner. Let us assume that we have already done an initial pass over the document to get the signature, keys, tokens, etc. (In the WS-Security use case, all of these are present in the SOAP header, so this first pass goes over only a small fraction of the document, not the entire document.)
>>>
>>> Now we set up a "canonicalization and digesting engine" for each reference. This engine expects streaming XML events, and canonicalizes and digests them to maintain a running digest. Then we do one pass over the whole document, and for each node we evaluate the XPaths/URIs of each reference. If the node is part of a reference, we pass that event to the corresponding canonicalization and digesting engine. After this pass, we retrieve the digest from each engine and check if the digests match.
>>>
>>> Summary
>>> -------
>>>
>>> The proposal puts a lot of restrictions on the Transforms, to make it possible to check what was signed and to perform signing/verification operations in a streaming manner.
>>>
>>> Pratik
>>
>>
>
>
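Finally, a minimal sketch of the per-reference "canonicalization and digesting engine" described under "Streaming verification" above: each reference owns an engine that keeps a running digest, selected events are fed to it during the single pass, and at the end the computed digest is compared with the reference's DigestValue. The streaming canonicalization step is stubbed out, and the class and method names are illustrative:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Base64;

public class ReferenceDigestEngine {

    private final MessageDigest digest;
    private final byte[] expectedDigest;

    ReferenceDigestEngine(String algorithm, String expectedBase64) throws Exception {
        this.digest = MessageDigest.getInstance(algorithm);
        this.expectedDigest = Base64.getDecoder().decode(expectedBase64);
    }

    /** Called for every streaming event that the reference's XPath/URI selected. */
    void consume(String canonicalizedFragment) {
        // In a real engine the event would first go through streaming c14n;
        // here we simply digest the bytes we are handed.
        digest.update(canonicalizedFragment.getBytes(StandardCharsets.UTF_8));
    }

    /** Called once the single pass over the document is finished. */
    boolean verify() {
        return Arrays.equals(digest.digest(), expectedDigest);
    }

    public static void main(String[] args) throws Exception {
        // Pretend the signature claimed this digest over the two fragments below.
        MessageDigest d = MessageDigest.getInstance("SHA-256");
        d.update("<a>".getBytes(StandardCharsets.UTF_8));
        d.update("</a>".getBytes(StandardCharsets.UTF_8));
        String claimed = Base64.getEncoder().encodeToString(d.digest());

        ReferenceDigestEngine engine = new ReferenceDigestEngine("SHA-256", claimed);
        engine.consume("<a>");   // events selected for this reference
        engine.consume("</a>");
        System.out.println("digest matches: " + engine.verify());
    }
}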
Received on Wednesday, 3 September 2008 21:51:31 UTC