- From: Pratik Datta <pratik.datta@oracle.com>
- Date: Mon, 01 Sep 2008 10:21:03 -0700
- To: public-xmlsec@w3.org
This proposal modifies the Transform mechanism to address the following requirements:

Requirements
------------
1) Check what is signed: Looking at a signature, it should be possible to find out what was signed. This is one of the best practices for verification: a receiver must not blindly verify a signature without first checking that what was supposed to be covered by the signature really is signed.

2) Support streaming: Currently many transforms are based on nodesets, nodesets imply DOM, and DOM requires the whole document to be loaded in memory, which is bad for performance.

Change 1: Distinguish between selection and canonicalization transforms
-----------------------------------------------------------------------
To support the "check what is signed" requirement, we need to distinguish between transforms that select the data to be signed and transforms that convert that data to bytes.

Selection transforms: XPath Filter, XPath Filter 2.0, Enveloped Signature, Decrypt Transform
Canonicalization transforms: C14N, exc-C14N, Base64

The XSLT transform can be used for either purpose; e.g. an XSLT transform that only removes white space would fall in the canonicalization bucket. The WS-Security STR-Transform does both selection and canonicalization. The WS-Security SwA attachment transforms do selection.

Change 2: Limit transform sequences to selection first, canonicalization second
-------------------------------------------------------------------------------
Currently there is no restriction on the ordering of transforms, so somebody could create a signature with the sequence c14n, XPath. According to the processing rules, this means that the reference URI is resolved and canonicalized into an octet stream, which is then reparsed into XML; XPath is then applied to select the nodes, after which another implicit c14n is performed to convert them back into an octet stream. This is completely meaningless, and besides, XML parsing is an expensive operation. So we would like to define strict rules on the sequence of transforms:

* There can be no transforms after c14n (or after the WS-Security STR-Transform, which includes a c14n step).
* No transforms after Base64, because it produces an octet stream that is to be digested directly.
* Other transforms that emit an octet stream (like the WS-Security SwA attachment transforms) must likewise be the last one.
* XSLT also produces an octet stream, but it needs to be treated differently because its output is not canonicalized and cannot be digested directly. Actually, I would vote for removing the XSLT transform completely: first, it is insecure (DoS attacks are very easy); second, it is completely unstreamable (unless we define a very restricted XSLT); third, it loses the original nodeset, making it impossible to determine what was really signed.
* XPath Filter or XPath Filter 2.0 must be the first transform, and there may be only one XPath transform.
* There can be only one enveloped-signature transform.
* Only one Decrypt transform.
* The Base64 transform may only take a single text node, or an element with a single text-node child, as input. (This restriction eliminates the dependency on the XPath text() function, which is not streamable, as it may need to select any number of text nodes and concatenate them.)

These rules eliminate XML parsing during transform processing and also make it possible to determine what is signed. A sketch of a checker that enforces these rules is below.
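To make the ordering rules concrete, here is a minimal, non-normative Java sketch of such a checker. The Kind classification is an assumption for illustration; a real implementation would map each Transform's Algorithm URI into one of these buckets.

    import java.util.List;

    final class TransformSequenceChecker {

        // Hypothetical buckets; C14N also stands in for STR-Transform
        // (which embeds c14n), and OTHER_OCTET for e.g. the SwA
        // attachment transforms.
        enum Kind { XPATH, ENVELOPED, DECRYPT, C14N, BASE64, OTHER_OCTET }

        static void check(List<Kind> chain) {
            int enveloped = 0, decrypt = 0;
            boolean octetsEmitted = false; // a byte-producing transform was seen

            for (int i = 0; i < chain.size(); i++) {
                if (octetsEmitted)
                    throw new IllegalArgumentException(
                            "no transform may follow one that emits an octet stream");
                switch (chain.get(i)) {
                    case XPATH:
                        if (i != 0)  // also rejects a second XPath transform
                            throw new IllegalArgumentException(
                                    "the XPath transform must be first and unique");
                        break;
                    case ENVELOPED:
                        if (++enveloped > 1)
                            throw new IllegalArgumentException(
                                    "only one enveloped-signature transform");
                        break;
                    case DECRYPT:
                        if (++decrypt > 1)
                            throw new IllegalArgumentException(
                                    "only one decrypt transform");
                        break;
                    default: // C14N, BASE64, OTHER_OCTET: must be last
                        octetsEmitted = true;
                }
            }
        }
    }

For example, a chain of XPATH, ENVELOPED, C14N passes, while C14N followed by XPATH is rejected.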
Change 3: Use simple XPaths in the XPath transform
--------------------------------------------------
XPath poses a lot of problems. First, it is insecure: DoS attacks are possible. Second, XPath inherently requires a DOM; only a limited subset of XPath can be streamed. Third, XPath makes it very hard to know what is signed. Fourth, XPath Filter 1.0 expressions are inside out and very difficult to write and understand (although this is fixed in XPath Filter 2.0).

XPaths can also be specified in an XPointer URI, but since XPointer was marked OPTIONAL while the XPath transform was marked RECOMMENDED, XPointers have never really been used. I propose that we just drop/deprecate them.

To solve these XPath problems, I propose a new mechanism to specify the XPath transform, which is essentially a restricted form of XPath Filter 2.0. It has:

* an included XPath - identifies the subtrees that need to be signed (optional - a URI can be used instead), and
* an excluded XPath - (optional) identifies subtrees or attributes that need to be excluded.

The included XPath is similar to the "intersect" of XPath Filter 2.0, and the excluded XPath is similar to its "subtract".

Restrictions:

* As mentioned above, if XPath is used it must be the first transform, and there can be only one XPath transform in the transform list.
* If an included XPath is used, the reference URI must be "", i.e. refer to the complete document.
* The XPath expression itself is very restricted, as described under "Streaming XPath" below.
* Unlike XPath Filter 2.0, there is only one included XPath and one excluded XPath, and the excluded XPath overrides the included one.

I am open to the syntax, as long as we can have these included and excluded XPaths. One idea is to preserve backwards compatibility and just add two attributes, "included" and "excluded", to the existing XPath transform, like this:

<Transform Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
  <XPath included="..." excluded="..."> ... </XPath>
</Transform>

An older implementation will execute the XPath Filter 1.0 transform, whereas a newer implementation will just process the included and excluded XPaths.

This proposal also makes it easy to determine what is signed: there is only one XPath transform, and it has only one included XPath, so it is easy to do static analysis of the signature to determine which elements were signed. A sketch of such an analysis is below.
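As an illustration of that static analysis, here is a rough Java sketch that walks the ds:Reference elements of a namespace-aware DOM parse of the signature and reports each URI plus the proposed included/excluded attributes, without executing any transform. The "included" and "excluded" attribute names follow the syntax idea above and are not part of any existing spec.

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    final class SignedDataReporter {

        static final String DSIG = "http://www.w3.org/2000/09/xmldsig#";

        // Report what each Reference claims to sign, without running
        // its transforms.
        static void report(Document doc) {
            NodeList refs = doc.getElementsByTagNameNS(DSIG, "Reference");
            for (int i = 0; i < refs.getLength(); i++) {
                Element ref = (Element) refs.item(i);
                System.out.println("Reference URI=\"" + ref.getAttribute("URI") + "\"");
                // The XPath child of the Transform element; "included" and
                // "excluded" are the proposed (hypothetical) attributes.
                NodeList xpaths = ref.getElementsByTagNameNS(DSIG, "XPath");
                for (int j = 0; j < xpaths.getLength(); j++) {
                    Element xp = (Element) xpaths.item(j);
                    System.out.println("  included: " + xp.getAttribute("included"));
                    System.out.println("  excluded: " + xp.getAttribute("excluded"));
                }
            }
        }
    }

The caller would first check (per Change 2) that each reference has at most one XPath transform, then use this report to decide whether to proceed with verification.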
Streaming XPath
---------------
There are many streaming XPath implementations, and they impose different kinds of constraints on the XPath. I looked at the XSQ implementation which Thomas had pointed out, http://www.cs.umd.edu/projects/xsq/, and some others:

http://www.idealliance.org/papers/xml2001/papers/pdf/05-01-01.pdf
http://cs.nyu.edu/~deepak/publications/icde.pdf
http://www.stanford.edu/class/cs276b/handouts/presentations/joshislezberg.ppt
http://www.idealliance.org/proceedings/xml04/papers/299/RandomAccessXML.pdf

They have varying constraints; some common ones are:

* only forward axes, like child, descendant and following-sibling (reverse axes are very difficult)
* many limitations on predicates:
** no location paths in predicates
** no nested predicates
** no functions on nodesets, e.g. count(), last()
** no conversion of subtrees to strings, e.g. the text() function

Even with these restrictions, the implementations are very complex and require state machines and static optimization. I would like to propose a smaller subset of XPath that has even fewer requirements.

For this, imagine a streaming XML parser that is walking through the XML tree; at any point it has in memory:

* the current element,
* all the attributes of the current element, and
* all the ancestor elements.

We assume that this parser maintains the in-scope namespace definitions and also does xml:base combination as it walks down the tree. Note that text nodes can be extremely long (especially long base64-encoded strings, e.g. MTOM attachments), so a text node may be split up rather than loaded into memory all at once.

With this model, we impose the following restrictions:

* Only elements can be selected, i.e. the location path must resolve to one or more elements, not attributes or text nodes.
* Only the descendant and child axes can be used.
* Predicates can only contain relational expressions involving attributes. A predicate may appear only in the last location step, and it cannot use any functions.

So only simple expressions like this are allowed:

/soap:Envelope/soap:Header[@actor = "ac"]

These restrictions ensure that the XPath expression can be evaluated using only the current element, its attributes, and its ancestor elements. So as a streaming parser walks down the document, it can evaluate the included and excluded XPath expressions for every node and determine whether that node is to be included or not.

Reference Processing
====================
These proposed changes allow a signature to be statically analyzed without running the transforms. A signature processing API/library should provide a method to statically analyze the references and return what was signed. The caller of this library can then decide whether it wants to go ahead with signature verification.

Streaming verification
----------------------
These changes also allow signatures to be processed in a streaming manner. Assume we have already done an initial pass over the document to get the signature, keys, tokens, etc. (In the WS-Security use case all of these are present in the SOAP header, so this first pass goes over only a small fraction of the document, not the whole of it.)

Now we set up a "canonicalization and digesting engine" for each reference. This engine consumes streaming XML events, canonicalizing and digesting them to maintain a running digest. Then we do one pass over the whole document, and for each node evaluate the XPaths/URIs of every reference. If the node is part of a reference, we pass that event to the corresponding canonicalization and digesting engine. After this pass, we retrieve the digest from each engine and check whether the digests match. A sketch of this one-pass loop is below.
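Here is a simplified StAX sketch of that loop for a single reference. It is an illustration under heavy assumptions - local names only, an included path of plain child steps, no predicates, no excluded path, and a naive serialization standing in for real streaming canonicalization - not an implementation of any spec.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    final class StreamingDigester {

        // includedSteps models a path like /Envelope/Body as
        // {"Envelope", "Body"}; everything inside the matched element
        // is digested as it streams past.
        static byte[] digest(XMLStreamReader r, String[] includedSteps,
                             String digestAlgorithm) throws Exception {
            MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
            List<String> ancestors = new ArrayList<>(); // root ... current
            int selectedDepth = -1;                     // depth where the path matched

            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        ancestors.add(r.getLocalName());
                        if (selectedDepth < 0
                                && ancestors.equals(Arrays.asList(includedSteps)))
                            selectedDepth = ancestors.size();
                        if (selectedDepth >= 0) md.update(startTag(r));
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Text may arrive in chunks, so a huge text node is
                        // digested piecewise and never held in memory whole.
                        if (selectedDepth >= 0)
                            md.update(r.getText().getBytes(StandardCharsets.UTF_8));
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (selectedDepth >= 0) md.update(endTag(r));
                        if (ancestors.size() == selectedDepth) selectedDepth = -1;
                        ancestors.remove(ancestors.size() - 1);
                        break;
                }
            }
            return md.digest();
        }

        // Naive serialization; a real engine would emit exclusive-C14N
        // output (namespace handling, attribute ordering, escaping).
        private static byte[] startTag(XMLStreamReader r) {
            StringBuilder sb = new StringBuilder("<").append(r.getLocalName());
            for (int i = 0; i < r.getAttributeCount(); i++)
                sb.append(' ').append(r.getAttributeLocalName(i))
                  .append("=\"").append(r.getAttributeValue(i)).append('"');
            return sb.append('>').toString().getBytes(StandardCharsets.UTF_8);
        }

        private static byte[] endTag(XMLStreamReader r) {
            return ("</" + r.getLocalName() + ">").getBytes(StandardCharsets.UTF_8);
        }
    }

Multiple references would keep one such engine each, with every event fanned out to the engines whose XPath currently matches; the reader itself comes from XMLInputFactory.newInstance().createXMLStreamReader(in), so only the ancestor stack is ever held in memory.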
Summary
-------
The proposal places a lot of restrictions on Transforms, to make it possible to check what was signed and to perform signing/verification operations in a streaming fashion.

Pratik

Received on Monday, 1 September 2008 17:18:28 UTC