Re: Is arbitrary nodeset support a requirement for XML signatures? from Frederick Hirsch on 2008-08-22 (public-xmlsec@w3.org from August 2008)

From: Frederick Hirsch <frederick.hirsch@nokia.com>
Date: Fri, 22 Aug 2008 15:56:28 -0400
To: ext Pratik Datta <pratik.datta@oracle.com>
Cc: Frederick Hirsch <frederick.hirsch@nokia.com>, public-xmlsec@w3.org
Message-Id: <D8A03764-634A-45CB-8C0C-F52978A9BEC8@nokia.com>
Thanks for this clear explanation, Pratik.

> So instead of representing an XML fragment by a nodeset, I would  
> like it to be represented like this
>
>   * List of included elements:
>   * List of excluded elements (optional)


How about specifying only subtrees to be signed? That is, name the root
element of each complete subtree to be signed (or the document element
for the whole document), with no exclusions within a subtree, and
nothing more.

Wouldn't that be simpler and clearer than the arbitrary nodeset  
approach we have now?

Would this prevent any sensible use cases?

(I assume a subtree includes everything associated with each element,
e.g. attributes, namespaces, etc., but with no need for namespace
propagation.)

regards, Frederick

Frederick Hirsch
Nokia



On Aug 21, 2008, at 5:50 PM, ext Pratik Datta wrote:

> One of the assumptions/requirements was
> 9. Signing can be performed on arbitrary node sets.
>
> Canonicalization of arbitrary nodesets introduces a lot of  
> complications. I would like to step back and see if we really  
> require it. The main requirement that I see is that we need to sign  
> a fragment of an XML document, and a nodeset lets us define an  
> arbitrary fragment. But nodesets have the following problems. I  
> would like to see if we can have an alternative way to identify  
> what was signed without using a nodeset, or maybe use very  
> restrictive nodeset.
>
> Problem 1) Nodesets introduce unwanted complexity with namespaces
> Nodesets follow the XPath data model, which is slightly different
> from the DOM model. One main area of difference is namespace nodes.
> In DOM, namespaces are just regular attributes, but in the XPath
> model they are a special kind of node. Also, the namespace nodes
> need to be expanded out for every element.
>
> e.g. if the original document is like this
> <e1  ns1="n1" ns2="n2">
>    <e2>
>      <e3/>
>    </e2>
> </e1>
>
> In the XPath model, all namespaces are expanded out for every node,
> i.e. it becomes like this
>
> <e1  ns1="n1" ns2="n2">
>    <e2 ns1="n1" ns2="n2">
>      <e3 ns1="n1" ns2="n2"/>
>    </e2>
> </e1>
>
> An XPath filter can remove certain namespace nodes, e.g. it can  
> remove the ns1 node from e2
> <e1  ns1="n1" ns2="n2">
>    <e2 ns2="n2">
>      <e3 ns1="n1" ns2="n2"/>
>    </e2>
> </e1>
>
> This is very unnatural in XML 1.0 (it could be considered similar
> to namespace undeclaration in XML 1.1). In this particular case ns1
> is not used, so its removal will affect inclusive c14n but not
> exclusive c14n. However, a nodeset can also remove namespace nodes
> that are being used, which really makes it invalid XML. The
> canonicalization algorithms need to worry about this kind of
> namespace removal, even though it is completely meaningless.
>
>
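To make the expansion in Problem 1 concrete, here is a minimal Python sketch that computes the namespace declarations in scope at each element, which is roughly what the XPath data model materializes as namespace nodes. It assumes real `xmlns:ns1` / `xmlns:ns2` declarations in place of the shorthand `ns1` / `ns2` attributes used in the example, ignores the default namespace and the implicit `xml` namespace node, and uses hypothetical helper names (`in_scope_namespaces`, `expanded_counts`).

```python
from xml.dom import minidom

# Assumption: the shorthand ns1="n1" from the example is written as a real
# namespace declaration, xmlns:ns1="n1".
DOC = '<e1 xmlns:ns1="n1" xmlns:ns2="n2"><e2><e3/></e2></e1>'


def in_scope_namespaces(element):
    """Collect prefixed xmlns declarations visible at `element`.

    Walks from the element up to the document element; the nearest
    declaration of a prefix wins.
    """
    scope = {}
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            if attr.name.startswith("xmlns:") and attr.name not in scope:
                scope[attr.name] = attr.value
        node = node.parentNode
    return scope


def expanded_counts(xml_text):
    """Map each element name to its number of in-scope namespace declarations."""
    dom = minidom.parseString(xml_text)
    return {
        el.tagName: len(in_scope_namespaces(el))
        for el in dom.getElementsByTagName("*")
    }


if __name__ == "__main__":
    print(expanded_counts(DOC))
```

The DOM holds two declaration attributes in total, on e1 only, while the expanded view carries two namespace nodes on each of the three elements.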
> Problem 2) Namespace nodes degrade performance significantly
> Because namespace nodes are expanded for every node, the number of
> nodes that the implementation has to deal with increases very
> significantly. Let's say there are 10 namespace nodes defined at
> the top level, which is a pretty reasonable number for SOAP
> messages. Then the number of namespace nodes is 10 x number of
> elements. If each namespace node is a Java object, that is a lot of
> objects and a lot of unnecessary temporary memory. I know that some
> implementations avoid namespace node expansion because of this
> performance issue.
>
> This nodeset expansion is the basis for one of the denial of service
> attacks in the best practices document. In that example I made 100
> namespace nodes and 100 elements, which means there are 10,000
> nodes. Then I wrote an XPath expression which counts all the nodes.
> Since this XPath is executed for every node, the number of
> iterations is 10,000 x 10,000 = 100 million. If the namespace nodes
> were not expanded for every node, then there would be only 200 x
> 200 = 40,000 iterations.
>
>
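The arithmetic in Problem 2 can be checked with a small Python sketch. The function name `xpath_dos_iterations` is hypothetical, and the expanded case follows the message's approximation of counting only the namespace nodes (declarations x elements).

```python
def xpath_dos_iterations(namespace_decls, elements, expand_namespaces):
    """Estimate iterations when a count-all-nodes XPath runs once per node.

    Assumption: with expansion, the node count is approximated as
    namespace_decls * elements, as in the message; without expansion,
    each declaration and each element counts once.
    """
    if expand_namespaces:
        nodes = namespace_decls * elements
    else:
        nodes = namespace_decls + elements
    # The expression visits every node, and it is evaluated for every node.
    return nodes * nodes


if __name__ == "__main__":
    print(xpath_dos_iterations(100, 100, expand_namespaces=True))
    print(xpath_dos_iterations(100, 100, expand_namespaces=False))
```

For 100 declarations and 100 elements this gives 100,000,000 iterations with expansion and 40,000 without.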
>
> Problem 3) Nodesets make it hard to understand what is signed
> In a WS Security use case, the verifier has a list of things that  
> it expects to be signed (as defined in a WS Security Policy), and  
> wants to make sure that they are really signed. While a nodeset is  
> the most generic form of representing an XML fragment, it is very  
> hard to reverse engineer.   Most often the requirement is to sign a  
> complete subtree. A more complex use case excludes some descendant  
> subtrees.
>
> So instead of representing an XML fragment by a nodeset, I would  
> like it to be represented like this
>
>   * List of included elements:
>   * List of excluded elements (optional)
>
> Exclusions override inclusions. (This is somewhat similar to XPath  
> Filter 2 Transform, except that it is much simpler)
>
> This would make it easy to understand what was signed, I could just  
> compare the included elements with the expected list of included  
> elements.
>
> The best practices document talks about a node by node comparison
> that can be done to determine what is signed, but that is very
> expensive, since you have to visit all descendant nodes of a subtree
> to make this comparison.
>
>
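The included/excluded list in Problem 3 could be sketched in Python as follows. This is only an illustration under stated assumptions: elements are identified by a hypothetical `Id` attribute, the sample document and the helper names (`signed_elements`, `covers_expected`) are made up, and no canonicalization or digest is computed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample message; element names and Id values are illustrative.
SAMPLE = (
    '<Envelope>'
    '<Header Id="hdr"><Timestamp/><Trace Id="trace"><Hop/></Trace></Header>'
    '<Body Id="body"><Order/></Body>'
    '</Envelope>'
)


def signed_elements(root, included_ids, excluded_ids=()):
    """Return tags of elements covered by the include/exclude lists.

    An included element covers its whole subtree; an excluded element
    removes its whole subtree. Exclusions override inclusions.
    """
    signed = []

    def walk(element, inside):
        ident = element.get("Id")
        if ident in excluded_ids:
            return  # skip this element and everything below it
        inside = inside or ident in included_ids
        if inside:
            signed.append(element.tag)
        for child in element:
            walk(child, inside)

    walk(root, False)
    return signed


def covers_expected(included_ids, expected_ids):
    """Verifier-side check: every expected element is in the included list."""
    return set(expected_ids) <= set(included_ids)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(signed_elements(root, {"hdr", "body"}, {"trace"}))
    print(covers_expected({"hdr", "body"}, {"body"}))
```

The verifier-side check compares two short lists instead of visiting every descendant node. A real verifier would presumably also need to decide whether the listed exclusions are acceptable under its policy.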
>
> Problem 4) Nodesets imply DOM
> A nodeset is not complete information; it always needs a backing
> DOM. This makes it very hard for streaming XML implementations
> (SAX/StAX/XMLReader), which are much more performant.
>
> In my streaming presentation
> (http://www.w3.org/2007/xmlsec/ws/slides/12-mishra-oracle/), I talked
> about using an alternative representation of "XML Events".
> E.g. the document <e1><e2/></e1>
> is 2 nodes in a nodeset representation: e1, e2
> but is 4 events in a streaming representation: begin(e1),
> begin(e2), end(e2), end(e1)
>
>
>
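The event view in Problem 4 can be reproduced with Python's standard SAX parser. This is a minimal sketch; the `EventRecorder` class and `xml_events` function are hypothetical names, and the begin/end labels mirror the notation in the message rather than any particular API.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Record element start/end callbacks as begin(...)/end(...) strings."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(f"begin({name})")

    def endElement(self, name):
        self.events.append(f"end({name})")


def xml_events(xml_text):
    """Parse `xml_text` in streaming fashion and return the event list."""
    recorder = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), recorder)
    return recorder.events


if __name__ == "__main__":
    print(xml_events("<e1><e2/></e1>"))
```

No tree is built here: the handler sees each event once, in document order, which is the property a streaming signature implementation would rely on.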
>
> Summing up, nodesets are complex and slow, and full support of
> nodesets is not a requirement. We can still use nodesets, but put
> some constraints around them to solve the above problems.
>
> Pratik
>
>
>
Received on Friday, 22 August 2008 19:57:29 UTC