Re: Is arbitrary nodeset support a requirement for XML signatures? from Pratik Datta on 2008-08-22 (public-xmlsec@w3.org from August 2008)

From: Pratik Datta <pratik.datta@oracle.com>
Date: Fri, 22 Aug 2008 14:27:12 -0700
To: Frederick Hirsch <frederick.hirsch@nokia.com>
CC: public-xmlsec@w3.org
Message-ID: <48AF2F30.4070207@oracle.com>
I know of some use cases that need element exclusions, i.e. they specify 
some subtrees but also list exclusions within those subtrees.

Enveloped signature is obviously one: the Signature element itself needs to be 
excluded.
----------
Another one is ebXML. See 
http://www.oasis-open.org/committees/download.php/272/ebMS_v2_0.pdf 
Section 4.1.3, 
which asks to use the following XPath transform:

<XPath> not 
(ancestor-or-self::node()[@SOAP:actor="urn:oasis:names:tc:ebxml-msg:actor:nextMSH"] 
|
 ancestor-or-self::node()[@SOAP:actor="http://schemas.xmlsoap.org/soap/actor/next"] 
)
</XPath>

The result of this [XPath] statement excludes all elements within the 
SOAP Envelope which contain a SOAP:actor attribute targeting the 
nextMSH, and all their descendants. It also excludes all elements with 
actor attributes targeting the element at the next node (which may 
change en route)
-------------
A third example is a UK government specification, 
http://www.hmrc.gov.uk/ebu/responsemessages.pdf, which has the following 
XPath:
<dsig:XPath>(count(ancestor-or-self::node()|/gti:GovTalkMessage/gti:Body)=count(ancestor-or-self::node())) 
and 
(count(ancestor-or-self::node()|/gti:GovTalkMessage/gti:Body/*[name()='IRenvelope']/*[name()='IRheader']/*[name()='IRmark'])!=count(ancestor-or-self::node()))</dsig:XPath>

This is to include the GovTalkMessage/Body subtree, but exclude the 
GovTalkMessage/Body/IRenvelope/IRheader/IRmark subtree from 
the signature.
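
The count() comparisons in that expression are the usual XPath 1.0 idiom for a
subtree test: the union of a node's ancestor-or-self set with some node S has
the same size as the ancestor-or-self set alone exactly when S is already in
it, i.e. when the node lies in the subtree rooted at S. A minimal Python
sketch of the same logic follows. The sample document (without namespaces)
and the helper names are my own illustrative assumptions, not taken from the
HMRC specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names loosely follow the XPath above.
DOC = """<GovTalkMessage>
  <Header><MessageDetails/></Header>
  <Body>
    <IRenvelope>
      <IRheader><IRmark>abc</IRmark><Sender/></IRheader>
      <Payload/>
    </IRenvelope>
  </Body>
</GovTalkMessage>"""


def ancestor_or_self(node, parents):
    """Return the set of elements on the path from node up to the root."""
    chain = set()
    while node is not None:
        chain.add(node)
        node = parents.get(node)
    return chain


def in_subtree(node, subtree_root, parents):
    """Mirror count(ancestor-or-self::node() | S) = count(ancestor-or-self::node())."""
    chain = ancestor_or_self(node, parents)
    return len(chain | {subtree_root}) == len(chain)


def signed_elements(root):
    """Tags of elements inside Body but outside Body/IRenvelope/IRheader/IRmark."""
    parents = {child: parent for parent in root.iter() for child in parent}
    body = root.find("Body")
    irmark = root.find("Body/IRenvelope/IRheader/IRmark")
    return [
        el.tag
        for el in root.iter()
        if in_subtree(el, body, parents) and not in_subtree(el, irmark, parents)
    ]


print(signed_elements(ET.fromstring(DOC)))
# ['Body', 'IRenvelope', 'IRheader', 'Sender', 'Payload']
```

The sketch only looks at elements; the real transform is evaluated against
every node in the XPath data model, including text, attribute and namespace
nodes.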
----------

So we need exclusions, but not very complicated exclusions. 
I haven't seen a case where we include something, then exclude part of 
it, and then re-include part of what was excluded, i.e. exclusions are 
complete subtrees. Although I haven't seen exclusion of attributes, I 
think that may be a useful thing to allow too.
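
To make the "included subtrees minus complete excluded subtrees" idea
concrete, here is a rough Python sketch. The select() function and the sample
document are hypothetical names I made up for illustration; this is not a
proposed API, and it handles elements only (no attribute exclusions).

```python
import xml.etree.ElementTree as ET


def select(root, included, excluded=()):
    """Return elements in the included subtrees, minus whole excluded subtrees.

    `included` and `excluded` are collections of Element objects. Exclusions
    override inclusions, and nothing below an excluded element is re-included.
    """
    included, excluded = set(included), set(excluded)
    selected = []

    def walk(el, inside):
        if el in excluded:
            return  # drop the complete subtree rooted here
        inside = inside or el in included
        if inside:
            selected.append(el)
        for child in el:
            walk(child, inside)

    walk(root, False)
    return selected


# Enveloped-signature style usage: sign the whole document except Signature.
doc = ET.fromstring(
    "<Doc><Data><Item/></Data><Signature><SignedInfo/></Signature></Doc>"
)
picked = select(doc, included=[doc], excluded=[doc.find("Signature")])
print([el.tag for el in picked])
# ['Doc', 'Data', 'Item']
```

A single top-down walk is enough here because an exclusion always removes a
complete subtree, so the traversal never needs to look back inside one.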



 

Pratik


Frederick Hirsch wrote:
>
> Thanks for this clear explanation Pratik.
>
>> So instead of representing an XML fragment by a nodeset, I would like 
>> it to be represented like this
>>
>>   * List of included elements:
>>   * List of excluded elements (optional)
>
>
> How about only specifying subtrees to be signed, e.g. the element of 
> each complete subtree to be signed, with no exclusions within a 
> subtree, and that is it, or the entire document element.
>
> Wouldn't that be simpler and clearer than the arbitrary nodeset 
> approach we have now?
>
> Would this prevent any sensible use cases?
>
> (I assume a subtree includes everything associated with each element, 
> e.g. attributes, namespaces, etc) but no need for namespace 
> propagation etc
>
> regards, Frederick
>
> Frederick Hirsch
> Nokia
>
>
>
> On Aug 21, 2008, at 5:50 PM, ext Pratik Datta wrote:
>
>> One of the assumptions/requirements was
>> 9. Signing can be performed on arbitrary node sets.
>>
>> Canonicalization of arbitrary nodesets introduces a lot of 
>> complications. I would like to step back and see if we really require 
>> it. The main requirement that I see is that we need to sign a 
>> fragment of an XML document, and a nodeset lets us define an 
>> arbitrary fragment. But nodesets have the following problems. I would 
>> like to see if we can have an alternative way to identify what was 
>> signed without using a nodeset, or maybe use a very restrictive nodeset.
>>
>> Problem 1) Nodesets introduce unwanted complexity with namespaces
>> Nodesets follow the XPath data model, which is slightly different 
>> from the DOM model. One main area of difference is namespace nodes. 
>> In DOM, namespaces are just regular attributes, but in the XPath model 
>> they are a special kind of node. Also, the namespace nodes need to be 
>> expanded out for every element.
>>
>> e.g. if the original document is like this
>> <e1  ns1="n1" ns2="n2">
>>    <e2>
>>      <e3/>
>>    </e2>
>> </e1>
>>
>> In the XPath model all namespaces are expanded out for every node, i.e. 
>> it becomes like this
>>
>> <e1  ns1="n1" ns2="n2">
>>    <e2 ns1="n1" ns2="n2">
>>      <e3 ns1="n1" ns2="n2"/>
>>    </e2>
>> </e1>
>>
>> An XPath filter can remove certain namespace nodes, e.g. it can 
>> remove the ns1 node from e2
>> <e1  ns1="n1" ns2="n2">
>>    <e2 ns2="n2">
>>      <e3 ns1="n1" ns2="n2"/>
>>    </e2>
>> </e1>
>>
>> This is very unnatural in XML 1.0 (it could be considered similar to 
>> the namespace undeclaration of XML 1.1). In this particular case ns1 is 
>> not used, so its removal will affect inclusive c14n, but not 
>> exclusive c14n. However, a nodeset can also remove namespace nodes 
>> that are being used, which really makes it invalid XML. The 
>> canonicalization algorithms need to worry about this kind of 
>> namespace removal, even though it is completely meaningless.
>>
>>
>> Problem 2) Namespace nodes degrade performance significantly
>> Because namespace nodes are expanded for every node, the number of 
>> nodes that the implementation has to deal with increases very 
>> significantly. Let's say there are 10 namespace nodes defined at the 
>> top level, which is a pretty reasonable number for SOAP messages. Then 
>> the number of namespace nodes is 10 x number of elements. If each 
>> namespace node is a Java object, that is a lot of objects and a lot 
>> of unnecessary temporary memory. I know that some implementations 
>> avoid namespace node expansion because of this performance issue.
>>
>> This nodeset expansion is the basis for one of the denial of service 
>> attacks in the best practices document. In that example I made 100 
>> namespace nodes and 100 elements, which means there are 10,000 nodes. 
>> Then I wrote an XPath expression which counts all the nodes. Since 
>> this XPath is executed for every node, the number of iterations is 
>> 10,000 x 10,000 = 100 million. If the namespace nodes were not 
>> expanded for every node, then there would be only 200 x 200 = 40,000 
>> iterations.
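
The arithmetic can be checked with a few lines of Python. This is only a
back-of-the-envelope model under the assumptions stated in the paragraph
above (every namespace declared at the top and in scope on every element, and
an expression that touches every node once per node). The function name is
made up for illustration.

```python
def xpath_node_visits(elements, namespaces, expand_namespaces=True):
    """Rough count of node visits when an XPath that counts all nodes
    is evaluated once for every node in the set."""
    if expand_namespaces:
        # One namespace node per element per declaration (XPath data model).
        nodes = elements * namespaces
    else:
        # Declarations counted once, alongside the elements.
        nodes = elements + namespaces
    return nodes * nodes


print(xpath_node_visits(100, 100))                           # 100000000
print(xpath_node_visits(100, 100, expand_namespaces=False))  # 40000
```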
>>
>>
>>
>> Problem 3) NodeSets make it hard to understand what is signed
>> In a WS Security use case, the verifier has a list of things that it 
>> expects to be signed (as defined in a WS Security Policy), and wants 
>> to make sure that they are really signed. While a nodeset is the most 
>> generic form of representing an XML fragment, it is very hard to 
>> reverse engineer.   Most often the requirement is to sign a complete 
>> subtree. A more complex use case excludes some descendant subtrees.
>>
>> So instead of representing an XML fragment by a nodeset, I would like 
>> it to be represented like this
>>
>>   * List of included elements:
>>   * List of excluded elements (optional)
>>
>> Exclusions override inclusions. (This is somewhat similar to XPath 
>> Filter 2 Transform, except that it is much simpler)
>>
>> This would make it easy to understand what was signed, I could just 
>> compare the included elements with the expected list of included 
>> elements.
>>
>> The best practices document talks about a node-by-node comparison 
>> that can be done to determine what is signed, but that is very 
>> expensive, since you have to visit all descendant nodes of a subtree to 
>> make this comparison.
>>
>>
>>
>> Problem 4) Nodesets imply DOM
>> A nodeset is not complete information; it always needs a backing 
>> DOM. This makes it very hard for streaming XML implementations 
>> (SAX/StAX/XMLReader), which are much more performant.
>>
>> In my streaming presentation 
>> (http://www.w3.org/2007/xmlsec/ws/slides/12-mishra-oracle/), I had 
>> talked about using an alternative representation of "XML Events".
>> E.g. the document   <e1><e2/></e1>
>> is two nodes in a nodeset representation - e1, e2
>> but is 4 events in a streaming representation - begin(e1), begin(e2), 
>> end(e2), end(e1)
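
For illustration, the same four events can be produced with Python's
xml.etree.ElementTree.iterparse. This is only a sketch of the event view:
the begin/end labels are mapped from iterparse's "start"/"end" events, the
events() helper is a made-up name, and iterparse still builds elements behind
the scenes, so a real streaming implementation would sit on SAX or StAX
instead.

```python
import io
import xml.etree.ElementTree as ET


def events(xml_text):
    """Yield (kind, tag) pairs in document order, like begin()/end() events."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    for kind, element in ET.iterparse(source, events=("start", "end")):
        yield ("begin" if kind == "start" else "end", element.tag)


print(list(events("<e1><e2/></e1>")))
# [('begin', 'e1'), ('begin', 'e2'), ('end', 'e2'), ('end', 'e1')]
```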
>>
>>
>>
>>
>> Summing up, nodesets are complex and slow, and full support of 
>> nodesets is not a requirement. We can still use nodesets, but put 
>> some constraints around them to solve the above problems.
>>
>> Pratik
>>
>>
>>
>
>
Received on Friday, 22 August 2008 21:28:19 UTC