RE: Transforms Specification in DSIG from Christian Geuer-Pollmann on 2005-03-31 (w3c-ietf-xmldsig@w3.org from April to June 2005)

From: Christian Geuer-Pollmann <chgeuer@microsoft.com>
Date: Thu, 31 Mar 2005 16:41:50 +0100
To: "DeMartini, Thomas" <Thomas.DeMartini@CONTENTGUARD.COM>, <w3c-ietf-xmldsig@w3.org>
Message-ID: <27BECCCFEF79F244903746AC07CDA4CB01EAA0F5@EUR-MSG-20.europe.corp.microsoft.com>
Thomas,
 
in my previous work (developing the Apache XML Security pieces), I had the same problems understanding the XPath data model, c14n, etc. One thing I included to see what these node sets are was a debug mechanism that let me 'serialize' an XPath node set into an HTML page highlighting which nodes in a document were selected. 
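
As a rough illustration of such a debug view (not the Apache XML Security code itself), the sketch below renders a small document as HTML and bolds the parts that belong to a chosen node set. The function name render, the sample document and the use of Python's xml.dom.minidom are all assumptions made for this example; attributes, comments and namespace nodes are left out to keep it short.

```python
from html import escape
from xml.dom import minidom


def render(node, selected):
    """Render `node` as HTML text, bolding the pieces whose node is selected."""

    def mark(n, text):
        # Compare by identity: a node set refers to concrete nodes of the document.
        text = escape(text)
        return f"<b>{text}</b>" if any(n is s for s in selected) else text

    if node.nodeType == node.TEXT_NODE:
        return mark(node, node.data)
    if node.nodeType == node.ELEMENT_NODE:
        inner = "".join(render(child, selected) for child in node.childNodes)
        return (
            mark(node, f"<{node.tagName}>")
            + inner
            + mark(node, f"</{node.tagName}>")
        )
    return ""  # comments, processing instructions etc. are ignored in this sketch


# Hypothetical sample: select <keep> and its text child, but nothing else.
doc = minidom.parseString("<root><keep>x</keep><drop>y</drop></root>")
keep = doc.getElementsByTagName("keep")[0]
selected = [keep, keep.firstChild]

html = "<pre>" + render(doc.documentElement, selected) + "</pre>"
print(html)
```

Opening the resulting string in a browser shows the whole document as text, with only the selected start tag, text and end tag in bold.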
 
Unfortunately, I do not have the XFilter2 example available, but you may check the following files for reference: 
 
[1] is the input XML document (containing a signature with many References)
[2] is the HTML page highlighting what nodes are selected and 
[3] is the canonicalized node set, i.e. the input to the digest function. 
 
The example I chose is what I like to call an 'esoteric node set', because it shows that a namespace information item is included in the node set while its owner element is not, thus illustrating the effect behind your question number 4.
 
When you look at the example, you will see that the <foo:Something> elements are not part of the signed XPath node set, while their xmlns:foo attribute is. If you re-parsed the signed octets, you would see that a namespace attribute node became a text node :-))
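
The effect can be reproduced with a deliberately simplified toy serializer. This is not a conformant Canonical XML implementation: it treats the xmlns:foo declaration as an ordinary minidom attribute rather than an XPath namespace node, does no attribute-value escaping, and the names serialize and node_set as well as the sample document are made up for this sketch. It only mimics the rule that a node in the set is emitted even when its owner element is not.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape


def serialize(node, in_set):
    """Toy node-set serializer: emit only the pieces whose node is in the set."""
    if node.nodeType == node.TEXT_NODE:
        return escape(node.data) if in_set(node) else ""
    if node.nodeType != node.ELEMENT_NODE:
        return ""
    # Attributes (here including xmlns declarations) are emitted when THEY are
    # in the set, whether or not their owner element is.
    attrs = "".join(
        f' {a.name}="{a.value}"'
        for a in sorted(node.attributes.values(), key=lambda a: a.name)
        if in_set(a)
    )
    children = "".join(serialize(child, in_set) for child in node.childNodes)
    if in_set(node):
        return f"<{node.tagName}{attrs}>{children}</{node.tagName}>"
    return attrs + children  # owner omitted: the attribute text is left bare


doc = minidom.parseString(
    '<doc><foo:Something xmlns:foo="http://example.org/foo"/></doc>'
)
root = doc.documentElement
ns_attr = root.firstChild.getAttributeNode("xmlns:foo")
node_set = [root, ns_attr]  # <foo:Something> itself is deliberately left out

octets = serialize(root, lambda n: any(n is m for m in node_set))
reparsed = minidom.parseString(octets).documentElement
print(octets)
print(reparsed.firstChild.nodeType == reparsed.TEXT_NODE)
```

Re-parsing the serialized octets yields a <doc> element whose only child is a text node holding what used to be the namespace declaration.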
 
Hope this helps a bit...

Very best regards,
Christian

[1] http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-security/data/interop/c14n/Y4/signature.xml?rev=1.1
[2] http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-security/data/interop/c14n/Y4/c14n-3.html?rev=1.2 
[3] http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-security/data/interop/c14n/Y4/c14n-3.txt?rev=1.2
 
---
Dr.-Ing. Christian Geuer-Pollmann
European Microsoft Innovation  Center (EMIC)
Ritterstr. 23, 52072 Aachen, Germany
mail:   <chgeuer@microsoft.com>
 
________________________________

From: w3c-ietf-xmldsig-request@w3.org [mailto:w3c-ietf-xmldsig-request@w3.org] On Behalf Of DeMartini, Thomas
Sent: Wednesday, 23 March 2005 22:32
To: w3c-ietf-xmldsig@w3.org
Subject: Transforms Specification in DSIG



	This is a question about an ambiguity in the transforms specification in DSIG.  DSIG says that transforms can take as input/output octet streams or node sets.  I am concerned here with the case of node sets.

	 

	It is unclear to me what a node set is and how it is represented.  Consider, for instance, the example in section 4 of http://www.w3.org/TR/xmldsig-filter2/, which outputs a node set that "looks like"

	 

	   <ToBeSigned>
	       
	       <Data />
	       <ReallyToBeSigned>
	           
	           <Data />
	         </ReallyToBeSigned>
	     </ToBeSigned><ToBeSigned>
	       <Data />
	      
	   </ToBeSigned>

	 

	Now obviously the above is not actually a node set, but some serialization of a node set.  My question has to do with what kind of information is in the *actual* node set (rather than the serialization).  Recall that in the original document the <ReallyToBeSigned> node was a child of a <NotToBeSigned> node.  In the *actual* node set that is output from the transform, does the ReallyToBeSigned node still know that its parent is a <NotToBeSigned> node even though the <NotToBeSigned> node is not part of the output node set?  It makes a difference, if, for instance, inside of a <dsig:Transforms> element I had two <dsig:Transform> elements, the first one being as shown in the above example and the second one being a transform that selects all nodes whose direct parent is named "NotToBeSigned".

	 

	If the "ReallyToBeSigned" node in the *actual* node set that is passed to the input of the second transform knows that its parent is "NotToBeSigned", then it will also be output from the second transform.  If, however, the node forgets what its parent was, then it will be excluded from the output of the second transform.

	 

	So my first question can be formulated as:

	1) Do the nodes in the output node set still exist within the context of the input document, thereby remembering their parents/children in the original input document?

	 

	My next question applies in case the answer to question #1 is that they do not exist within the context of the input document.  Let's say the second transform in my list selected all nodes whose direct parent is named "Data".  If the <ReallyToBeSigned> node remembers that its parent is <NotToBeSigned> or if it forgets its parent altogether and becomes "unparented", then it will not be included in the output node set of the second transform.  If, however, the <ReallyToBeSigned> node becomes reparented to the "Data" node, then it will be included in the output node set of the second transform.

	 

	So my second question can be formulated as:

	2) If the nodes in the output node set no longer exist within the input document, do they become "unparented" or do they become "reparented", existing within the context of a new output document consisting only of the nodes in the output node set?

	 

	If the answer to 1 and 2 is that the nodes in the output node set become reparented, then I assume it is okay to serialize the node set to a string for the purposes of passing it to another transform as input, which transform implementation can then recreate the node set using its own data structures and operate on it.  If the answer to 1 and 2 is that the nodes in the output node set remain within the original input document, then I assume I need to pass an array of nodes (each with information about its original document) from one transform to the next.  If the answer to 1 and 2 is that the nodes in the output node set become "unparented", then I am unclear what I should pass from one transform to the next.

	 

	So my third question can be formulated as:

	3) How should I pass a "node set" output from one transform as input to the next in a way that is generic enough that any node set can be fully represented without loss of information?
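
	To make the first of these options concrete, here is a minimal sketch that assumes the nodes stay within the original input document, so that a node set is passed along as a list of references to live nodes. The function names first_transform and second_transform, the sample document and the use of Python's xml.dom.minidom are assumptions for illustration only; first_transform is a hand-written stand-in for the filter example, not an XPath Filter 2 implementation.

```python
from xml.dom import minidom

SOURCE = (
    "<Document><ToBeSigned><Data/>"
    "<NotToBeSigned><ReallyToBeSigned><Data/></ReallyToBeSigned></NotToBeSigned>"
    "</ToBeSigned></Document>"
)
DECIDING = {"ToBeSigned", "NotToBeSigned", "ReallyToBeSigned"}


def walk(node):
    """Yield `node` and all of its descendants in document order."""
    yield node
    for child in node.childNodes:
        yield from walk(child)


def first_transform(doc):
    """Stand-in for the filter example: keep ToBeSigned subtrees, drop
    NotToBeSigned subtrees, keep ReallyToBeSigned subtrees again."""
    selected = []
    for node in walk(doc.documentElement):
        n = node
        # The nearest ancestor-or-self with a deciding name settles membership.
        while n is not None and n.nodeName not in DECIDING:
            n = n.parentNode
        if n is not None and n.nodeName != "NotToBeSigned":
            selected.append(node)
    return selected


def second_transform(node_set):
    """Select nodes whose direct parent is named NotToBeSigned."""
    return [
        n for n in node_set
        if n.parentNode is not None and n.parentNode.nodeName == "NotToBeSigned"
    ]


doc = minidom.parseString(SOURCE)
first = first_transform(doc)
second = second_transform(first)
print([n.nodeName for n in first])
print([n.nodeName for n in second])
```

	Under that assumption the second transform still finds ReallyToBeSigned, because its parent link points into the input document even though NotToBeSigned is not in the node set being passed.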

	 

	My final question has to do with serialization of a node set including attribute nodes whose corresponding element nodes are not in the node set.  Suppose "ReallyToBeSigned" were actually an attribute, say as follows:

	 

	<ToBeSigned>

	  <NotToBeSigned ReallyToBeSigned="signme"/>

	</ToBeSigned>

	 

	Certainly once NotToBeSigned is removed from the output node set we wouldn't want to serialize the output node set as

	 

	<ToBeSigned>ReallyToBeSigned="signme"</ToBeSigned>

	 

	or would we?

	 

	So my fourth question can be formulated as:

	4) How should I serialize a node set containing an attribute node if the corresponding element node is not in the node set?

	 

	Thanks in advance for any clarification you can provide,

	&Thomas.
Received on Friday, 1 April 2005 04:03:55 UTC