*Major* problem with xml:id in canonical XML

I just noticed a major conceptual mismatch between canonical XML and 
xml:id. The problem occurs when calculating the canonical form of a 
document subset. The issue is that the nearest attributes in the XML 
namespace are copied onto elements in the subset whenever the ancestor 
elements that originally carried those attributes are not themselves 
part of the subset. For instance, consider this document:

<root xml:id="p1">
   <child />
</root>

Now suppose we canonicalize this document with the XPath expression 
//child to select a subset. The resulting canonical form is:

<child xml:id="p1"></child>

Worse yet, suppose we start with this input document and use the same 
XPath expression:

<root xml:id="p1">
   <child />
   <child />
   <child />
</root>

What comes out is:

<child xml:id="p1"></child>
<child xml:id="p1"></child>
<child xml:id="p1"></child>

Duplicate IDs!
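
To make the effect concrete, here is a rough Python sketch that imitates 
only the inheritance rule described above. It is not a canonicalizer. It 
assumes that none of the ancestors of the selected elements are in the 
subset, and the helper names (inherited_xml_attrs, simulate_subset, 
duplicate_ids) are made up for this illustration:

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = "{%s}id" % XML_NS

DOC = """<root xml:id="p1">
   <child />
   <child />
   <child />
</root>"""


def inherited_xml_attrs(root, target):
    """Return the nearest xml:* attributes found on ancestors of target."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    inherited = {}
    node = parents.get(target)
    while node is not None:
        for name, value in node.attrib.items():
            if name.startswith("{%s}" % XML_NS):
                # setdefault keeps the value from the nearest ancestor.
                inherited.setdefault(name, value)
        node = parents.get(node)
    return inherited


def simulate_subset(doc_text, tag):
    """Hypothetical helper: attributes each selected element ends up with.

    Assumes no ancestor of a selected element is itself in the subset.
    """
    root = ET.fromstring(doc_text)
    results = []
    for elem in root.iter(tag):
        attrs = inherited_xml_attrs(root, elem)
        attrs.update(elem.attrib)  # the element's own attributes win
        results.append(attrs)
    return results


def duplicate_ids(doc_text, tag):
    """Map each xml:id value that appears more than once to its count."""
    ids = [a[XML_ID] for a in simulate_subset(doc_text, tag) if XML_ID in a]
    return {value: n for value, n in Counter(ids).items() if n > 1}


print(duplicate_ids(DOC, "child"))
```

For the three-child document this reports that the value p1 ends up on 
three elements, which is exactly the duplicate-ID situation shown above.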

I think the canonical XML spec clearly assumed that all attributes in 
the XML namespace have scope over their descendants, as xml:lang and 
xml:space do. That is not true for xml:id, which identifies exactly one 
element.

This probably has downstream implications for XML digital signatures and 
XML encryption, both of which depend on canonicalization.

Exclusive XML canonicalization does not inherit xml: attributes, and so 
does not have this problem.

I am not sure what to suggest as a fix. It is still possible to 
canonicalize a document that uses xml:id. However, the results could be 
quite unexpected and perhaps dangerous.

I wish I had a good answer here. I don't. I do think this should be 
discussed, and whatever resolution is reached needs to be called out in 
the spec to warn people about this.


-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu

Received on Monday, 24 January 2005 16:51:40 UTC