Re: Trying to assess the depth of xml:id and c14n incompatibilities from Elliotte Harold on 2005-02-12 (public-xml-id@w3.org from February 2005)

From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Sat, 12 Feb 2005 16:58:49 -0500
To: veillard@redhat.com
CC: www-tag@w3.org, public-xml-core-wg@w3.org, public-xml-id@w3.org
Message-ID: <420E7C19.6060402@metalab.unc.edu>

Daniel Veillard wrote:

>   Now where is the problem exactly ?
>   From an XML-1.0 + Namespace point of view the serialized fragment obtained
> that way is still perfectly okay (i.e. a well balanced chunk), the only
> problem which may arise are:
>    1/ layers implementing xml:id will raise an error, however this is
>       not a fatal error (see http://www.w3.org/TR/xml-id/#errors)
>       xml:id processors are just instructed to report the duplicate ID
>       error to the application using it
>    2/ XPath pointers to that fragment can be disrupted
> 

The issue for me is a little different. It's that someone can 
deliberately place an ID on an element, and canonicalizing a subset 
of the document can move that ID to a different element. That it may 
copy the ID onto several elements is even funkier, but even if it could 
only move it to a single different element, it would be a problem.

"id" stands for identifier. The value of this attribute is supposed to 
uniquely identify not just any element, but a particular element. This 
identification can be used in many contexts: XPath, XPointer, XSLT, DOM, 
all sorts of custom-written programs, and more. I claim that any 
process that, as an unintended side effect, moves IDs from one element 
to a different element is deeply flawed.

For instance, somebody may use the sequential IDs cc1, cc2, cc3 and 
so forth to find all the credit_card elements in a document. 
Canonicalizing a subset of that document could copy those IDs onto 
person elements, expiration date elements, or something else. I can't 
begin to imagine all the different ways this could cause trouble.
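To make the credit card case concrete, here is a minimal Python sketch, assuming a simplified model of the inclusive Canonical XML 1.0 rule for document subsets: when an element is in the subset but its ancestors are not, attributes in the xml namespace are imported from the nearest ancestor that carries them. The helper apex_attributes, the exclusive flag, and the sample document are all hypothetical illustrations, not any real canonicalization library's API; the sketch only computes which attributes the orphaned element ends up with and does not serialize anything. Exclusive canonicalization does not import xml: attributes, which is what the exclusive=True branch models.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml: prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical sample document; only credit_card carries the ID.
DOC = """<customers>
  <customer>
    <credit_card xml:id="cc1">
      <number>0000</number>
      <expiration_date>2007-01</expiration_date>
    </credit_card>
  </customer>
</customers>"""


def apex_attributes(root, apex, exclusive=False):
    """Hypothetical, simplified model: the attributes an element carries in
    the canonical form of a document subset that omits all its ancestors.

    Inclusive C14N 1.0 imports xml:* attributes from ancestors;
    exclusive C14N does not import them at all.
    """
    attrs = dict(apex.attrib)
    if exclusive:
        return attrs
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = parents.get(apex)
    while node is not None:
        for name, value in node.attrib.items():
            # Only xml-namespace attributes are imported. Walking upward and
            # never overwriting means the nearest ancestor's value wins.
            if name.startswith(XML_NS) and name not in attrs:
                attrs[name] = value
        node = parents.get(node)
    return attrs


root = ET.fromstring(DOC)
card = root.find("customer/credit_card")
# Subset: the children of credit_card, but not credit_card itself.
for child in card:
    print(child.tag, apex_attributes(root, child).get(XML_NS + "id"))
```

Under this model the loop prints `number cc1` and `expiration_date cc1`: the ID intended for the credit_card element lands on both of its children, so the canonical output even contains a duplicate ID. Passing exclusive=True leaves both children without an xml:id.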

The problem is simply that IDs can unexpectedly move from the element 
they are intended for to an element they were not intended for. 
This will cause applications to choose the wrong elements. How that 
affects any given application will vary from one application to the 
next, of course; but I can't help but think that some 
applications will have major, potentially disastrous problems as 
a result of IDs unexpectedly moving during canonicalization.

While we can call out the potential problems in the spec, and warn 
people who use xml:id to use only exclusive canonicalization, I fear 
that someone is going to receive documents they did not write that 
use xml:id, and process them with tool chains that have not been 
updated. In other words, they may never have even looked at the xml:id 
spec but nonetheless be affected by this problem. I really think we need 
to eliminate the problem at the source by replacing xml:id with some 
attribute that does not have this unintended interaction with 
canonicalization.

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim

Received on Saturday, 12 February 2005 21:58:52 UTC