Re: Trying to assess the depth of xml:id and c14n incompatibilities from Daniel Veillard on 2005-02-13 (www-tag@w3.org from February 2005)

From: Daniel Veillard <veillard@redhat.com>
Date: Sun, 13 Feb 2005 06:20:13 -0500
To: Elliotte Harold <elharo@metalab.unc.edu>
Cc: www-tag@w3.org, public-xml-core-wg@w3.org, public-xml-id@w3.org
Message-ID: <20050213112013.GG1718@redhat.com>

On Sat, Feb 12, 2005 at 08:53:26PM -0500, Elliotte Harold wrote:
> Daniel Veillard wrote:
> 
> 
> >  then xml:id will not pollute their IDs and they won't find the wrong id
> >or they will in general get an error at the xml:id level.
> 
> I don't think I expressed myself clearly enough. Consider this: a team 
> is using tools they themselves did not write. In particular they are not 
> the sort of people who hang out on xml-dev and hear about problems like 
> this. They are just trying to get their work done. They are using:
> 
> 1. A standard C14N software that treats xml:id as the spec currently 
> requires.
> 
> 2. A processor that recognizes xml:id as an ID.
> 
> 3. A document sent them by somebody else that uses xml:id
> 
> Their IDs can start moving for no reason that's apparent to them.

  You characterize this as IDs "moving", i.e. generating false positives,
which IMHO sounds more dangerous than the real scope of the problem.
  As I tried to explain with examples, the IDs are not moving. They get
duplicated in places where they are not expected, but they are not moved.
As explained already, there are 3 cases I can see being used; maybe I
wasn't clear:
 - If the xml:id source does get into the document where a part has been
   signed, then it is still first in document order and any retrieval
   algorithm based on that id will still work as expected.
 - If the canonicalized fragment is removed from its context and used
   standalone, you get IDs which were not supposed to exist and which you
   didn't expect to see.
 - If the canonicalized fragment is removed from its context and plugged
   into another XML instance, then the xml:id or validation layer should
   report the duplication of IDs when the fragment is merged in.
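
The three cases above can be sketched in Python. This is only an
illustration under stated assumptions, not a Canonical XML implementation:
the helper names are hypothetical, and extract_with_inherited_xml_attrs
merely imitates the Canonical XML 1.0 behaviour of copying xml:* attributes
from ancestors onto the top element of a document subset.

```python
import copy
import xml.etree.ElementTree as ET
from collections import Counter

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = f"{{{XML_NS}}}id"


def extract_with_inherited_xml_attrs(root, target):
    """Copy `target` out of `root`, adding xml:* attributes found on its
    ancestors (nearest ancestor wins). Hypothetical helper that imitates
    the C14N 1.0 treatment of document subsets."""
    parents = {child: parent for parent in root.iter() for child in parent}
    fragment = copy.deepcopy(target)
    node = parents.get(target)
    while node is not None:
        for name, value in node.attrib.items():
            if name.startswith(f"{{{XML_NS}}}") and name not in fragment.attrib:
                fragment.set(name, value)
        node = parents.get(node)
    return fragment


def first_with_id(root, value):
    """Return the first element in document order carrying xml:id=value."""
    return next((el for el in root.iter() if el.get(XML_ID) == value), None)


def duplicate_ids(root):
    """Return the xml:id values that occur more than once under `root`."""
    ids = [el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None]
    return [value for value, count in Counter(ids).items() if count > 1]


doc = ET.fromstring(
    '<doc><section xml:id="s1"><para>signed text</para></section></doc>'
)
para = doc.find("section/para")
fragment = extract_with_inherited_xml_attrs(doc, para)

# Case 1: the fragment stays in its original context. The ID is duplicated,
# but the original holder is still first in document order.
in_context = copy.deepcopy(doc)
section = in_context.find("section")
section.remove(section.find("para"))
section.append(copy.deepcopy(fragment))
print(first_with_id(in_context, "s1").tag)

# Case 2: standalone fragment. <para> now carries an ID it never declared.
print(fragment.get(XML_ID))

# Case 3: merged into another instance that already uses the same ID.
# A duplicate check at the xml:id or validation layer can report it.
other = ET.fromstring('<report><note xml:id="s1"/></report>')
other.append(copy.deepcopy(fragment))
print(duplicate_ids(other))
```

In none of the three cases does a lookup that worked before start returning
a different element from the original document; what appears is an extra or
unexpected ID.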

Maybe you still think it is too dangerous for a security layer; then
the W3C should really deprecate the Canonical XML spec for such purposes,
as their stack also moves xml:base, potentially leading to wrong
URI-References from trusted data, which seems to me way more dangerous.
And we have Exclusive Canonicalization, which is a REC and does not suffer
from this problem.
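
The earlier xml:base demonstration is not reproduced in this message. As a
rough sketch of one way such a wrong URI-Reference can arise, assume a
document subset that omits an element carrying a relative xml:base, so that
its descendants end up resolved against the wrong base. The URIs and the
helper name below are made up for illustration.

```python
from urllib.parse import urljoin


def effective_base(xml_base_chain):
    """Combine a chain of xml:base values, outermost first, into one base URI.
    Hypothetical helper; each value is resolved against the previous one."""
    base = ""
    for value in xml_base_chain:
        base = urljoin(base, value)
    return base


# xml:base values on the ancestors of a reference, outermost first.
original_chain = ["http://example.org/x/", "y/", "z/"]

# Assumed subset: the middle element (xml:base="y/") is left out of the
# node-set, so its contribution to the base URI is silently lost.
subset_chain = ["http://example.org/x/", "z/"]

resolved_before = urljoin(effective_base(original_chain), "file")
resolved_after = urljoin(effective_base(subset_chain), "file")

print(resolved_before)
print(resolved_after)
```

The same relative reference "file" resolves to two different URIs, although
the element that contains it is part of the signed data in both cases.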

> Even worse: suppose someone sends them a document that uses xml:id along 
> with a DTD (possibly internal) or schema that specifies that the type of 
> xml:id is ID, as is indeed recommended in the xml:id CR. Now they don't 
> even need to be using software that treats xml:id as special in any way. 
> They could have a tool chain and process in place *today* that is going 
> to fail the first time someone sends them such a document.

   That's true. But again it's not IDs "moving"; an ID is either being
duplicated or existing where it should not. Their toolchain today, as
demonstrated, can also lead to wrong URI-References.

> I don't know who it's going to happen to. I don't know how many people 
> this is going to affect. I suspect most people it does affect will not 
> be affected too greatly; but I do think it's going to happen, and 
> because we're mucking with security infrastructure here, it's possible 
> somebody is going to get hurt badly.

   We know their security stack can lead to broken URI-References from
parts they expect to be digitally signed and hence trusted data; somehow,
if you raise the security side of the problem, this sounds more dangerous
than the xml:id problem. On one side there is the dislike for the namespaced
identifiers that you clearly explained, but we need to separate the issues.
Canonicalization is known to be broken w.r.t. the current XML
infrastructure; let's assess the damage and see how this needs to be
handled.

> Given that the problem exists with today's software that fully conforms 
> to the specs, the only thing we can effectively change to prevent this 
> is the one piece that isn't released yet. That's xml:id.

  and xml:base, which is impossible to change now. The problem exists
independently of xml:id, as demonstrated earlier.

> Longer term, I do agree that it's time to deprecate canonical XML and 
> start moving the world to exclusive canonicalization or an alternate, 
> new algorithm that inherits xml:space, xml:lang, and namespaces declared 
> but not anything else (Semi-exclusive canonical XML?) but that's not 
> going to happen soon enough to rescue the current xml:id scheme. Even if 
> we could get a spec out quickly, it will take years to shift the 
> installed base and all the dependent specs.

  IMHO if you are in the security sector and don't update the software when
there is a publicly detailed flaw in the stack you are using, or at least
check that it can't lead to damage in your environment, then you really
are looking for trouble and should go out of business. It's not about
people just not watching xml-dev, it's about just being serious. If people
start introducing xml:id in their security-based apps, they should know by
now, and especially if we document it in the REC, that there is a problem.
And security software set in stone and not monitored is worth nothing;
that should be clear to anybody. Security is a process!

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
Received on Sunday, 13 February 2005 11:20:18 UTC