RE: Significant W3C Confusion over Namespace Meaning and Policy

I'm a little reluctant to jump into this overlong thread, but one or two 
things that John writes puzzle me.  Specifically:

John Boyer writes:

> Consider a processor P for the namespace N=({A,
> B}, http://example.org)

> Now create the distinct namespace N'=({A, B, C},
> http://example.org) and create a processor P' for
> N'.

> Use P' to create a document D that contains markup
> from namespace N', especially including C. Let the
> user sign D with signature S.  Now, send D to
> someone having only P.  Signature S validates,
> then D is presented to the validating user.
> Problem is, P does not understand C, so only A and
> B are presented as being 'what got signed' by the
> original signer.

This surprises me.  As I understand it, signatures are applied to 
documents not namespaces.  Thus, a dsig created on a document that 
contains elements A, B, C is signing all of those elements, their 
positions, contents, etc. 

One of two things of interest will happen when your sample document is 
sent to (older) processor P.  Either the document is corrupted on the way, 
or it isn't.  If the document is corrupted, the signature will correctly 
warn P of this;  if the document is intact, then the signature will 
correctly confirm that fact to P.  P will then know that the use of the 
surprising "C" construct is not an accident, but was indeed intentional on 
the part of the signer.  Working as designed, right? 
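
To make the point concrete, here is a minimal sketch in Python.  It is 
not XML DSig: an HMAC over the raw bytes stands in for the signature, 
there is no canonicalization step, and the key and the sample document 
are hypothetical.  It only illustrates that what gets protected is the 
instance, including the unexpected C element.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical shared secret, for illustration only


def sign(document: bytes) -> str:
    """Return a keyed digest over the exact bytes of the instance."""
    return hmac.new(KEY, document, hashlib.sha256).hexdigest()


def verify(document: bytes, signature: str) -> bool:
    """Check that the received bytes match what was signed."""
    return hmac.compare_digest(sign(document), signature)


# Hypothetical instance D using elements A, B and the newer C.
doc = b'<N:root xmlns:N="http://example.org"><N:A/><N:B/><N:C/></N:root>'
sig = sign(doc)

# Intact document: the check passes, whether or not the receiver
# knows what C means.
intact_ok = verify(doc, sig)

# Document with C stripped out on the way: the check fails.
tampered = doc.replace(b"<N:C/>", b"")
tampered_ok = verify(tampered, sig)
```

The older processor P needs no knowledge of C to run the check; the 
signature covers the bits, not the vocabulary.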

The fact that C is namespace qualified means that the author of 
application C will not be mistaking it for the many other similar names in 
other namespaces.  If he or she wants to follow up, then it's likely that 
the namespace name will give a handle on finding out who owns the 
specification for this new and unexpected C element.  The fact that it's 
in the same namespace suggests that it was in fact created by those who 
invented the expected elements, A and B.  All of this seems just fine to 
me.

> The reason why we cannot change the meaning of
> existing words in a namespace once signatures are
> applied to markup in that namespace is *exactly
> the same* as the reason why words cannot be added
> or deleted.  It's impossible to accept one but not
> the other!

As noted above, it's not the namespace that's being signed, it's instance 
documents using the namespace(s).  I don't think a dsig can protect 
anything other than what is directly represented in the bits being signed. 
 

Even with a signature, you always have all kinds of external assumptions 
that can change out from under you.  You seem to worry that as new words 
are introduced dsigs somehow fail to protect you.  The real concern would 
be if someone rewrote the specification for a vocabulary (not necessarily 
a namespace, since namespaces and vocabularies need not be tightly tied in 
my opinion!) to change the meaning of existing instances.  That's a 
problem with or without DSIGs, and such changes wouldn't be particularly 
tied to the introduction of new words in the vocabulary.   Typically, new 
words are used in new documents; it's changes to the descriptions of the 
old words that break compatibility.   The DSIG can't and shouldn't protect 
you from any incompatible change to specifications that you're depending 
on.  DSIGs ensure that the instance sent is the one received.  Discipline 
in keeping the specs ensures that old-style instance documents continue to 
mean what they used to mean;  it seems to me that adding new words to a 
namespace is only a problem if it breaks such compatibility.

Regarding our particular c14n problem with xml:id : 

A c14n algorithm creates equivalence classes of documents; all documents 
in each class get the same signature.  The particular c14n algorithm we've 
got does something unfortunate with xml:id, i.e. it propagates that attribute 
down the subtree to nodes where it manifestly doesn't belong.  What 
specific harm does this do?  I >think< the pertinent example is:

Processor P prepares and signs instance document I:

        <outer xml:id="IDofElementOuter" xmlns:n="someURI">
          <inner/>
        </outer>

A malicious malcontent breaks into the system and substitutes for the 
correct instance a new one:

        <outer xml:id="IDofElementOuter" xmlns:n="someURI">
          <inner xml:id="IDofElementOuter"/>
        </outer>

Processor P' receives the altered document.

Bad news.  The c14n of these is the same, and the dsig algorithm does not 
do its intended job of catching that the instances have been switched. The 
same would be true, of course, if this malcontent switched the order of 
two attributes, though no doubt the xml:id example is more troubling. 
This is indeed not good, but I think that all the trouble is of 
this sort.  Any other breakage to the instance is caught. 
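
The attribute-order case can be sketched as follows.  Note the 
assumptions: this uses Python's xml.etree.ElementTree.canonicalize 
(Python 3.8 or later), which implements C14N 2.0 rather than the c14n 
algorithm under discussion, so it does not reproduce the xml:id 
propagation.  The sample documents and attribute names are made up, and 
a plain SHA-256 digest stands in for the signed value.

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Hypothetical instances: same content, attributes in a different order.
original = '<outer a="1" b="2"><inner/></outer>'
reordered = '<outer b="2" a="1"><inner/></outer>'

# A genuinely different instance: an attribute has been added.
altered = '<outer a="1" b="2"><inner c="3"/></outer>'


def digest(xml_text: str) -> str:
    """Digest of the canonical form, standing in for the signed value."""
    canonical = canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same equivalence class: the swap is invisible to the digest.
same_class = digest(original) == digest(reordered)

# Different class: the added attribute changes the digest.
caught = digest(original) != digest(altered)
```

Here the swap goes unnoticed by design, while the added attribute is 
caught.  The xml:id problem is that our current c14n puts the second 
kind of change into the first kind of bucket.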

Even in this case, the receiving processor (presumably) sees the document 
actually received, and not the canonicalization.  In the likely case that 
there is no tampering, the receiver sees the original document even though 
the c14n caused the signature to be over something else.   The receiving 
application sees only the one xml:id on outer, unless the documents were 
indeed substituted.

So, in summary, the downside of the current situation is that those who 
otherwise trust dsigs will have to be robust against the possibility that 
additional xml:id's are maliciously or erroneously introduced between 
sender and receiver.  Not good, and we should fix this with new c14n as 
soon as practical, but not an industry-stopping disaster I would think? 
Maybe I'm confused, but that's how it seems to me.

I agree with and will not repeat the many good arguments made by Tim Bray 
and others as to why mutable namespaces are reasonable in principle, 
working in practice, and indeed widely and successfully deployed.

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Wednesday, 16 February 2005 03:04:19 UTC