Re: Provokant proposal on Exclusive C14n from Christian Geuer-Pollmann on 2002-06-06 (w3c-ietf-xmldsig@w3.org from April to June 2002)

From: Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>
Date: Thu, 06 Jun 2002 11:10:43 +0200
To: merlin <merlin@baltimore.ie>
cc: w3c-ietf-xmldsig@w3.org
Message-ID: <8972581.1023361843@localhost>

Hi Merlin,

the main reason why I propose this is re-parsing. As I understand it,
it's absolutely necessary to be able to re-parse signed contents. I agree
that the signed (digested) octets are not necessarily well-formed, but they
should be balanced. If the digested octets are wrapped by

<wrap>put digested octets here</wrap>

, it should be possible to parse the result. Now I have learned that the
c14nization of a single namespace node like

 xmlns:foo="http://foo"

doesn't really help, because wrapping it as above would result in

<wrap> xmlns:foo="http://foo"</wrap>

instead of

<wrap xmlns:foo="http://foo"></wrap>

. OK. But the real question is WHY I think that re-parsing is important: We
talk about digital signatures. A digital signature means that the signer
makes a statement about something and this statement is signed. The
transforms allow the signer to select what exactly the statement is, e.g. a
single paragraph in a contract or the serialization of an object, which is
to be brought back to life on the verifier side. If the signer can construct
a transform which destroys the real context (e.g. by omitting relevant
namespace nodes), this looks very bad.

Let's take the example of a serialized object: How can the verifier bring
this object back to life and check its integrity? The verifier receives an
XML instance, e.g. via SOAP. This instance contains some kind of envelope,
a digital signature and the object. The envelope (more precisely, the
namespaces declared in it) bleeds into the signature and the object. Maybe
an adversary on the wire even added some attributes to the object. This
means that the envelope changed the object, maybe some additional
processing changed the object, etc. When the receiver parses the XML
instance, he searches for the signature and finds that it is valid.

But how does he get an XML representation to construct his object from? He 
can't simply say //my:object[1] and take that Element as input for his 
constructor, because he doesn't know what exactly has been signed. So he 
has three options:

 1.) Inspect the signature: Take the reference, check that
     the @URI and all <ds:Transform>s are exactly like the
     ones which have been pre-defined by some security
     architects. If the signature has the specified form,
     then probably the constructor can use the (untrusted
     and maybe modified) my:object element, because he only
     uses nodes from that subtree which he knows are signed
     because of the checked form of the signature.

     -> Sounds really complicated. For each object, it must
        be exactly defined what a signature MUST look like,
        which Transforms, which XPaths etc.

 2.) Inspect the node set prior to c14n:
     To know which nodes from the input document can be trusted
     because they are signed, query the signature object about
     the final node set which has been c14nized. Using this
     node set, construct a form which can be read by the
     constructor for the object.

     -> Also sounds complicated.

 3.) Re-parse the digested octets:
     The verifier simply asks the Reference "which octets were
     the input for the digest", i.e. what has been signed. Now
     simply parse these octets (maybe after a wrap), and feed
     the new DOM tree or SAX sequence into the constructor for
     your object.

     -> Safe and easy. But -- it only works if the digested
        octets can be parsed.
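
To make option 3 concrete, here is a minimal Python sketch using only the
standard library's xml.etree.ElementTree. The helper name reparse_digested
and the <wrap> dummy root are illustrative assumptions, not part of any XML
Signature API, and the octets are assumed to be UTF-8 as c14n produces them.
It only shows which kinds of digested octets survive a wrap-and-parse step.

```python
import xml.etree.ElementTree as ET


def reparse_digested(octets: bytes):
    """Wrap the digested octets in a dummy root and try to parse them.

    Returns the wrapper element, or None if the octets cannot be parsed
    even after wrapping (e.g. a prefix is used without a declaration).
    """
    wrapped = b"<wrap>" + octets + b"</wrap>"
    try:
        return ET.fromstring(wrapped)
    except ET.ParseError:
        return None


# Two top-level elements: not well-formed alone, but balanced, so the
# wrap makes it parseable.
balanced = reparse_digested(b"<A></A><B></B>")

# foo:A uses a prefix whose declaration was omitted: no wrap can fix this.
unbound = reparse_digested(
    b'<foo:A><foo:B xmlns:foo="http://foo"></foo:B></foo:A>'
)

# A lone canonicalized namespace node ends up as character data inside
# the wrapper, not as a namespace declaration on it.
lone_ns = reparse_digested(b' xmlns:foo="http://foo"')
```

In this sketch the balanced case yields a wrapper with the children A and
B, the unbound-prefix case yields None, and the lone namespace node parses
but only as text content of the wrapper.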

But consider the difference between these two instances:

<foo:Contract  xmlns:foo="http://companyA.com">
  <foo:Detail xmlns:foo="http://companyB.com" />
</foo:Contract>

and

<foo:Contract  xmlns:foo="http://companyA.com">
  <foo:Detail />
</foo:Contract>

Imagine a lawyer asks you what the intent of the signer was in the second
form, if the first one was the input document.
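
A small Python sketch (standard library only; the helper name detail_name
is my own, purely for illustration) shows why the two forms mean different
things to a namespace-aware parser: the same qualified name foo:Detail
resolves to a different namespace once the inner declaration is dropped.

```python
import xml.etree.ElementTree as ET

# The input document: foo is rebound to companyB on the inner element.
original = (
    '<foo:Contract xmlns:foo="http://companyA.com">'
    '<foo:Detail xmlns:foo="http://companyB.com" />'
    '</foo:Contract>'
)

# The result of a document subset that omits the inner namespace node.
stripped = (
    '<foo:Contract xmlns:foo="http://companyA.com">'
    '<foo:Detail />'
    '</foo:Contract>'
)


def detail_name(xml_text: str) -> str:
    """Return the expanded name of the first child of the root element."""
    root = ET.fromstring(xml_text)
    return root[0].tag


original_name = detail_name(original)  # Detail in the companyB namespace
stripped_name = detail_name(stripped)  # Detail in the companyA namespace
```

Both forms parse without error, so a verifier that re-parses the second
form gets no warning that foo:Detail has silently moved from companyB's
namespace into companyA's.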

   Automagically including ALL namespaces into the document
   subset prior to exclusive c14n makes all this fuss obsolete.
   Prefixes are automagically in-scope with the correct
   namespaces, digested octet sequences are parseable
   in some way (well-formed or balanced), and so on.

That's why I think c14n is (or should be) more than only a unique 
representation.

Regards,
Christian


--On Wednesday, 5 June 2002 14:10 +0100 merlin <merlin@baltimore.ie> wrote:

>
>
> Hi Christian,
>
> I see what you are proposing but I don't really see why.
>
> Is the justification solely to satisfy what you suggest is the
> second purpose of canonicalization (allowing re-parsing)? I
> don't think this is a purpose of c14n; the spec says
> "The canonical form of an XML document subset may not be
> well-formed XML." If the intention of c14n and exc-c14n
> had been to guarantee reparsable results, they would have
> had express limitations placed on their input. Knowing what
> canonicalization is employed, applications can meaningfully
> use the input node set without having to reparse.
>
> I agree that unwitting interference with the node set is not
> advisable; *however*, I do not think it is without merit. As
> such, arbitrarily restricting applications from employing
> exc-c14n when they choose to do this does not seem, to me,
> defensible.
>
> Worse, silently ignoring the contents of the namespace axis
> would seem like a terrible approach because applications,
> being familiar with c14n, might not expect this. Were we to
> go down this route, I think we should instead state that
> an input node set that either excludes an element but includes
> part of its namespace axis, or that includes an element but
> excludes part of its namespace axis, MUST raise an error.
> We would do this because we think it is an error to produce
> such node sets.
>
> However; I don't think it is an error and I would be strongly
> opposed to any such proposition. C14n and exc-c14n are,
> with few exceptions, almost identical, which allows me to
> use a single codebase to implement both. Any unnecessary
> divergence is just a headache.
>
> As an aside, and returning to what you said a while ago about
> multiple application of c14n being idempotent, the spec does
> state, in §2.4:
>   Whether from a full document or a document subset, if the canonical
>   form is well-formed XML, then subsequent applications of the same XML
>   canonicalization method to the canonical form make no changes.
>
> This is, of course, not true!
>
> Merlin
>
> r/geuer-pollmann@nue.et-inf.uni-siegen.de/2002.06.05/09:46:50
>>
>> Hi all,
>>
>> first a big thank you to Merlin who made the very cool edge-cases for
>> c14n and exclC14n to understand how these standards handle the
>> namespace stuff. Until a few weeks ago, I did not understand that a
>> properly chosen document subset (in c14n) can exclude namespaces from
>> the document subset. For me, namespaces were not 'regular' nodes but
>> were inseparably intertwined with the document.
>>
>> For "Canonical XML", I see that the possibility to include only
>> particular namespaces in a document subset is really cool if a
>> transform author wants to create context-independent document subsets.
>>
>> For "Exclusive Canonical XML", I don't see why we have to inherit the
>> (complicated) namespace handling from "Canonical XML".
>>
>> Provocative proposal: If the PR status of exclC14n allows this
>> (substantial) change, I want to propose canonicalizing document
>> subsets as follows:
>>
>>  "If a document subset is to be canonicalized using 'Exclusive C14n',
>>   all namespace nodes in the original document are included in the
>>   document subset prior to the serialization process; this inclusion is
>>   done regardless of whether a namespace node is already in the subset
>>   or if it's excluded from the subset."
>>
>> After that 'pre-processing', the exclusive c14n process is started with
>> the  following change: All passages in the text which refer to namespace
>> nodes  which are not in the document subset can be omitted.
>>
>> Why do I suggest that: For standard c14n, it was necessary to be able to
>> omit namespace nodes from the document subset. For exclusive c14n, we
>> have (1) the mechanism of the "InclusiveNamespaces PrefixList" and (2)
>> the visibly-utilizes mechanism. I think that such a change will make
>> exclusive c14n reliable and consistent (not consistent with the c14n REC
>> but consistent with what c14n should really do).
>>
>> I think canonicalization should serve two purposes:
>>
>> (1) create a bit-accurate representation of a document
>>     or document subset for use in cryptographic algorithms
>>     like a message digest
>>
>> (2) allow the verifier of a signature to take these signed
>>     octets and re-parse the octets to get back a
>>     "trusted" XML structure which can be reliably used in
>>     the application. This goes to "process-what-is-signed".
>>     But with the current processing model where namespaces
>>     can be excluded from the document subset, it's possible
>>     that a "reparse signed contents" step does encounter
>>     'illegal' XML.
>>
>> I had no better word than 'illegal'. I know that it's possible that the
>> signed contents are not well-formed, e.g. like this:
>>
>>      <A /><B />
>>
>> or like this
>>
>>      foo text <A />
>>
>> but these are problems which can be handled easily by "wrapping" the
>> octets into a dummy root element. But if a namespace is used e.g. by an
>> element but the namespace declaration does not appear, this can't be
>> handled in any way, and from the semantic point of view, it's even
>> completely meaningless:
>>
>> <foo:A>
>>   <foo:B xmlns:foo="http://foo" />
>> </foo:A>
>>
>> In this case, the namespace is (maybe accidentally?) omitted from the
>> foo:A element, but what happens if we have such an input document:
>>
>> <foo:Contract  xmlns:foo="http://companyA.com">
>>   <foo:Detail xmlns:foo="http://companyB.com" />
>> </foo:Contract>
>>
>> and I choose a rogue document subset which results in
>>
>> <foo:Contract  xmlns:foo="http://companyA.com">
>>   <foo:Detail />
>> </foo:Contract>
>>
>> That's so bad; I think that the above proposal will stop that kind of
>> cheating: foo:Detail visibly utilizes foo and so
>> xmlns:foo="http://companyB.com" is output in the exclusive canonical
>> form, regardless of whether the XPath transform author included it or
>> not.
>>
>>
>>
>> Kind regards,
>> hope that you all don't eat me alive for this ;-)
>>
>> Christian
Received on Thursday, 6 June 2002 05:04:59 UTC