Re: Provocative proposal on Exclusive C14n

Hi Christian,

I see what you are proposing but I don't really see why.

Is the justification solely to satisfy what you suggest is the
second purpose of canonicalization (allowing re-parsing)? I
don't think this is a purpose of c14n; the spec says
"The canonical form of an XML document subset may not be
well-formed XML." If the intention of c14n and exc-c14n
had been to guarantee reparsable results, they would have
had express limitations placed on their input. Knowing which
canonicalization is employed, applications can meaningfully
use the input node set without having to re-parse.
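
To make that concrete, here is a minimal sketch (Python; my own
illustration rather than anything from either spec) of why a naive
re-parse of the canonical octets of a subset can fail:

  # E.g. the canonical form of the node set //A | //B over
  # <doc><A/><B/></doc>: two top-level elements, so the octets are
  # not a well-formed document on their own.
  import xml.etree.ElementTree as ET

  canonical_octets = b"<A></A><B></B>"

  try:
      ET.fromstring(canonical_octets)      # naive "re-parse what was signed"
  except ET.ParseError as err:
      print("not well-formed XML:", err)   # the node set itself is still usable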

I agree that unwittingly interfering with the node set is not
advisable; *however*, I do not think that doing so deliberately
is without merit. As such, arbitrarily preventing applications
from employing exc-c14n when they choose to do this does not
seem, to me, defensible.

Worse, silently ignoring the contents of the namespace axis
would seem like a terrible approach because applications,
being familiar with c14n, might not expect this. Were we to
go down this route, I think we should instead state that
an input node set that either excludes an element but includes
part of its namespace axis, or that includes an element but
excludes part of its namespace axis, MUST raise an error.
We would do this because we think it is an error to produce
such node sets.
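
If we did go down that route, the check would amount to no more
than the following (a rough sketch over a hypothetical node-set
model, not real c14n code; the names are mine):

  # node_set: ids of the nodes in the input node set.
  # namespace_axes: maps each element id to the ids of the namespace
  # nodes on that element's namespace axis.
  def check_namespace_axes(node_set, namespace_axes):
      for element, axis in namespace_axes.items():
          included = axis & node_set
          if element in node_set and included != axis:
              raise ValueError("%s is included but part of its "
                               "namespace axis is excluded" % element)
          if element not in node_set and included:
              raise ValueError("%s is excluded but part of its "
                               "namespace axis is included" % element)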

However, I don't think it is an error, and I would be strongly
opposed to any such proposition. C14n and exc-c14n are,
with few exceptions, identical, which allows me to
use a single codebase to implement both. Any unnecessary
divergence is just a headache.

As an aside, and returning to what you said a while ago about
multiple applications of c14n being idempotent, the spec does
state, in §2.4:
  Whether from a full document or a document subset, if the canonical
  form is well-formed XML, then subsequent applications of the same XML
  canonicalization method to the canonical form make no changes.

This is, of course, not true!
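
For anyone who wants to poke at that claim against a real
implementation, the property is at least easy to state in code
(a sketch; canonicalize and parse stand in for whatever your
toolkit provides):

  # The spec's claim, restated: if the canonical form is well-formed,
  # canonicalizing it again yields byte-identical output.
  def c14n_is_idempotent_for(canonicalize, parse, document):
      first = canonicalize(parse(document))    # octets of the canonical form
      second = canonicalize(parse(first))      # canonicalize those octets again
      return first == second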

Merlin

r/geuer-pollmann@nue.et-inf.uni-siegen.de/2002.06.05/09:46:50
>
>Hi all,
>
>first, a big thank you to Merlin, who made the very cool edge cases for c14n 
>and exclC14n that helped me understand how these standards handle the 
>namespace stuff. Until a few weeks ago, I did not understand that a properly 
>chosen document subset (in c14n) can exclude namespaces from the document 
>subset. For me, namespaces were not 'regular' nodes but were inseparably 
>entwined with the document.
>
>For "Canonical XML", I see that the possibility to include only particular 
>namespaces to a document subset is really cool if a transfroms author wants 
>to create context-independent document subsets.
>
>For "Exclusive Canonical XML", I don't see why we have to inherit the 
>(complicated) namespace handling from "Canonical XML".
>
>Provocative proposal: If the PR status of exclC14n allows this (substantial) 
>change, I want to propose canonicalizing document subsets as follows:
>
>  "If a document subset is to be canonicalized using 'Exclusive C14n',
>   all namespace nodes in the original document are included in the
>   document subset prior to the serialization process; this inclusion is
>   done regardless of whether a namespace node is already in the subset
>   or excluded from it."
>
>After that 'pre-processing', the exclusive c14n process is started with the 
>following change: All passages in the text which refer to namespace nodes 
>which are not in the document subset can be omitted.
>
>Why do I suggest that: For standard c14n, it was necessary to be able to 
>omit namespace nodes from the document subset. For exclusive c14n, we have 
>(1) the mechanism of the "InclusiveNamespaces PrefixList" and (2) the 
>visibly-utilizes mechanism. I think that such a change will make exclusive 
>c14n reliable and consistent (not consistent with the c14n REC but consistent 
>with what c14n should really do).
>
>I think canonicalization should serve two purposes:
>
> (1) create a bit-accurate representation of a document
>     or document subset for use in cryptographic algorithms
>     like a message digest
>
> (2) allow the verifier of a signature to take these signed
>     octets and re-parse the octets to get back a
>     "trusted" XML structure which can be reliably used in
>     the application. This goes to "process-what-is-signed".
>     But with the current processing model where namespaces
>     can be excluded from the document subset, it's possible
>     that a "reparse signed contents" step may encounter
>     'illegal' XML.
>
>I had no better word than 'illegal'. I know that it's possible that the 
>signed contents are not well-formed, e.g. like this:
>
>      <A /><B />
>
>or like this
>
>      foo text <A />
>
>but these are problems which can be handled easily by "wrapping" the octets 
>into a dummy root element. But if a namespace is used e.g. by an element 
>but the namespace decl does not appear, this can't be handled in any way, 
>and from a semantic point of view, it's even completely meaningless:
>
><foo:A>
>   <foo:B xmlns:foo="http://foo" />
></foo:A>
>
>In this case, the namespace is (maybe accidentally?) omitted from the foo:A 
>element, but what happens if we have an input document such as this:
>
><foo:Contract  xmlns:foo="http://companyA.com">
>   <foo:Detail xmlns:foo="http://companyB.com" />
></foo:Contract>
>
>and I choose a rogue document subset which results in
>
><foo:Contract  xmlns:foo="http://companyA.com">
>   <foo:Detail />
></foo:Contract>
>
>That's so bad; I think that the above proposal will stop that kind of 
>cheating: foo:Detail visibly utilizes foo and so 
>xmlns:foo="http://companyB.com" is output in the exclusive canonical form, 
>regardless of whether the XPath transform author included it or not.
>
>
>
>Kind regards,
>hope that you all don't eat me alive for this ;-)
>
>Christian

Received on Wednesday, 5 June 2002 09:10:45 UTC