RE: c14n from John Boyer on 2000-05-23 (w3c-ietf-xmldsig@w3.org from April to June 2000)

From: John Boyer <jboyer@PureEdge.com>
Date: Tue, 23 May 2000 10:33:46 -0700
To: "Alex Milowski" <alex@milowski.com>
Cc: "\"IETF/W3C XML-DSig WG\"" <w3c-ietf-xmldsig@w3.org>
Message-ID: <BFEDKCINEPLBDLODCODKOEOGCCAA.jboyer@PureEdge.com>
To all: Joseph asked that we post this conversation to the list.  It
contains passages marked <john> which are from my original letter, passages
marked with <alex> are Alex Milowski's responses, and passages marked with
<johnresponse> are my responses.

By the way, Alex and I will be trying to work out the details of c14n so
that we can continue to use it within dsig.

John Boyer
Software Development Manager
PureEdge Solutions Inc. (formerly UWI.Com)
Creating Binding E-Commerce
jboyer@PureEdge.com


<john>
> Hi Alex,
>
> Joseph Reagle wants a draft by May 25.  In that draft, I currently plan to
> not change much except resolve what appear to be the two outstanding
> differences between XPath serialization and c14n.
>
> My position for both of these resolutions stems from the belief that we
are
> trying to create a c14n for XML 1.0.  Two XML documents may be logically
> equivalent under some additional set of considerations, but canonicalizing
> documents under those circumstances should require a separate algorithm.
>
> Particularly under the requirements of digital signatures,
canonicalization
> helps ensure that XML applications don't corrupt a signature by doing
> logically equivalent things, but there is only so far we can take this.
To
> account for most existing applications, it should be sufficient to
> canonicalize XML 1.0 and have applications not make changes to documents
> based on logical equivalencies under considerations beyond the scope of
the
> XML 1.0 specification.
</john>

<alex>
This position will not work if you want to entertain logical equivalance
for XML namespaces and XML Schema aware documents.
</alex>

<johnresponse>
Yes, basically what I'm saying is that we should limit our exposure as much
as possible to canonicalizing XML 1.0.  Further canonicalization to account
for these other issues can be specified in the future.  Schema is not a
recommendation yet, and unless namespaces are change in some radical way, it
is impossible to canonicalize namespaces to the point where two documents
are equal if and only if their byte streams are equal.  The reason is that
XPath expressions bearing namespace references can appear in attribute
values (XSLT) or element content (DSIG).  Current c14n rewrites namespaces,
but it cannot know about references to namespaces appearing in XPath
expressions, so the two documents will still not be logically equivalent
despite namespace rewriting.  Furthermore, the rewriting actually breaks the
XPath expressions in the document.
</johnresponse>

<john>
> 1) Eliminate character model normalization. XML 1.0 is based on UCS.  An
> application should not change the UCS code point values used to represent
> characters.  Although differing UCS code point sequences may result in the
> same character visually, the knowledge of this equivalence resides outside
> of the XML 1.0 specification.  Further, those who actually have the
problem
> are likely to (and should) normalize characters during document creation.
</john>

<alex>
I think this is fine.
</alex>

<john>
> 2) Eliminate namespace prefix rewriting.  Namespaces are the most
difficult
> problem.  We will continue to propogate namespace context to descendants
> because
>
> a) namespace information would otherwise be lost when dealing with
fragments
> b) XPath and other tools have been specified to work in this way
</john>

<alex>
We can't do this.  Namespace information is *not* lost by re-writing the
prefix.  The prefix is meaningless for equivalance of namespaces.

XPath operates on namespace equivalance--the URI and local name--and not
the prefix.  Thus, when you rewrite the prefix, the same xpath expression
will work.

...unless you are speaking to an Xpath specified in the document to which
C14N is being aplied to...  That is an interesting problem not solved by
C14N.
</alex>

<johnresponse>
Yes, this is one example of the problem I'm talking about.  C14N doesn't
solve it.  Furthermore, since C14N would also do irreparable damage to such
documents if we include namespace rewriting, it should not be included.
Namespaces need to be reformulated before we can canonicalize them without
damaging the document.

The details of what I mean by 'damage' the document are provided below in
response to another comment.
</johnresponse>

<alex>
Careful orchestration of the XPath can solve this problem.

For example:

<a:something xmlns:a="urn:x-lexica:something:en" xpath="a:foo">
<a:foo/>
</a:something>

would have problems after C14N while

<something xmlns="urn:x-lexica:something:en" xpath="foo">
<foo/>
</something>

would not because the default namespace is declared.

</alex>

<johnresponse>
What you're saying is that we shouldn't write documents that would break
c14n's ability to establish logical equivalence.  Here's another idea:
don't use different namespace prefixes in documents that are supposed to be
logically equivalent.
</johnresponse>

<john>
> However, rewriting of the namespace prefixes should not be performed.
> Namespace normalization was designed to be part of a scheme whereby the
> logical equivalence of two XML documents could be established by a
> byte-level comparison of the resulting c14n documents.  Unfortunately,
> namespace prefix rewriting does not achieve this end.  If a reference to a
> namespace appears in an attribute value or as element content (e.g. an
XPath
> in an XSL template), then it will not get normalized.  Therefore, two
> documents with one using ns1 as a prefix and the other using ns2 as a
prefix
> (both with the same URI assignment) are in fact logically equivalent but a
> byte comparison would not suffice if ns1 and ns2 appeared in attribute
> values or element character content.
>
> Most importantly, the definition of canonicalization is contradicted if
the
> c14n algorithm changes the meaning of the document.  With namespace
> rewriting, c14n(D) == c14n(c14n(D)), yet D is not logically equivalent to
> c14n(D).
>
> This is essentially a proof that it is not possible to perform byte level
> equivalence testing of namespace enhanced XML.  Therefore, we should stick
> to XML 1.0 and not try to normalize namespaces.
</john>

<alex>
I'm just not following this logic.
</alex>

<johnresponse>
You just did one of these in the first part of your example above.  Perhaps
these theorems will clarify the matter.

Theorem 1: With namespace rewriting, there exist two XML documents D1 and D2
that are logically equivalent yet c14n(D1)!=c14n(D2).

Proof: Let D1 be a document containing an XPath in an attribute value or
element content that refers to namespace prefixes used in D1.  Further
assume that the namespace prefixes in D1 will all be rewritten by c14n.  Let
D2 = D1, then modify the namespace prefixes in D2 and modify the XPath
expression's references to namespace prefixes such that D2 and D1 remain
logically equivalent.  Since c14n namespace rewriting does not include
occurences of namespace references in attribute values and element content,
c14n(D1) != c14n(D2). []

Remark: The same condition exists if we remove namespace rewriting. The
purpose of this theorem is simply to show that namespace rewriting does not
accomplish the goal for which it is intended.

Theorem 2: With namespace rewriting, there exist two XML documents D1 and D2
that have equivalent canonicalizations and yet are not logically equivalent.

Proof: Let D1 be a document containing an XPath in an attribute value or
element content that refers to namespace prefixes used in D1.  Further
assume that the namespace prefixes in D1 will all be rewritten by c14n.  Now
let D2=c14n(D1).  Clearly, c14n(D1)==c14n(D2) yet D1 and D2 are not
logically equivalent because the aforementioned XPath works in D1 and
doesn't work in D2. []

Remark: This is what I mean when I say that c14n namespace rewriting can
'damage' the document and is proof that namespace rewriting is harmful
rather than just ineffective.
</johnresponse>

<alex>
The namespace prefix rewriting in C14N
was put there to allow logical equivalance.  If you do not rewrite the
prefixes, you will never have logical equivalance.
</alex>

<johnresponse>
Yes, we will be able to establish the logical equivalence of two documents
whose namespace prefixes have not been altered.  For the purposes of
signatures this is certainly sufficient, and as I've established, you really
can't do better in the general sense.
</johnresponse>


For example:

<a:something xmlns:a="urn:x-lexica:something:en"/>

is equivalence to:

<b:something xmlns:b="urn:x-lexica:something:en"/>

and

<something xmlns="urn:x-lexica:something:en"/>

Is it necessary that XPath values that we depend on are inside what is
being signed?

<johnresponse>
Yes, the XPath transform absolutely, positively must be signed.

However, that really isn't the issue.  The issue is that whether its DSig or
any other application that might use XPaths (e.g. to express computational
relationships between elements in an electronic form), c14n cannot guarantee
equivalence with namespace rewriting.  Since I've further established that
namespace rewriting will damage these documents, it must be omitted.
</johnresponse>

<alex>
Another way to solve this is to force prefix normalization before
signatures if you want certain XPaths to work.  It isn't a great way to
solve this problem, but it certainly works.
</alex>

<johnresponse>
As I said before, it's no better a solution, has been shown to be worse.
Further, it will be worse for applications beyond DSig, such as the
expression of computational relationships.
</johnresponse>

<john>
> ===============================
>
> A third difference between XPath serialization and c14n is that the latter
> serializes an arbitrary fragment of XML.  I think we can address
fragmentary
> XML only by defining c14n input to be an XPath node-set.  Anything less,
and
> we would not have sufficient information to create the output.  However, I
> do not think there will be time to address this in the May 25 document.
</john>

<alex>
I don't really see this as an issue.  C14N specifies an algorithm for the
whole document.  There is an obvious way to scope it to a certain element.
</alex>

<johnresponse>
There are those who want to deal with fragments that have may have finer
granularity, and those that have holes. A great example would be a merge
between XPath serialization and XML canonicalization.  However, the only
real way I see of doing this is to provide the partial document to c14n as
an XPath node-set, not a byte stream.

John Boyer
Software Development Manager
PureEdge Solutions Inc. (formerly UWI.Com)
Creating Binding E-Commerce
jboyer@PureEdge.com

</johnresponse>

R. Alexander Milowski           FAX: (707) 598-7649
alex@milowski.com

"The excellence of grammar as a guide is proportional to the paucity of the
 inflexions, i.e. to the degree of analysis effected by the language
 considered."

 Bertrand Russell in a footnote of Principles of Mathematics
Received on Tuesday, 23 May 2000 13:33:57 UTC