Canonicalization Re: Minutes from Today's Call Please Review/Correct from Donald E. Eastlake 3rd on 1999-08-25 (w3c-ietf-xmldsig@w3.org from July to September 1999)

From: Donald E. Eastlake 3rd <dee3@torque.pothole.com>
Date: Wed, 25 Aug 1999 11:56:14 -0400
To: "IETF/W3C XML-DSig WG" <w3c-ietf-xmldsig@w3.org>
Message-Id: <199908251556.LAA01878@torque.pothole.com>
From:  "Phillip M Hallam-Baker" <pbaker@verisign.com>
To:  "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>
Cc:  "IETF/W3C XML-DSig WG" <w3c-ietf-xmldsig@w3.org>
Date:  Tue, 24 Aug 1999 11:48:35 -0400
Message-ID:  <002201beee48$1fda2a80$6e07a8c0@pbaker-pc.verisign.com>
In-Reply-To:  <199908231452.KAA15193@torque.pothole.com>
X-Mailing-List:  <w3c-ietf-xmldsig@w3.org> archive/latest/306

>> >The only argument advanced in favour of C14N has been that some folks 
>> >want transmission over channels which introduce noise. On the minus
>> 
>> That is only true for a very, VERY broadly stretched definition of
>> "noisy" channel.  Is a 7 bit channel which requires a reversible
>> re-encoding into UTF-7 noisy? 
>
>Yes, but that transformation does not require C14N; in fact, as I explain
>below, C14N cannot be used in this circumstance.

My apologies for, evidently, not being clear.  XML is encoded in a
variety of ways such as UTF-16, UTF-7, etc.  I believe that a common
element of naturally occurring transformations on XML in applications
will be re-encoding between these, for example, receiving XML in
UTF-16 and encoding it in UTF-7 for transmission or storage or vice
versa.  Since signatures are computed over absolute bit(/octet)
streams, it is necessary to use the encoding that was signed in order
to verify a signature.  In such cases, either you mandate the
availability of extra information as to the encoding that was used
when the signature was generated or you specify a canonicalization
algorithm that includes encoding into one specific encoding, say
UTF-8.
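
A minimal Python sketch of the encoding point above, not taken from
any XMLDSIG draft.  It assumes the XML carries no encoding
declaration, uses SHA-256 as a stand-in for whatever digest a
signature would be computed over, and the names digest and
canonical_octets are hypothetical.  It covers only the encoding step
of a canonicalization algorithm.

```python
import hashlib


def digest(octets: bytes) -> str:
    """Hash the exact octet stream, as a signature algorithm would."""
    return hashlib.sha256(octets).hexdigest()


def canonical_octets(octets: bytes, encoding: str) -> bytes:
    """Decode from the encoding actually used, then re-encode as UTF-8."""
    return octets.decode(encoding).encode("utf-8")


text = '<order id="42">caf\u00e9</order>'
as_utf16 = text.encode("utf-16")  # e.g. as signed
as_utf7 = text.encode("utf-7")    # e.g. as re-encoded for transmission

# The raw octets differ, so digests over them differ.
raw_match = digest(as_utf16) == digest(as_utf7)

# After canonicalizing the encoding, both sides hash the same octets.
canon_match = (
    digest(canonical_octets(as_utf16, "utf-16"))
    == digest(canonical_octets(as_utf7, "utf-7"))
)
```

Note that the verifier still has to know which encoding it received
in order to decode it; what it no longer needs is the encoding the
signer happened to use.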

>> Is a system which parses messages into
>> DOM trees, re-assembles various parts of them including some
>> signatures into a new message and signs parts of that a "noisy
>> channel"?
>
>No, it is noisy AND lossy.

I suppose that is true from the point of view of the entire original
set of messages considered as binary objects. But as long as the
signatures and the signed pieces for which verification is significant
are faithfully preserved from the application viewpoint, there is no
noise or loss affecting security.

>I don't accept that support for either channel should be mandatory. I 
>don't believe that supporting the second channel is safe.

It is a requirement of XMLDSIG that it support IOTP which is the
second case.

>I certainly don't accept that sympathy for the first scenario should
>be used to justify support for the second.

There is a broad spectrum of artifacts and changes in different
applications and environments.  I picked extreme cases, one very
simple and one very complex.  I could fill in lots of intermediate
ones.  The point was that many XML applications require
canonicalization and the model of the world-to-be-signed consisting of
immutable binary chunks flowing through absolutely clean processing,
storage, and transmission systems does not work for them.

>> As far as I can see, canonicalization is absolutely essential for a
>> vast range of applications of XMLDSIG,
>
>And how does essential for some applications translate into a MANDATORY
>requirement?

The motivation for MANDATORY is interoperability.  If almost all the
XML world follows the immutable binary object view, which negates most
of the flexibility benefits of XML, then it is not worth the effort to
make implementation of a canonicalization algorithm mandatory.  But I
do not believe that is the XML world.  I think that, in terms of
implementations, most will need canonicalization and if a non-null
canonicalization algorithm can be found that safely meets the
requirements of most, perhaps including requirements for manifest
canonicalization, it makes sense to require its implementation for
some levels of compliance with the standard.

>> So?  This has nothing to do with canonicalization.  Why are you
>> confusing the issue with deliberate FUD?
>
>Throwing insults around and leveling personal attacks at others 
>does nothing for your argument, Don. 

My apologies for the insult.  Glad that you agree that your
statement, which you have omitted and I paste immediately below, has
nothing to do with canonicalization.

++>There has never been a requirement that a particular signed object
++>have a unique signature under a particular private key. If so the
++>DSA would fail since signing a document twice is guaranteed to give
++>different results in all but 1 time out of 2^128 attempts.

I'll assume your inclusion of the above, since it is irrelevant, was
accidental.

>The definition of canonicalization is a function f(x) such that
>f(x) = f(f(x)).

No.  That's the fixed point property, which is a desirable property of
canonicalization but by no means its definition.  My definition of
canonicalization is a function f(x) which is useful for application A
if f(x1) = f(x2) implies that application A considers x1 and x2
semantically identical.  For example, if application A uses DOM, then
it will be indifferent to attribute ordering and many other artifacts.
For such an application, the canonicalization part of the original
DOMHASH is a good canonicalization algorithm even though it does NOT
have the fixed point property.  But fixed point is a good property to
have and the canonicalization part of modified DOMHASH, which does
have the fixed point property, is a better canonicalization algorithm
for this case.
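
To make the two properties concrete, here is a toy sketch in Python.
It is NOT DOMHASH or any proposed XMLDSIG algorithm; canon and emit
are hypothetical names, and it assumes small documents without
namespaces, comments, or processing instructions.  It models an
application that is indifferent to attribute order, quoting style,
and the empty-element syntax.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def emit(el: ET.Element) -> str:
    """Serialize one element with sorted attributes and explicit end tags."""
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in sorted(el.attrib.items())
    )
    body = escape(el.text or "")
    for child in el:
        body += emit(child) + escape(child.tail or "")
    return f"<{el.tag}{attrs}>{body}</{el.tag}>"


def canon(xml_text: str) -> str:
    """Toy canonicalization f(x): parse, then re-serialize deterministically."""
    return emit(ET.fromstring(xml_text))


x1 = '<a b="1" c="2"><e /></a>'
x2 = "<a c='2'   b='1'><e></e></a>"

same_for_app = canon(x1) == canon(x2)          # f(x1) = f(x2)
fixed_point = canon(canon(x1)) == canon(x1)    # f(x) = f(f(x))
```

Here x1 and x2 differ as octet strings but canonicalize to the same
text, and applying canon a second time changes nothing.  The first
property is the one the definition above rests on; the second is the
fixed point property, which this toy happens to have as well.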

Canonicalization is pretty meaningless in the absence of an
application or family of applications.  For example, for DNS names,
canonicalization is defined in RFC 2535, Section 8, to include
smashing all upper case letters to lower case.  Given that DNS is
defined to treat upper and lower case letters the same and that these
names are frequently typed in by humans or retrieved from legacy data
bases with odd ideas of capitalization, smashing letters to a
specified case seems like the right thing to do.  But it obviously
isn't if you wanted to canonicalize, say, C code or anything which
is case sensitive.
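
As a small illustration of how application-specific this is, the
case-smashing part of DNS name canonicalization might be sketched in
Python as below.  The function name is hypothetical, and only the
ASCII case folding is shown, none of the other rules in RFC 2535.

```python
def canonical_dns_name(name: str) -> str:
    """Map ASCII upper case letters to lower case; leave all else alone."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in name
    )


# Fine for DNS, where these are defined to be the same name.
dns_equal = canonical_dns_name("WWW.Example.COM") == canonical_dns_name("www.example.com")

# Wrong for C, where Count and count are distinct identifiers that
# this function would nevertheless map to the same string.
c_collision = canonical_dns_name("Count") == canonical_dns_name("count")
```

The same f(x) is a good canonicalization for one application and a
destructive one for the other.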

If you were designing DNS from scratch, it is possible you could get
rid of the requirement for canonicalization.  Similarly, if we were
redesigning XML from scratch, we could toss the ability to use
different encodings, toss the <tag /> alternate form for null content,
etc., etc., and perhaps produce "straight-jacketed XML" that was
completely inflexible in terms of expression and did not require
canonicalization.  But we are more or less given XML and, in general,
will need canonicalization to usefully sign XML objects for most
applications.

>The requirements of DOM hash are precisely to specify a canonical 
>form. What I and others are arguing is that these requirements have
>NOTHING to do with transport over channels which corrupt data.

DOM may specify a canonical form but that form is an internal
structure with a certain API, not XML.  We are talking about varieties
of canonical XML here, not arbitrary canonical forms.

Your use of the word "corrupt" is wrong and your restriction to
"channels" is misleading.  Corrupt implies damage.  There is no damage
for an application when insignificant artifacts are normalized.  This
is what a canonicalization algorithm appropriate for the application
does.  For the vast majority of XML applications, but not all,
particular artifacts of XML expression that are specified to be
insignificant in the definition of XML will be insignificant.
Thinking only of "channels" as protocol transport from A to B, and
thinking of changes like noise hits on a phone line sending serial
data without a checksum, is a very impoverished point of view.  I
think the normal case in the XML world will be a desire for signatures
that survive encoding changes, reformatting of line breaks and white
space, parsing and re-constitution, etc., and several of these in
sequence.

>If you have a channel c(x) which only transmits 7 bit characters
>then the TYPE of x is 7bit, the type of c(x) is 7bit-> 7bit.
>
>In order to use such a channel with an 8bit signature algorithm you
>need a mapping function m(x) whose type will be 7bit -> 8bit.
>
>m(x) cannot POSSIBLY be a canonicalization algorithm since m(x)
>cannot be applied to itself without a type violation.

What I was assuming is an XML receiving process RP whose input
channel, for example, really does "corrupt" the stream if the top bit
is not zero and possibly also if you send certain "control
characters".  It is an 8 bit channel but weird things happen if you
try to send binary over it.  A sending process SP transmitting to it
must use XML encoded in UTF-7, UTF-5, or the like if it wants to get
its message through.  But the argument works just as well if you
simply assume, for whatever reason, that RP expects and can receive
only one encoding such as UTF-8 without assuming any quirks of input
stream handling.  If the XML is accompanied by one or more signatures
that were calculated using UTF-16, say, which may have been calculated
by a third (or nine hundred and ninety ninth) process 3P/999P which is
the only one that has the private key, it can not be re-signed based
on the UTF-7 encoding which must be used to send to RP.  These
signatures will then be useless in the absence of encoding
canonicalization.  And you can't necessarily rejigger the system to do
everything in UTF-7, because you don't have control of all the pieces.
In IOTP, for example, there can be multiple RPs run by independent
entities with whom you contract for service.

I'm certainly not saying that all applications signing XML objects
must use a canonicalization that includes encoding canonicalization,
it's probably only 99% of them that will need it.

>I will accept that it is sensible to choose a syntax for signed XML
>that is robust in the presence of 7bit channels. That is a syntax
>issue however and not one of transformation. 

The availability of multiple encodings is part of the definition of
XML, just as the insignificance of attribute orderings, etc., are.
Particular applications can certainly constrain such things and thus
throw away some of the benefits of XML, presumably in return for
efficiency and simplicity.  But general signatures over XML objects
being used in the general XML world can not, in my opinion, assume such
crippling of XML.

>> The requirement, of course, is that you need to be able to verify the
>> signature on an object.  Otherwise it's useless.  
>
>It is a stronger requirement to be able to detect modification of the
>data structure that is signed.

I wouldn't say it was a general requirement to be able to detect
modification of the signed data structure.  It is a general
requirement to be able to detect modification of the application
object being sent.  True, this means any modification whatsoever in
the case that object is a binary chunk, but in that case I'm not sure
it deserves the appellation "data structure" and in this case it
doesn't matter if it is XML or not, since from the point of view of
canonicalization (null in this case) and signing, it doesn't have much
structure.

>Nobody is arguing against support for c14n FOR THOSE APPLICATIONS WHICH
>NEED IT.
>
>		Phill

You appear to live in a world of immutable binary objects for which it
is purely accidental and of no consequence that some are in XML.  I
just don't think most of the XML world is like that.

Donald
Received on Wednesday, 25 August 1999 11:56:30 UTC