Re: XML and canonicalization from Donald E. Eastlake 3rd on 1999-10-27 (w3c-ietf-xmldsig@w3.org from October to December 1999)

From: Donald E. Eastlake 3rd <dee3@torque.pothole.com>
Date: Wed, 27 Oct 1999 11:36:25 -0400
To: <w3c-ietf-xmldsig@w3.org>
Message-Id: <199910271536.LAA06893@torque.pothole.com>
Hi Phil,

From:  "Phillip M Hallam-Baker" <pbaker@verisign.com>
Resent-Date:  Mon, 25 Oct 1999 11:13:55 -0400 (EDT)
Resent-Message-Id:  <199910251513.LAA04220@www19.w3.org>
To:  "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>,
            <w3c-ietf-xmldsig@w3.org>
Date:  Mon, 25 Oct 1999 11:14:55 -0400
Message-ID:  <000d01bf1efb$b1f96780$6e07a8c0@pbaker-pc.verisign.com>
In-Reply-To:  <199910250307.XAA02021@torque.pothole.com>

>We have been over this many times.

Indeed, canonicalization appears to be a contentious area.  I'm sure
there will substantial future discussion on this topic.  :-)

>The ability to reconstruct signatures after a document has been parsed and
>regenerated was a dreadfull idea for ASN.1 leading to the mistake of DER.

The complexity stems from having multiple surface representations of
the same internal data.  Thus, with BER (Basic Encoding Rules) or some
of the other encoding rule options for ASN.1, you can serialize the
internal data in multiple ways and when the signer and verifier do so,
as Murphy's Law predicts, the signture breaks.
	I don't think ASN.1 was desgined with digital signatures in
mind at all but this separation between the internal data and the
surface representation was a deliberate feature.
	If it were practical to always carry along a particular
surface representation, there would never have been any pressure to
design DER (Distinguished Encoding Rules) which provides a unique
external serialization of the true internal ASN.1 data.
	Protocol uses of XML, such as IOTP, are always assembling new
messages out of pieces of old message, some signed and some not.  It
is essential that such things be possible without breaking XML
signatures.
	The same is true of protocols of more than trivial compexity
that use ASN.1.  For example, I have architected a SET implementation
and coded it for some of the SET messages. (So my distaste for ASN.1
is grounded in hands-on experience.) In SET you are always tearing
apart and decrypting and verifying and putting together and signing
and encrypting stuff.  It's kind of tedious but it all works fine as
long as there is a canonical representation defined.

>But for DER, ASN.1 would probably have never gained the reputation it has.

I don't think DER had much effect on the reputation of ASN.1 but to
the extent that it had an effect, it was to improve its utility and
reputation.  The canonical representation of ASN.1 provided by DER
made interoperable digital signatures over ASN.1 data practical for
non-trivial systems just as XML canonicalization is required to make
interoperable digital signatures over XML practical.

>This canonicalization algorithm you keep touting is exactly the type of
>baroque SGML boondogle that made invention of XML necessary in the first
>place.

I don't think so.  There were quite a few motivations for the design
of XML but I don't believe they were particularly worried about
digital signatures during that design process.  As a result, they
chose to provide flexible output formating.  This improves
readability, which is another goal of XML.  As soon as the decision
was made to permit multiple surface representations of the same true
interal data, the requirement for XML canonicalization, to make XML
digital signature practical, was born.
	However, with XML, they went even further.  Not only are
multiple surface representations permitted, but the XML standards
require that some surface details, details which are insignificant to
determining the true interal data represented, must not be passed to
any XML conformant application.

>The chances of the canonicalization algorithm being generally implemented 
>well enough to be relied upon and actually used for the purpose of parsing
>datastructures into databases and reconstructing them are zero. People had
>difficulty with PKCS#12 which is many orders of magnitude simpler. The fact
>that there is an argument going on on this list as to what the c14n form
>would be is a case in point.

I find your use of the word "databases" odd.
	Implementation of DER so as to be interoperable does not seem
to have been a big problem.
	I think there will be a number of interoperable open source
implementations of XML canonicalization so I don't see a problem there
either.  The current W3C canonicalization form would work, as far as I
can tell.
	There are some discussions in this WG of details or options
for canonicalization.  But most of the arguments seem to be pretty
much on whether you should canonicalize at all.  I think a lot of this
is the conflict between the document view, which doesn't want to touch
any bits, and the protocol view which believes that substantial
intermediate processing of signed data is the norm.

>The processing overhead required to reassemble the attributes in order is
>significant, particularly when the object in question is a complex data 
>structure generated from an XML schema, possibly incorporating several Mb
>of internal attributes.

I do not agree.
	The number of attributes on an element is usually quite small.
But even if there were, say, 100 attributes on an element, which is
more than I have ever seen, the effort involved in treating them in a
canonical order is trivial.  I understand that Jim Clark's XML parser
always gives you the attributes in order.  The standard XML
interfaces, DOM and SAX, parse start tags and the attributes therein
and DOM gives them to you a leaves of the tree it provides while SAX
gives you an arrary of attributes as part of the start tag event. I
don't quite know what you are refering with with "several Mb of
internal attributes" but I don't see any relevance of the size of the
attribute values to the trivial effort of ordering them before you
serialize them to feed to your digest function.
	It is assumed in XML that the start tag and its attributes are
more or less one atom that you can hold and process.  Large, complex
data is normally content, not attribute value.  2 or 3 attributes is
very much more likely than 20 or 30.  Attribute values of a few tens
of characters or less are very VERY much more likely than a megabyte
attribute value, but, as I say, the length of the attribute value has
almost nothing to do with the trivial effort of attribute ordering.

>Systems are already being built based on the PKCS#7 detached signature
>format. If you insist on re-opening this issue yet again those systems
>will cease to be prototypes and be embedded production systems by the
>time the specification is complete.

Why do you believe this issue was ever closed?
	This is not a working group to rubber stamp the binary PKCS#7
format and declare it to be the XML signature format.  Nor is it a
group to design a new binary signature format that happens to be a
character encoded binary format which looks like XML.
	The charter of this working group is to specify an actual XML
syntax signature format.  One consonant with the established XML
standards.  And it wouldn't hurt if it fit with the existing standard
DOM interface and the de facto standard SAX interface either.  These
standards and interfaces make, for example, original attribute
ordering and insignificant white space inside start tags not just
insignificant but completely inaccessible to any XML application.
		
>Needless to say the chances of someone taking a working production system
>and then introducing an unnecessary transformation to sort all the XML
>attributes into alphabetical order before verifying a signature is small.

I suppose there may be proprietary closed production systems that
don't use any standard DOM or SAX derived interface and don't care
about the XML standards.  In my opinion, these are binary or text
signature systems, not XML signature systems.  If that works for them
and they are happy, that's fine and perhaps they won't change.  This
doesn't seem to have much to with specifying an XML sytax signature
standard, however.

>If you really MUST support the functionality proposed, canonicalization 
>(which has a well defined meaning in computer science) is NOT the way
>to achieve it.
>
>Instead of canonicalization, specify a syntax constraint. This might 
>state for example that the signed octets presented were presented with
>null space eliminated, tags fully epanded and attributes sorted into
>alphabetical order.

Yes, you could canonicalize at "print" time rather than at signature
and at verification time, throwing away the readability and
flexibility of XML.  Of course, nothing that passes the result or
verifies the signature could use SAX or DOM unless it was willing to
re-canonicalize it (in which case it is unclear what you gained by
canonicalizing at print time).  They would have to use proprietary
home-brew processing.

>The only functional difference between this approach and the one you 
>propose is that the signature on the syntax constrained text would fail
>if it were to be passed through this 'noisy channel' that you keep
>claiming exists (although I have never known a noisy channel 
>gratuitously re-order bytes).

"Noisy channel" is not a particular good description.  I claim that we
should assume that much of the time XML will be processing in
accordance with the XML 1.0 standard and other standards and that much
of the time the standard DOM and de facto standard SAX interfaces will
be used.  I think there is substantial evidence beyond my personal
claim for the existence and use of SAX, the existence and use of DOM,
and the existence and use of the adopted XML standards.

>On the processing side such a syntax constraint would require minimal
>code, a few lines to check that each attribute was presented in order,
>no unexpected whitespace appearing etc.
>
>
>A syntax constraint is acceptable, a c14n requirement is not.

The charter of this WG is to specify a digital signature in XML
syntax.  The standards for XML are such that, for anything even
vaguely like our current XML digital signature syntax,
XML canonicalization is required.

>If we continue to keep re-opening this issue I don't think consensus 
>will be reached. Worse still, I predict that as with ASN.1 in PKIX 
>there will be a steady stream of attempts to re-open the syntax issue
>and 'simplify', generally leading to entirely different syntaxes.

I haven't the faintest idea why, in such a contentious area, you think
this issue was ever closed in this WG.  I would think that if we can
not get to consensus on a fixed canonicalization algorithm for
SignedInfo and can not even get to consensus on a default
canonicalization algorithm for SignedInfo, it will just end up a
mandatory to specify algorithm.  And everyone actually doing XML
processing as defined by the XML standards and implemented with DOM or
SAX will specify an XML canonicalization, such as the current W3C
canonicalization draft.
	I don't know what's going on in PKIX but in any area of active
standards development there are normally a constant series of
suggestions presented.  I can't tell if you are claiming that adopting
an XML canonicalization algorithm would lead to suggestions for change
in the syntax of XML or suggestions for change in the syntax of XML
digital signatures, but I don't see how either follows.

>		Phill

I'm sorry that you don't like XML and wish it were a different animal
than it is.  But inherent in the adopted XML standards and in the
standard DOM and SAX interfaces that are used to implement those
standards are both the screening from any XML application of some
insignificant aspects of XML input and the declaration that other
aspects are insignificant.
	There is no problem with providing implementers with
pre-defined options for using the XML digital signature format for
more fragile non-XML signatures and the option of defining their own
canonicalizations/transforms.  But, in my opinion, the absence of a
required to implement XML canonicalziation would mean the WG has
failed in its charter duty to specify interoperable XML syntax
signatures.

Donald
===================================================================
 Donald E. Eastlake 3rd   +1 914-276-2668   dee3@torque.pothole.com
 65 Shindegan Hill Rd, RR#1  +1 914-784-7913(w)     dee3@us.ibm.com
 Carmel, NY 10512 USA
Received on Wednesday, 27 October 1999 11:36:31 UTC