- From: Donald E. Eastlake 3rd <dee3@torque.pothole.com>
- Date: Sun, 24 Oct 1999 23:07:47 -0400
- To: w3c-ietf-xmldsig@w3.org
I've been mulling this over and studying the standards, starting with the basic XML 1.0 standard. Probably lots of members of this WG are very familiar with XML processing, but perhaps what I say below will be helpful for others...

No canonicalization makes sense for binary things. Binary things, like images or executables, can reasonably be expected to be truly fixed.

The Minimal canonicalization we are defining (canonicalize character set and line endings) makes sense for text. Text comes in a variety of character sets and not uncommonly gets its line endings changed from platform to platform. Of course, if something is handled as binary, even though it's text, you can avoid canonicalization. But if it is going to be interoperably processed as text, you would want at least something like Minimal canonicalization.

The basic process of reading XML and presenting it to an application (whether from an external file or a buffer in memory) is inherent in any XML processing and is destructive of information that XML considers insignificant. Very explicitly, attribute value white space is normalized (unless the attribute is declared as CDATA). In particular, all leading and trailing white space is stripped from attribute values and all internal runs of white space are converted to a single space. While I haven't found it stated explicitly anywhere, XML experts seem to take it as axiomatic that attribute ordering is insignificant and that white space between items inside start/end tags is insignificant. The XML Infoset says that a CR-LF is converted to an LF, as is a CR not followed by an LF.

There are additional areas where significance gets more murky, like white space between elements, which in XSLT, for example, is stripped unless you have specifically declared it to be preserved.
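The normalizations described above can be sketched in a few lines of Python. This is only an illustration, not the WG's definition of Minimal canonicalization: the function names are made up for this example, UTF-8 as the target character set is an assumption, and the attribute handling ignores entity and character references.

```python
import re


def normalize_line_endings(text: str) -> str:
    """Turn CR-LF pairs and lone CRs into a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def minimal_canonicalize(data: bytes, source_encoding: str) -> bytes:
    """Decode, normalize line endings, and re-encode.

    UTF-8 as the target character set is an assumption of this sketch.
    """
    text = data.decode(source_encoding)
    return normalize_line_endings(text).encode("utf-8")


def normalize_attribute_value(value: str, is_cdata: bool = False) -> str:
    """Simplified attribute-value normalization (no entity or char references)."""
    value = normalize_line_endings(value)
    # Every remaining tab or line feed becomes a single space character.
    value = re.sub(r"[\t\n]", " ", value)
    if is_cdata:
        return value
    # Non-CDATA types: collapse runs of spaces and trim both ends.
    return re.sub(r" +", " ", value).strip(" ")


print(repr(normalize_attribute_value(" a, b, c ")))  # prints 'a, b, c'
```

With is_cdata=True the value keeps its leading, trailing, and repeated spaces, which is why the declared attribute type matters to anyone comparing bytes.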
Namespaces are also a somewhat murky area, but XPath and other specs treat a namespace declaration as distributing its information across all child nodes unless they are shielded by another namespace declaration with the same prefix.

What this means is that if you have a hunk of XML like

    <Element z=" a, b, c " a="a" xmlns:Prefix="data:1234" > <A>1</A><B> <C Prefix:m="n" > </C> </B> </Element >

and you did any XML processing with it, it would be nonconformant (in the presence of a DTD declaration of z as a type other than CDATA) not to convert the value of the z attribute to "a, b, c". And it would be entirely reasonable to get an internal representation which, if you output it, was something like

    <Element xmlns:Prefix="data:1234" a="a" z="a, b, c"> <A xmlns:Prefix="data:1234">1</A><B xmlns:Prefix="data:1234"> <C xmlns:Prefix="data:1234" Prefix:m="n"> </C> </B> </Element>

or a variety of amounts of normalization between this and the input. All would be conformant to the XML rules. The above isn't the canonical printout according to the current W3C canonical XML proposal, but it will give you an idea of the normalization that can occur to the internal data structure just from reading XML for normal conformant XML processing.

Sure, if you have XML but are treating it as binary data, you may not need any canonicalization. And if you have XML and treat it just as text, you may need only Minimal canonicalization. But if you are going to process it as XML and want signatures over it that are interoperable, I don't see how you can escape the need for XML canonicalization.

SignedInfo is XML, is signed, and I would think we would want those signatures to be interoperable. Thus I conclude that at least the default, and quite possibly the fixed, canonicalization for SignedInfo must be an XML canonicalization.

Because we control the syntax of SignedInfo, we can make additional choices.
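The interoperability problem can be seen in a small Python experiment. It is a sketch under stated assumptions: it uses the standard library's xml.etree.ElementTree.canonicalize (Python 3.8 or later), which implements the later C14N 2.0 specification rather than the canonical XML proposal current at the time of this message, and SHA-256 is an arbitrary stand-in for whatever digest a signature would use. The two inputs differ only in attribute order and in white space inside tags.

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # C14N 2.0, Python 3.8+

# The hunk of XML as originally written.
as_written = (
    '<Element z=" a, b, c " a="a" xmlns:Prefix="data:1234" > '
    '<A>1</A><B> <C Prefix:m="n" > </C> </B> </Element >'
)
# The same information after a hypothetical read-and-write cycle that
# reordered attributes and dropped white space inside the tags.
# No DTD is present, so z is treated as CDATA and keeps its spaces.
reserialized = (
    '<Element xmlns:Prefix="data:1234" a="a" z=" a, b, c "> '
    '<A>1</A><B> <C Prefix:m="n"> </C> </B> </Element>'
)


def digest(text: str) -> str:
    """Hex digest over the UTF-8 bytes of the given text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Digests over the raw text disagree; digests over the canonical form agree.
raw_match = digest(as_written) == digest(reserialized)
canonical_match = (
    digest(canonicalize(as_written)) == digest(canonicalize(reserialized))
)
print(raw_match, canonical_match)  # prints: False True
```

A signature computed over the raw text would fail to verify after the harmless re-serialization, while one computed over the canonical form would survive it.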
For example, although I'm not proposing any decision at this time, we do not make any use of XML comments in SignedInfo. So if we decided that it was reasonable never to do so, and if we also decided there was utility in allowing unsecured comments to be sprinkled into and removed from SignedInfo, we could specify a default XML canonicalization (or transform, if you are worried about my use of the c14n word stepping on the toes of the W3C official c14n effort) that stripped out all XML comments.

Thanks,
Donald
=====================================================================
 Donald E. Eastlake 3rd   +1 914-276-2668   dee3@torque.pothole.com
 65 Shindegan Hill Road, RR#1  +1 914-784-7913(work) dee3@us.ibm.com
 Carmel, NY 10512 USA
Received on Sunday, 24 October 1999 23:07:50 UTC