Canonicalization from a Digital Signature Point of View

Based on the discussion about canonicalization, we may need a 
more precise definition of canonicalization from a digital 
signature point of view.  The following is a preliminary 
proposal:

Let Xi be an XML document.  

Canonicalization is a function C, such that C(Xi) = M produces a 
variable length octet string M suitable for input to a 
cryptographic hash function.  
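
As a rough illustration of this definition, the sketch below 
treats C as a function from an XML string to an octet string M 
and feeds M to a hash.  It is only a sketch under assumptions 
that are not part of this proposal: it uses the C14N 2.0 
implementation in Python's standard library (Python 3.8 or 
later) as a stand-in for C, UTF-8 as the octet encoding, and 
SHA-256 as the hash.  The names C and digest are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def C(xml_text: str) -> bytes:
    """Map an XML document to an octet string M.

    Assumption: C14N 2.0 (ET.canonicalize) stands in for the
    canonicalization function; the proposal does not name one.
    """
    return ET.canonicalize(xml_data=xml_text).encode("utf-8")


def digest(xml_text: str) -> str:
    """Hash the canonical octets (SHA-256 is an assumed choice)."""
    return hashlib.sha256(C(xml_text)).hexdigest()


doc = '<order id="7"><amount>10</amount></order>'
M = C(doc)          # variable-length octet string
print(M)
print(digest(doc))  # hex digest computed over M, not over doc
```

Keeping C and the hash as two separate functions mirrors the 
idea that the hash only ever sees M, never the surface string.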

Given a set of XML documents, X1, ..., Xn, canonicalization must 
have an "equivalence" property:

1.  If C(Xi) = C(Xj) for any pair of documents Xi and Xj in the 
set, then Xi and Xj must have the same legal meaning, business 
information, and aesthetic value (assuming we wish to have 
lawyers make contracts, business people communicate data, and 
authors sign works).

Canonicalization must also have an "exclusion" property:

2.  It must be computationally infeasible to find a document Xx, 
not in the set X1, ..., Xn, such that C(Xx) = C(Xi) and such that 
Xx has a different legal meaning, business information, and 
aesthetic value than Xi.

To be useful for processing XML documents that are communicated 
among digital signature signers and verifiers, canonicalization 
must also have a "completeness" property:

3.  If the set of XML documents X1, ..., Xn is produced by the 
application in any arbitrary sequence of conforming, but 
differing, XML parsers and generators, then C(Xi) must equal 
C(Xj) for every pair Xi and Xj.  

In the last case, generators should be understood to convert DOM 
representations to concretely encoded surface strings, and 
parsers should be understood to convert concretely encoded 
surface strings to DOMs.  
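
The "completeness" property can be exercised with a small 
round-trip experiment.  The sketch below passes one document 
through two different parser/generator pairs from the Python 
standard library (ElementTree and minidom) and checks that the 
resulting surface strings differ while their canonical forms do 
not.  The choice of libraries, the sample document, and the use 
of C14N 2.0 as a stand-in for C are all assumptions made for 
illustration; two toolkits are nowhere near "any arbitrary 
sequence" of conforming tools.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def C(xml_text: str) -> bytes:
    """Assumed stand-in for C: C14N 2.0, encoded as UTF-8."""
    return ET.canonicalize(xml_data=xml_text).encode("utf-8")


original = (
    '<?xml version="1.0"?>'
    "<order id=\"7\" currency='USD'><amount>10</amount><note/></order>"
)

# Hop 1: ElementTree parser, then ElementTree generator.
via_etree = ET.tostring(ET.fromstring(original), encoding="unicode")

# Hop 2: minidom parser, then minidom generator, fed by hop 1.
via_minidom = minidom.parseString(via_etree).toxml()

surface_forms = [original, via_etree, via_minidom]
canonical_forms = {C(s) for s in surface_forms}

# The surface strings differ (quoting, declaration, empty-tag style),
# but a complete canonicalizer should collapse them to one value.
all_equal = len(canonical_forms) == 1
print(all_equal)
```

A check like this can only reveal a violation for the tools 
tried; it does not establish the property in general.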

Since canonicalizers are required to have a many-to-one mapping 
property, which is forbidden to cryptographic hashing algorithms, 
I think it is essential to keep the two specifications quite 
separate, with an easily understandable representation of the 
document at their interface.  If the canonicalized form is XML, 
then it is easier to show that the operation of the canonicalizer 
has the "equivalence" property by using a variety of commonly 
available XML tools to show that the output is equivalent to the 
input.  
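
One way to picture that kind of check: because the canonical 
form is itself XML, an ordinary parser can read both the input 
and the canonicalizer's output, and the two trees can be 
compared.  The sketch below does this with ElementTree.  The 
helper same_tree is hypothetical, the sample document is made 
up, and C14N 2.0 is again only an assumed stand-in for the 
canonicalizer; comparing parsed trees is also a much weaker 
notion of "equivalent" than the one in property 1.

```python
import xml.etree.ElementTree as ET


def same_tree(a: ET.Element, b: ET.Element) -> bool:
    """Compare two parsed elements: tag, attributes, text, children."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "") == (b.text or "")
        and (a.tail or "") == (b.tail or "")
        and len(a) == len(b)
        and all(same_tree(x, y) for x, y in zip(a, b))
    )


source = "<order id='7' currency='USD'><amount>10</amount><note/></order>"

# Canonical output is still XML, so the same parser can read it back.
canonical = ET.canonicalize(xml_data=source)

equivalent = same_tree(ET.fromstring(source), ET.fromstring(canonical))
print(equivalent)
```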

The requirements stated above do not appear to conflict with 
those in http://www.w3.org/TR/NOTE-xml-canonical-req.  However, 
neither the "exclusion" nor the "completeness" property is 
clearly stated there.  Perhaps they are implied? 

Milton M. Anderson
Technical Projects Director
Financial Services Technology Consortium
276 Dartmouth Avenue
Fair Haven, NJ 07704-3121
+1 732 747 1514
miltonma@gte.net

Received on Sunday, 18 April 1999 10:05:48 UTC