- From: Milton M. Anderson <miltonma@gte.net>
- Date: Thu, 5 Aug 1999 17:56:00 -0400
- To: <w3c-ietf-xmldsig@w3.org>
In WD-xml-c14n-1990729.html "Canonical XML" the treatment of line ends (#xA) which occur between tags does not appear to be specified in the best way for hashing and signing operations. In Section 5.5 we have an example: <quote> For the following element <x> <a n-"1"/><b n="2"/> <c n="3"/></x> the canonical form is <x> <a n="1"></a><b n="2"/></b> <c n="3"></c></x> </quote> The example seems to imply too characteristics of the canonical form: 1. line ends can occur between tags in the canonical form, and 2. they are preserved when and if they were present in the original document and the do not occur according to a general rule. (The exception to the second assertion is the line end following </x>, which is required by the production rules at the beginning of Section 5.) These characteristics are undesirable for applications which rely on this C14N for hashing and signing since an application which: a. processes in a document containing original canonicalized and signed elements, b. operates on some elements but leaves others as originally signed, c. and produces a new signed document that includes both original and new signed elements will have to remember where the line ends were in the canonicalized form of the originally signed elements so that it can reproduce them in the output. Otherwise the hashes will be broken. Consider the following example: <x> <a>content1</a><b>content2</b> <c>content3</c></x> It would be better if canonicalization specified either: Option 1. line ends (or white space) by themselves may not occur between tags, e.g. the canonical form of the above would be: <x><a>content1</a><b>content2</b><c>content3</c></x> or Option 2. line ends are always inserted following end tags, but may not occur by themselves elsewhere, e.g. the canonical form of the above example would be: <x><a>content1</a> <b>content2</b> <c>content3</c> </x> It is not obvious how to change the production rules of Canonical XML to implement Option 1, since the line end is a Datachar according to rule [7] and an element can contain Datachars and elements in any number and order according to rule [2]. Option 2 can be implemented by changing rule [2] to read: [2] element ::=Stag (Datachar | element)* Etag #xA Note however, that a consequence of adopting this rule is that if <x> is MIXED, then: <x>content0<a>content1</a> <b>content2</b> <c>content3</c> </x> and <x><a>content1</a> <b>content2</b> <c>content3</c> content0</x> are not equivalent. In the first case, the content of <x> will be "content0" or "content0#xA#xA" (I'm not sure which?). In the second case the content of <x> will be "#xAcontent0" or "#xA#xA#xAcontent0" (again, I'm not sure whether the XML 1.0 specification requires that the processor pass through the #xAs after </a> and </b> as part of the content of <x>?). Therefore Option 1 may be preferable if a simple modification of the production rules can be made to specify it. Transport systems which wish to insert line ends to limit line length, could do so between any pair of tags, secure in the knowledge that the receiving canonicalizer would remove them prior to the processing of the document and its hashes and signatures. Milton M. Anderson Technical Projects Director Financial Services Technology Consortium 276 Dartmouth Avenue Fair Haven, NJ 07704-3121 +1 732 747 1514 miltonma@gte.net Systems Engineering: The art of working with general principles without overlooking any detail which may later turn out to have been important.
Received on Thursday, 5 August 1999 17:57:08 UTC