Line End Treatment in WD-xml-c14n-1990729.html "Canonical XML" from Milton M. Anderson on 1999-08-05 (w3c-ietf-xmldsig@w3.org from July to September 1999)

From: Milton M. Anderson <miltonma@gte.net>
Date: Thu, 5 Aug 1999 17:56:00 -0400
To: <w3c-ietf-xmldsig@w3.org>
Message-ID: <00ac01bedf8d$68fdd140$e335bfd1@computer>
In WD-xml-c14n-1990729.html "Canonical XML" the treatment of line 
ends (#xA) which occur between tags does not appear to be 
specified in the best way for hashing and signing operations.  

In Section 5.5 we have an example:

<quote> 
For the following element

<x>
<a n-"1"/><b n="2"/>
<c n="3"/></x>

the canonical form is

<x>
<a n="1"></a><b n="2"/></b>
<c n="3"></c></x>

</quote>

The example seems to imply too characteristics of the canonical 
form:

1.  line ends can occur between tags in the canonical form, and

2.  they are preserved when and if they were present in the 
original document and the do not occur according to a general 
rule.

(The exception to the second assertion is the line end following 
</x>, which is required by the production rules at the beginning 
of Section 5.)

These characteristics are undesirable for applications which rely 
on this C14N for hashing and signing since an application which:

a. processes in a document containing original canonicalized and 
signed elements, 

b. operates on some elements but leaves others as originally 
signed,

c. and produces a new signed document that includes both original 
and new signed elements

will have to remember where the line ends were in the 
canonicalized form of the originally signed elements so that it 
can reproduce them in the output.  Otherwise the hashes will be 
broken.  

Consider the following example:

<x>
<a>content1</a><b>content2</b>
<c>content3</c></x>

It would be better if canonicalization specified either:

Option 1. line ends (or white space) by themselves may not occur 
between tags, e.g. the canonical form of the above would be:

<x><a>content1</a><b>content2</b><c>content3</c></x>

or

Option 2. line ends are always inserted following end tags, but 
may not occur by themselves elsewhere, e.g. the canonical form of 
the above example would be:

<x><a>content1</a>
<b>content2</b>
<c>content3</c>
</x>

It is not obvious how to change the production rules of Canonical 
XML to implement Option 1, since the line end is a Datachar 
according to rule [7] and an element can contain Datachars and 
elements in any number and order according to rule [2]. 

Option 2 can be implemented by changing rule [2] to read:

[2]   element ::=Stag (Datachar | element)* Etag #xA

Note however, that a consequence of adopting this rule is that if 
<x> is MIXED, then:

<x>content0<a>content1</a>
<b>content2</b>
<c>content3</c>
</x>

and 

<x><a>content1</a>
<b>content2</b>
<c>content3</c>
content0</x>

are not equivalent.  In the first case, the content of <x> will 
be "content0" or "content0#xA#xA" (I'm not sure which?). In the 
second case the content of <x> will be "#xAcontent0" or 
"#xA#xA#xAcontent0" (again, I'm not sure whether the XML 1.0 
specification requires that the processor pass through the #xAs 
after </a> and </b> as part of the content of <x>?).  

Therefore Option 1 may be preferable if a simple modification of 
the production rules can be made to specify it.  Transport 
systems which wish to insert line ends to limit line length, 
could do so between any pair of tags, secure in the knowledge 
that the receiving canonicalizer would remove them prior to the 
processing of the document and its hashes and signatures.  


Milton M. Anderson
Technical Projects Director
Financial Services Technology Consortium
276 Dartmouth Avenue
Fair Haven, NJ 07704-3121
+1 732 747 1514
miltonma@gte.net

Systems Engineering:  The art of working with 
general principles without overlooking any detail 
which may later turn out to have been important.
Received on Thursday, 5 August 1999 17:57:08 UTC