C14N and escape of CR characters from Szabo Aron on 2006-08-10 (w3c-ietf-xmldsig@w3.org from July to September 2006)

From: Szabo Aron <aron@ik.bme.hu>
Date: Thu, 10 Aug 2006 09:40:53 +0200
To: <w3c-ietf-xmldsig@w3.org>, <PLUGTESTS-XADES@LIST.ETSI.ORG>
Message-Id: <20060810074052.71B3C1930@cronos.ik.bme.hu>

Dear Members,

I would like some clarification of the text of the C14N canonicalization
standard (W3C). It is not completely clear to me how whitespace characters,
especially CR (0D in hexadecimal) and LF (0A in hexadecimal), are handled
and escaped. Could you show me an example in which CR whitespace characters
are escaped as &#xD;? In other words: what exactly is a "Text Node"?
I have examined the examples in "3.3 Start and End Tags" and "3.4
Character Modifications and Character References", but neither covers this
case: in 3.3 the CR characters are present as whitespace and are therefore
deleted, while in 3.4 the &#x0d; is escaped, but it is present as a
character reference rather than as literal whitespace.

2.1 Data Model
[...]
All whitespace within the root document element MUST be preserved (except
for any #xD characters deleted by line delimiter normalization). This
includes all whitespace in external entities. Whitespace outside of the root
document element MUST be discarded.
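
To make the "line delimiter normalization" mentioned above concrete, here is
a minimal Python sketch. It assumes the standard library's ElementTree parser
behaves as an XML 1.0 parser is expected to: a literal CR LF pair in the
input is normalized to a single LF, whereas a CR written as the character
reference &#xD; reaches the text node as a real #xD character. The sample
document and variable names are made up for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a literal CR LF between "a" and "b",
# and the character reference &#xD; between "b" and "c".
raw = b"<doc>a\r\nb&#xD;c</doc>"

# Parse the document and read the string value of the text node.
text = ET.fromstring(raw).text

# The literal CR is gone (CR LF became LF); the referenced CR survived.
print(repr(text))
```

If this holds, the only #xD characters left in a text node are those that
were written as character references in the input.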

2.3 Processing Model
[...]
Text Nodes- the string value, except all ampersands are replaced by &amp;,
all open angle brackets (<) are replaced by &lt;, all closing angle brackets
(>) are replaced by &gt;, and all #xD characters are replaced by &#xD;.
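
The Text Node rule quoted above can be sketched as a small Python function.
This is only my reading of the sentence in 2.3, not a reference
implementation; the function name escape_text_node is hypothetical, and the
input is assumed to be the string value of a text node after parsing (that
is, after line delimiter normalization).

```python
def escape_text_node(value: str) -> str:
    """Escape a text node's string value as described in C14N section 2.3."""
    # Replace ampersands first so the entities added below are not re-escaped.
    return (
        value.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\r", "&#xD;")  # any remaining #xD becomes a character reference
    )


# LF is left untouched; only the CR is written out as &#xD;.
escaped = escape_text_node("a\nb\rc < d & e")
print(repr(escaped))
```

Under this reading, an LF in a text node is output unchanged, and only a #xD
that survived parsing is written as &#xD;.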

Thanks in advance!

Aron Szabo
----------
Budapest University of Technology and Economics (BME)

Received on Thursday, 10 August 2006 07:41:23 UTC