C14N question from Szabó Áron on 2005-10-05 (www-xml-canonicalization-comments@w3.org from October 2005)

From: Szabó Áron <aron@ik.bme.hu>
Date: Wed, 5 Oct 2005 10:43:32 +0200
To: <www-xml-canonicalization-comments@w3.org>
Message-Id: <20051005083908.3F75476B@cronos.ik.bme.hu>

Dear Members,

I'm testing several parsers and C14N canonicalization implementations to
ensure interoperability between applications. I noticed inconsistent
behavior, so I reread the W3C C14N standard, but I could not determine which
behavior is correct. Could you please help me interpret the text of the
standard?

What exactly does this sentence mean?

"The string value of the node is modified by replacing all ampersands (&)
with &amp;, all open angle brackets (<) with &lt;, all quotation mark
characters with &quot;, and the whitespace characters #x9, #xA, and #xD,
with character references. The character references are written in uppercase
hexadecimal with no leading zeroes (for example, #xD is represented by the
character reference &#xD;)."
(http://www.w3.org/TR/xml-c14n)
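
To make the question concrete, here is a minimal Python sketch of the
replacements as the quoted sentence lists them, read literally. The function
name escape_value is hypothetical and chosen only for illustration; the sketch
does not claim to settle which node types the rule applies to.

```python
def escape_value(value: str) -> str:
    """Apply the replacements listed in the quoted sentence, read literally."""
    # Ampersands are replaced first; otherwise the '&' introduced by the
    # later replacements would itself be escaped a second time.
    value = value.replace("&", "&amp;")
    value = value.replace("<", "&lt;")
    value = value.replace('"', "&quot;")
    # The whitespace characters #x9, #xA and #xD become character references
    # in uppercase hexadecimal with no leading zeroes.
    for ch in ("\t", "\n", "\r"):
        value = value.replace(ch, "&#x{:X};".format(ord(ch)))
    return value


print(escape_value('a & b < "c"\r\n'))
```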

The following example was given as input for parsing and C14N
canonicalization:

<doc>
   <e1/>
</doc>

which contains the byte sequence (in hex)

"0D 0A 20 20 20"

between the <doc> start tag and the <e1/> tag.

The outputs produced by different applications contained, for example:

"0A 20 20 20" (in this case the escaped "#xD" character is missing)
"0A 09" (the three "20" have been converted to "09" which is TAB)
"26 23 78 44 3B 0A 20 20 20" (in which "26 23 78 44 3B" is "&#xD;")

Which is the correct one? Any idea?
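
For easier comparison, the hex dumps can be turned back into readable strings
with a few lines of Python. This is only a convenience sketch; the helper name
decode and the variable names are my own, and the dumps are copied from this
message.

```python
# Input bytes between the two tags, as given in this message.
original = bytes.fromhex("0D 0A 20 20 20")

# The three outputs observed from different applications.
candidates = [
    "0A 20 20 20",
    "0A 09",
    "26 23 78 44 3B 0A 20 20 20",
]


def decode(hex_dump: str) -> str:
    """Convert a space-separated hex dump into an ASCII string."""
    return bytes.fromhex(hex_dump).decode("ascii")


# repr() makes CR, LF and TAB visible as \r, \n and \t.
print("input: ", repr(original.decode("ascii")))
for dump in candidates:
    print("output:", repr(decode(dump)))
```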

Best regards,
Aron

----------------------------------------------------
Aron Szabo, M. Sc.
Research Associate,
Center of Information Technology
Budapest University of Technology and Economics

Received on Wednesday, 5 October 2005 08:43:52 UTC