C14N canonicalization

Dear Members,

I'm testing several parser and C14N canonicalization implementations for
interoperability between applications. They behave inconsistently, so I
reread the W3C C14N Recommendation, but I couldn't work out which
behaviour is correct. Could you please help me interpret the text of the
standard?

What exactly does this sentence mean?

"The string value of the node is modified by replacing all ampersands (&)
with &amp;, all open angle brackets (<) with &lt;, all quotation mark
characters with &quot;, and the whitespace characters #x9, #xA, and #xD,
with character references. The character references are written in uppercase
hexadecimal with no leading zeroes (for example, #xD is represented by the
character reference &#xD;)."
(http://www.w3.org/TR/xml-c14n)
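Read literally, the sentence amounts to a fixed list of string
replacements. A minimal Python sketch of that reading follows; the function
name is my own, and it assumes the input is the string value the parser has
already produced. If I read the Recommendation correctly, this sentence sits
in the description of attribute nodes, so the sketch treats its input as an
attribute value.

```python
def escape_attribute_value(value: str) -> str:
    """Apply the replacements listed in the quoted sentence (hypothetical helper)."""
    # The ampersand must be handled first, otherwise the "&" of the
    # references inserted below would be escaped a second time.
    replacements = [
        ("&", "&amp;"),
        ("<", "&lt;"),
        ('"', "&quot;"),
        ("\t", "&#x9;"),  # #x9, uppercase hex, no leading zeroes
        ("\n", "&#xA;"),  # #xA
        ("\r", "&#xD;"),  # #xD
    ]
    for raw, escaped in replacements:
        value = value.replace(raw, escaped)
    return value


if __name__ == "__main__":
    print(escape_attribute_value('a & b < "c"\r\n'))
```

Note that the sketch only sees characters that survive parsing; whatever the
parser does to line breaks happens before this step.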

The following example was given as input for parsing and C14N
canonicalization:

<doc>
   <e1/>
</doc>

which contains the byte sequence (in hex)

"0D 0A 20 20 20"

between the <doc> start tag and the <e1/> tag.
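To make the test reproducible, the input can be rebuilt byte for byte and
passed through one more canonicalizer. A minimal sketch, assuming Python 3.8
or later, whose standard library provides
xml.etree.ElementTree.canonicalize (it implements C14N 2.0, so it is one
more data point, not a reference for C14N 1.0). The helper name
canonical_hex and the second test document are my own additions.

```python
from xml.etree.ElementTree import canonicalize  # available since Python 3.8


def canonical_hex(xml_text: str) -> str:
    """Canonicalize xml_text and return the result as space-separated hex bytes."""
    canonical = canonicalize(xml_data=xml_text)
    return canonical.encode("utf-8").hex(" ").upper()


# The test document with literal CR LF line breaks (0D 0A) and three spaces.
literal_crlf = "<doc>\r\n   <e1/>\r\n</doc>"

# A variant in which the CR is written as a character reference instead.
escaped_cr = "<doc>&#xD;\n   <e1/>&#xD;\n</doc>"

if __name__ == "__main__":
    print(canonical_hex(literal_crlf))
    print(canonical_hex(escaped_cr))
```

Comparing the two hex dumps shows whether a carriage return reaches the
canonicalizer at all, or whether it has already been dealt with by the
parser's line-end handling.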

The outputs produced by the various applications contained, for example:

"0A 20 20 20" (in this case there is no "#xD" character, escaped or
otherwise, but I think this is the correct result)

"0A 09" (the three hex "20" bytes have been converted to a single hex "09",
which is TAB)

"26 23 78 44 3B 0A 20 20 20" (in which "26 23 78 44 3B" is "&#xD;")

Which is the correct one? Any idea?

Best regards,
Aron

----------------------------------------------------
Aron Szabo, M. Sc.
Research Associate,
Center of Information Technology
Budapest University of Technology and Economics

Received on Wednesday, 5 October 2005 15:16:31 UTC