- From: John Boyer <boyerj@ca.ibm.com>
- Date: Wed, 5 Oct 2005 09:20:42 -0700
- To: Szabó Áron <aron@ik.bme.hu>
- Cc: w3c-ietf-xmldsig@w3.org, w3c-ietf-xmldsig-request@w3.org
- Message-ID: <OF08A4A7F9.B5095F81-ON88257091.00585BA7-88257091.0059C94F@ca.ibm.com>
The one thing you have to be careful about when reading the spec is making sure you interpret each sentence in the context in which it appears. In this case, you extracted a sentence that appears in the canonicalization of *attribute* nodes, but none of your example questions below pertain to the canonicalization of attributes. If you were trying to canonicalize an attribute, though, you would find that, for example, return-newline characters that are presented to the data model would be output as &#xD;&#xA;.

Of course, the reason for this character reference encoding has to do with how you managed to get return-newline sequences past attribute value normalization and into the data model (info set) in the first place.

Cheers.

John M. Boyer, Ph.D.
Senior Product Architect/Research Scientist
Workplace, Portal and Collaboration Software
IBM Victoria Software Lab
E-Mail: boyerj@ca.ibm.com
http://www.ibm.com/software/

Szabó Áron <aron@ik.bme.hu>
Sent by: w3c-ietf-xmldsig-request@w3.org
10/05/2005 08:16 AM
To: <w3c-ietf-xmldsig@w3.org>
cc:
Subject: C14N canonicalization

Dear Members,

I'm checking several parsers and C14N canonicalization solutions to provide interoperability between applications. I've noticed some strange behavior, so I read through the W3C C14N standard again, but I couldn't work out which way is correct. Could you please help me interpret the text of the standard? What exactly does this sentence mean?

"The string value of the node is modified by replacing all ampersands (&) with &amp;, all open angle brackets (<) with &lt;, all quotation mark characters with &quot;, and the whitespace characters #x9, #xA, and #xD, with character references. The character references are written in uppercase hexadecimal with no leading zeroes (for example, #xD is represented by the character reference &#xD;)." (http://www.w3.org/TR/xml-c14n)

The following example was given as input for parsing and C14N canonicalization:

<doc>
   <e1/>
</doc>

It contains the byte sequence (in hex) "0D 0A 20 20 20" between the two tags. I've got outputs (made by several applications) that contained, e.g.:

- "0A 20 20 20" (in this case the escaped #xD character is missing, but I think this is the correct way)
- "0A 09" (the three hex "20" characters have been converted to a single hex "09", which is TAB)
- "26 23 78 44 3B 0A 20 20 20" (in which "26 23 78 44 3B" is "&#xD;")

Which is the correct one? Any idea?

Best regards,
Aron

----------------------------------------------------
Aron Szabo, M. Sc.
Research Associate, Center of Information Technology
Budapest University of Technology and Economics
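The difference between the attribute rule and the text case can be illustrated with a minimal sketch, assuming Python 3.8 or later. `escape_attr_c14n` is a hypothetical helper that simply applies the attribute-node escaping rule quoted in the question. `xml.etree.ElementTree.canonicalize` is the standard-library canonicalizer; note that it implements C14N 2.0 rather than the C14N 1.0 recommendation cited here, although its text and attribute escaping agree with the 1.0 rules for these inputs. The text case assumes nothing sits between `<e1/>` and `</doc>`.

```python
import xml.etree.ElementTree as ET


def escape_attr_c14n(value: str) -> str:
    """Escape an attribute value per the C14N attribute-node rule.

    Hypothetical helper for illustration only. '&' is replaced first so
    the entities introduced by the later replacements are not re-escaped.
    """
    return (
        value.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace('"', "&quot;")
        .replace("\t", "&#x9;")
        .replace("\n", "&#xA;")
        .replace("\r", "&#xD;")
    )


# Text between tags: the parser's end-of-line handling turns CR LF into a
# single LF before the data model ever sees it, so no &#xD; is emitted.
# (Assumption: nothing sits between <e1/> and </doc>.)
TEXT_CASE = "<doc>\r\n   <e1/></doc>"

# Attribute written with character references: CR and LF survive
# attribute-value normalization and are output as character references.
ATTR_REF_CASE = '<doc a="x&#xD;&#xA;y"/>'

# Literal CR LF inside an attribute: end-of-line handling plus
# attribute-value normalization reduce it to a single space.
ATTR_LITERAL_CASE = '<doc a="x\r\ny"/>'

if __name__ == "__main__":
    print(escape_attr_c14n("x\r\ny"))  # x&#xD;&#xA;y
    for case in (TEXT_CASE, ATTR_REF_CASE, ATTR_LITERAL_CASE):
        print(repr(ET.canonicalize(xml_data=case)))
```

Run as a script, it prints the escaped value followed by the three canonical forms. In the text case the CR LF pair reaches the output as a bare LF (the "0A 20 20 20" form). In the attribute cases, a CR LF written as character references comes back out as &#xD;&#xA;, while a literal CR LF is reduced to a single space before canonicalization ever sees it.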
Received on Wednesday, 5 October 2005 16:20:58 UTC