- From: <hal@finney.org>
- Date: Sun, 12 Nov 2000 17:16:23 -0800
- To: MARUYAMA@jp.ibm.com, xml-encryption@w3.org
The C14N canonicalization makes irreversible changes to the document. If we canonicalize before encrypting, there is no way we can recover the original document upon decryption.

According to http://www.w3.org/TR/2000/CR-xml-c14n-20001026, canonicalization includes such steps as:

  - Character and parsed entity references are replaced
  - CDATA sections are replaced with their character content
  - The XML declaration and document type declaration (DTD) are removed
  - Empty elements are converted to start-end tag pairs
  - Attribute value delimiters are set to double quotes
  - Special characters in attribute values and character content are replaced by character references
  - Superfluous namespace declarations are removed from each element
  - Default attributes are added to each element
  - Lexicographic order is imposed on the namespace declarations and attributes of each element

I think it would be desirable to retain the DTD and XML declarations across the encryption/decryption transform (if we do include those parts within the encrypted region). Also I don't think we should add default attributes to each element, or reorder attributes and namespace declarations to lexicographic order, or do most of these other changes.

XML is more than a machine readable format. The creator of the document may have made decisions about the use of entities or character encodings, quote style and ordering of attributes based on readability and cleanliness. C14N considers these aspects unimportant for functional purposes and will change them. That's fine for signature verification, but not for encryption/decryption.

I am somewhat confused about the processing model which is envisioned for XML encryption. It appears that it may be something like:

  1. Parse XML into node-set
  2. Select node(s) to encrypt
  3. Serialize selected nodes into UTF-8 byte stream (along the lines of the C14N process)
  4. Encrypt the resulting byte stream using standard methods
  5. Package the encrypted byte stream with XML wrappers
  6. Insert the resulting XML node-set back into the original document in place of original node-set (as one possibility at least)
  7. Re-serialize modified document to produce output

I was thinking in terms of an alternative, based more on the original XML document. In this model the parsing serves as a guide to identify substrings of the original document which are targets for encryption. These are encrypted, the data is wrapped in XML format, and the plaintext substring is replaced with the serialized form of the XML-wrapped encrypted ciphertext.

This is perhaps functionally the same as the node-set based model, except that minimal canonicalization is used as defined by XML Signature, or even no canonicalization at all is done.

Hal Finney
PGP Security

> From: "Hiroshi Maruyama" <MARUYAMA@jp.ibm.com>
> Date: Mon, 13 Nov 2000 09:25:31 +0900
>
> When encrypting a substructure of an XML document, we need to
> preserve the data model so that it will be decrypted into exactly
> the same substructure. XML Canonicalization (or C14N) is one
> way to serialize an XML substructure without losing any information.
> As long as the data model (or information set) is preserved, any
> serialization method will do. C14N satisfies this property and
> is implemented for XML Signature anyway, I think it is reasonable
> to reuse the C14N standard.
> By the way, I believe this discussion is exactly why I insist that
> the processing model of XML Encryption should be defined using
> the XML InfoSet (or equivalent data model). It may free us from
> confusing questions such as character encoding, default
> attribute values, external entities, data types, and so on.
>
> Hiroshi
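
The contrast between the two processing models in this message can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the thread: Python's `xml.etree.ElementTree.canonicalize` implements the later C14N 2.0 rather than the 2000 candidate recommendation cited above, the `<Encrypted>` wrapper element is hypothetical, and base64 is used only as a stand-in for a real cipher (it provides no confidentiality). The sample document and the helper names `wrap` and `unwrap` are invented for illustration.

```python
import base64
from xml.etree.ElementTree import canonicalize  # C14N 2.0, Python 3.8+

# Hypothetical sample document with single quotes, a CDATA section,
# an empty-element tag, and attributes out of lexicographic order.
original = (
    "<?xml version='1.0'?>\n"
    "<order id='7' currency='USD'>"
    "<note><![CDATA[a < b]]></note><paid/></order>"
)

# Node-set model: the text is parsed and re-serialized canonically
# before it would be encrypted, so the author's lexical choices are lost.
canonical = canonicalize(original)
print(canonical)
print("identical to original:", canonical == original)


# Substring model: the target is located as a substring of the original
# text and replaced by a wrapper; no re-serialization takes place.
OPEN_TAG, CLOSE_TAG = "<Encrypted>", "</Encrypted>"  # hypothetical wrapper


def wrap(document: str, target: str) -> str:
    """Replace the first occurrence of target with a wrapped payload."""
    start = document.index(target)
    # base64 is NOT encryption; it stands in for a real cipher here.
    payload = base64.b64encode(target.encode("utf-8")).decode("ascii")
    return (document[:start] + OPEN_TAG + payload + CLOSE_TAG
            + document[start + len(target):])


def unwrap(document: str) -> str:
    """Restore the original substring from the wrapped payload."""
    start = document.index(OPEN_TAG)
    end = document.index(CLOSE_TAG)
    payload = document[start + len(OPEN_TAG):end]
    plain = base64.b64decode(payload).decode("utf-8")
    return document[:start] + plain + document[end + len(CLOSE_TAG):]


protected = wrap(original, "<note><![CDATA[a < b]]></note>")
restored = unwrap(protected)
print("round trip exact:", restored == original)
```

In the substring model the text outside the wrapped region is never touched, which is why the round trip can reproduce the original character for character; in the node-set model the result depends on whatever serialization is applied in step 3.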
Received on Sunday, 12 November 2000 22:11:56 UTC