- From: Hiroshi Maruyama <MARUYAMA@jp.ibm.com>
- Date: Tue, 14 Nov 2000 14:33:49 +0900
- To: xml-encryption@w3.org
- Cc: "Makoto 1 Murata" <EB91801@jp.ibm.com>
It is true that C14N makes irreversible changes to XML documents. However, it is also true that you can NOT exactly preserve an XML document (I mean, as a character string) if you use an XML processor as described in the XML 1.0 specification. A conformant processor MUST normalize attribute values, for example. A conformant processor may discard information on how many white space characters appeared between attributes, as another example.

In other words, applications rely on XML processors to extract the logical information expressed in XML. This logical information is collectively called the Information Set. It is unfortunate that the Information Set was not defined PRIOR TO XML 1.0, but I still believe that subsequent XML-related specifications should be defined in terms of the Information Set.

When I say "preserve information", I mean "preserve the information set". If we assume that XML documents are processed by conformant XML processors before being passed to an application, it is the Information Set that the application sees. Therefore, preserving the textual representation is not important here.

Hiroshi

--
Hiroshi Maruyama
Manager, Internet Technology, Tokyo Research Laboratory
+81-46-215-4576
maruyama@jp.ibm.com

From: hal@finney.org on 2000/11/13 10:16
Please respond to hal@finney.org
To: Hiroshi Maruyama/Japan/IBM@IBMJP, xml-encryption@w3.org
cc:
Subject: Re: Serialization and canonicalization

The C14N canonicalization makes irreversible changes to the document. If we canonicalize before encrypting, there is no way we can recover the original document upon decryption.

According to http://www.w3.org/TR/2000/CR-xml-c14n-20001026, canonicalization includes such steps as:

   - Character and parsed entity references are replaced
   - CDATA sections are replaced with their character content
   - The XML declaration and document type declaration (DTD) are removed
   - Empty elements are converted to start-end tag pairs
   - Attribute value delimiters are set to double quotes
   - Special characters in attribute values and character content are replaced by character references
   - Superfluous namespace declarations are removed from each element
   - Default attributes are added to each element
   - Lexicographic order is imposed on the namespace declarations and attributes of each element

I think it would be desirable to retain the DTD and XML declarations across the encryption/decryption transform (if we do include those parts within the encrypted region). Also, I don't think we should add default attributes to each element, reorder attributes and namespace declarations into lexicographic order, or make most of these other changes.

XML is more than a machine-readable format. The creator of the document may have made decisions about the use of entities or character encodings, quote style, and ordering of attributes based on readability and cleanliness. C14N considers these aspects unimportant for functional purposes and will change them. That's fine for signature verification, but not for encryption/decryption.

I am somewhat confused about the processing model which is envisioned for XML encryption. It appears that it may be something like:

   1. Parse XML into node-set
   2. Select node(s) to encrypt
   3. Serialize selected nodes into UTF-8 byte stream (along the lines of the C14N process)
   4. Encrypt the resulting byte stream using standard methods
   5. Package the encrypted byte stream with XML wrappers
   6. Insert the resulting XML node-set back into the original document in place of original node-set (as one possibility at least)
   7. Re-serialize modified document to produce output

I was thinking in terms of an alternative, based more on the original XML document. In this model the parsing serves as a guide to identify substrings of the original document which are targets for encryption. These are encrypted, the data is wrapped in XML format, and the plaintext substring is replaced with the serialized form of the XML-wrapped encrypted ciphertext.

This is perhaps functionally the same as the node-set based model, except that minimal canonicalization is used as defined by XML Signature, or even no canonicalization at all is done.

Hal Finney
PGP Security

> From: "Hiroshi Maruyama" <MARUYAMA@jp.ibm.com>
> Date: Mon, 13 Nov 2000 09:25:31 +0900
>
> When encrypting a substructure of an XML document, we need to
> preserve the data model so that it will be decrypted into exactly
> the same substructure. XML Canonicalization (or C14N) is one
> way to serialize an XML substructure without losing any information.
> As long as the data model (or information set) is preserved, any
> serialization method will do. C14N satisfies this property and
> is implemented for XML Signature anyway, so I think it is reasonable
> to reuse the C14N standard.
> By the way, I believe this discussion is exactly why I insist that
> the processing model of XML Encryption should be defined using
> the XML InfoSet (or equivalent data model). It may free us from
> confusing questions such as character encoding, default
> attribute values, external entities, data types, and so on.
>
> Hiroshi
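
The two positions in this thread can be illustrated with a small sketch. It assumes Python 3.8 or later, whose standard library function xml.etree.ElementTree.canonicalize implements C14N 2.0 rather than the 2000 Candidate Recommendation cited above, so details may differ from that draft. The sample documents and variable names are invented for illustration and do not come from the thread.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that differ only in surface syntax:
# quote style, attribute order, white space between attributes,
# and the empty-element form.
doc_a = "<order   id='42'\n        currency=\"USD\"><item/></order>"
doc_b = '<order currency="USD" id="42"><item></item></order>'

# Canonicalize both texts (C14N 2.0 in the Python standard library).
canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

# The logical content survives: both texts canonicalize identically.
same_infoset = canon_a == canon_b

# The original character string does not survive canonicalization.
text_preserved = canon_a == doc_a

# Even without C14N, the parser itself normalizes attribute values:
# the literal tab inside the attribute value is reported as a space.
normalized = ET.fromstring('<a x="1\t2"/>').get("x")

print(same_infoset, text_preserved, repr(normalized))
```

Under these assumptions, the two documents compare equal after canonicalization, which is the sense in which the information set is preserved, while the canonical form no longer matches the text of doc_a, which is the loss that the quoted message objects to.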
Received on Tuesday, 14 November 2000 00:34:25 UTC