Re: Serialization and canonicalization


When encrypting a substructure of an XML document, we need to
preserve the data model so that it will be decrypted into exactly
the same substructure.  XML Canonicalization (or C14N) is one
way to serialize an XML substructure without losing any information.
As long as the data model (or information set) is preserved, any
serialization method will do.  Since C14N satisfies this property and
is already implemented for XML Signature, I think it is reasonable
to reuse the C14N standard.
By the way, I believe this discussion is exactly why I insist that
the processing model of XML Encryption should be defined in terms of
the XML Infoset (or an equivalent data model).  It would free us from
confusing questions about character encoding, default
attribute values, external entities, data types, and so on.
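As an illustration of the property Maruyama describes (a minimal sketch, not anything specified in this thread, using the C14N support in Python's standard library and a hypothetical sample document): two serializations that differ only in attribute order, quoting, and empty-element syntax represent the same Infoset, and C14N maps both to identical octets.

```python
# Sketch: C14N maps equivalent serializations of the same Infoset
# to identical octets.  Uses xml.etree.ElementTree.canonicalize()
# from the Python standard library (3.8+); the document is a
# made-up example, not one from the discussion.
import xml.etree.ElementTree as ET

# Same data model, different surface forms: attribute order,
# quote style, and <child/> vs <child></child> all vary.
doc1 = '<doc b="2" a="1"><child/></doc>'
doc2 = "<doc a='1' b=\"2\"><child></child></doc>"

c1 = ET.canonicalize(xml_data=doc1)
c2 = ET.canonicalize(xml_data=doc2)

# C14N sorts attributes and always emits explicit end tags,
# so both inputs canonicalize to the same byte sequence.
assert c1 == c2
print(c1)
```

Encrypting the canonical octets therefore guarantees that decryption recovers a serialization of exactly the original substructure, which is the preservation property the message argues for.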


Hiroshi Maruyama
Manager, Internet Technology, Tokyo Research Laboratory

From: on 2000/11/13 05:41

Subject:  Serialization and canonicalization

There has been mention of serialization and canonicalization transforms
in the discussion so far.  I don't understand the need for these.

Serialization would be needed if we were starting with a non-serial
representation of the data to be encrypted, such as a DOM tree or an
XPath node-set.  Isn't it adequate to presume that we are beginning with
an XML document in serialized form?  Serialization issues could then be
ruled out of scope for this effort.

Canonicalization is an issue for signature verification, where it is
desired that semantically-neutral changes to an XML signed document
could still allow the signature to verify.  This motivated the desire
to specify canonicalization algorithms for the XML signature effort.

However, the issue does not arise in the same way for XML encryption.
As far as I can see, canonicalizing data before encryption would not
aid in decryption.

The one transform which seems relevant to an encryption effort is
compression.  Compressing data before encryption is helpful for two
reasons.  First, encrypted data is not compressible, so compressing before
encryption is our only opportunity to do so.  Second, compressed data
generally has less structure than the plaintext.  This can theoretically
make the encryption harder to break (but this is a weak effect, and
there are countervailing factors).
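The first point can be sketched numerically (an illustrative sketch, assuming that uniformly random bytes are a reasonable stand-in for ciphertext, since a good cipher's output is practically indistinguishable from random):

```python
# Sketch of why compression must happen before encryption.
# Assumption: os.urandom() models ciphertext, on the grounds that
# good ciphers produce output indistinguishable from random bytes.
import os
import zlib

# Redundant XML-like plaintext (made-up example data).
plaintext = b"<data>" + b"some repetitive XML content " * 100 + b"</data>"

# Plaintext compresses well...
compressed = zlib.compress(plaintext)
assert len(compressed) < len(plaintext)

# ...but random-looking "ciphertext" of the same length does not:
# compressing it only adds framing overhead.
ciphertext_like = os.urandom(len(plaintext))
assert len(zlib.compress(ciphertext_like)) > len(compressed)

print(len(plaintext), len(compressed), len(zlib.compress(ciphertext_like)))
```

So the opportunity to shrink the data exists only on the plaintext side of the cipher, which is Hal's first reason for treating compression as the one relevant transform.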

Are there reasons for continuing to consider serialization and
canonicalization issues?

Hal Finney
PGP Security

Received on Sunday, 12 November 2000 19:26:29 UTC