RE: Re-posting of "what is minimal canonicalization" notes

From an implementor's point of view, I see a difference between rules
convenient for parsed XML (such as the signature itself and any embedded
objects) and rules for referenced external objects (which may be of any
type -- and may not be well-formed even if they are XML).

For the signature and any embedded objects, which even a minimal application
must parse in order to verify the signature, it seems most convenient to use
the same normalizations required of XML processors:

* [1] 2.11 says to normalize line endings to 0xA (sketched in code below).

* [1] 2.10 says to "pass all characters in a document that are not markup".
I don't think that includes non-significant whitespace inside start/end
tags, but I'm not fluent enough in SGML lingo to be sure.

This would allow an implementation to easily work with parsed data.
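As a rough illustration, here is what the 2.11 line-ending normalization
might look like in Python (the function name is mine, not from the spec):

    def normalize_line_endings(data: bytes) -> bytes:
        # XML 1.0 section 2.11: translate CRLF and bare CR to LF (0xA)
        return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")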

I would also throw in normalization to UTF-8, and removal of the encoding
pseudo-attribute, since UTF-8 makes it much easier to pass strings around
in existing code that expects ASCII (but I realize that is a biased
viewpoint). A sketch of both steps follows.
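Something like this, assuming the declared encoding has already been
sniffed from the document (the helper name and the regular expression are
mine, not from any spec):

    import re

    def to_utf8(data: bytes, declared_encoding: str) -> bytes:
        # Decode with the declared encoding, then re-encode as UTF-8.
        text = data.decode(declared_encoding)
        # Drop the encoding pseudo-attribute from the XML declaration,
        # if one is present.
        text = re.sub(r'(<\?xml[^?]*?)\s+encoding\s*=\s*(["\']).*?\2',
                      r'\1', text, count=1)
        return text.encode("utf-8")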

For external objects, even if they are XML, it seems most convenient to use
something like the S/MIME rules:

* Normalize line endings for text/* content to 0xA and treat everything else
as binary data (see the sketch below).

This avoids the need to parse any external XML content that is being signed,
unless some transformation is specified.
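A sketch of that dispatch, reusing the normalization function above (the
content type would come from whatever transport metadata is available):

    def canonicalize_external(data: bytes, content_type: str) -> bytes:
        # S/MIME-style rule: only textual content gets its line endings
        # normalized; everything else is signed byte-for-byte.
        if content_type.lower().startswith("text/"):
            return normalize_line_endings(data)
        return data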

-Greg

[1] http://www.w3.org/TR/1998/REC-xml-19980210

Received on Thursday, 7 October 1999 17:55:17 UTC