Re: No Character Normalization? from John Cowan on 2000-06-23 (w3c-ietf-xmldsig@w3.org from April to June 2000)

From: John Cowan <jcowan@reutershealth.com>
Date: Fri, 23 Jun 2000 16:55:41 -0400
To: Kevin Regan <kevinr@valicert.com>
CC: jboyer@PureEdge.com, w3c-ietf-xmldsig@w3.org
Message-ID: <3953CECD.F0A88934@reutershealth.com>

Kevin Regan wrote:

> It seems that the responsibility for creating canonicalizable or signable
> documents is being pushed to the application creating the XML documents to
> be signed (as well as the application producing the XML Signature document
> itself).

That is indeed how the Web treats normalization: it should be done by
the document creator.

> However, won't it most likely be the case that producers of XML
> documents will not have nearly the resources or technical know-how to
> reasonably perform this character normalization?

Essentially any document in a pre-Unicode charset such as ASCII, Shift_JIS,
ISO 8859-1, etc. is already normalized.  The only way to create unnormalized
documents is to create your documents directly in Unicode, and even then
you have to work at it.  (There are certain obscure legacy character sets
like ISO 5426/27/28 which are not prenormalized, but they are rarely if ever
used outside information systems for libraries.)
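
A minimal Python sketch of this point, assuming the normalization form in
question is Unicode NFC and using a made-up sample word.  Text decoded from
ISO 8859-1 comes out precomposed, while text built by hand from a base letter
plus a combining accent does not:

```python
import unicodedata

# Bytes as they would sit in an ISO 8859-1 (Latin-1) file: "cafe" with e-acute.
legacy_bytes = b"caf\xe9"
decoded = legacy_bytes.decode("latin-1")   # 'caf\u00e9', a single precomposed letter

# The same word entered directly in Unicode as 'e' + COMBINING ACUTE ACCENT.
decomposed = "cafe\u0301"

# A string is in NFC exactly when normalizing it changes nothing.
decoded_is_nfc = unicodedata.normalize("NFC", decoded) == decoded
decomposed_is_nfc = unicodedata.normalize("NFC", decomposed) == decomposed

print(decoded_is_nfc)     # True
print(decomposed_is_nfc)  # False
```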

> The goal of the XML C14N spec seems to be to avoid the additional work

There is a more serious problem.  If C14N performed normalization itself,
as the February 2000 draft proposed, then it would be possible to create
a forged version of a document in which some attributes or elements had different
(non-normalized) names from the original, but which still passed a signature check.
An application relying on the signature might then malfunction in hard-to-spot
ways.
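
To make the risk concrete, here is a rough Python sketch.  The two digest
functions are hypothetical stand-ins, not real C14N or XML Signature code;
NFC and SHA-256 are assumptions chosen for illustration, and the element
name is invented.  A canonicalizer that normalizes before hashing gives the
original and the forgery the same digest, even though their element names
are different strings:

```python
import hashlib
import unicodedata

def digest_with_normalization(doc: str) -> str:
    """Hypothetical canonicalizer that applies NFC before hashing."""
    normalized = unicodedata.normalize("NFC", doc)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def digest_without_normalization(doc: str) -> str:
    """Hypothetical canonicalizer that hashes the characters as given."""
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()

# Original uses precomposed e-acute; the forgery uses 'e' + combining accent.
original = "<r\u00e9sum\u00e9>approved</r\u00e9sum\u00e9>"
forged = "<re\u0301sume\u0301>approved</re\u0301sume\u0301>"

print(original == forged)  # False: an application sees different names
print(digest_with_normalization(original) == digest_with_normalization(forged))        # True
print(digest_without_normalization(original) == digest_without_normalization(forged))  # False
```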

> Currently, the XML Signature spec recommends creating a failure condition
> when the appropriate normalized form for input is not detected as well as
> creating its output in the same normalized form.

Thereby detecting forgeries such as I described above, because such forgeries
are necessarily non-normalized documents.
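
Such a failure condition might look like the following Python sketch.  The
function name `require_nfc` is hypothetical, and NFC is assumed to be the
required form:

```python
import unicodedata

def require_nfc(text: str) -> str:
    """Return text unchanged if it is in NFC; otherwise raise ValueError."""
    if unicodedata.normalize("NFC", text) != text:
        raise ValueError("input is not in NFC; refusing to process")
    return text

require_nfc("caf\u00e9")        # passes: precomposed form
try:
    require_nfc("cafe\u0301")   # decomposed form is rejected
except ValueError as err:
    print("rejected:", err)
```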

> One final question.  Is it possible for the processing of an XML document
> to change the character format? 

Sure.  C14N accepts documents in any character encoding and produces
canonicalized versions in UTF-8.
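
The encoding change alone can be sketched in Python as below.  This shows
only the transcoding step, not the rest of canonicalization, and the helper
name `to_utf8` is hypothetical:

```python
def to_utf8(raw: bytes, declared_encoding: str) -> bytes:
    """Decode bytes from the document's declared encoding, re-encode as UTF-8."""
    return raw.decode(declared_encoding).encode("utf-8")

latin1_input = b"<name>caf\xe9</name>"     # e-acute is one byte in ISO 8859-1
utf8_output = to_utf8(latin1_input, "latin-1")

print(utf8_output)  # b'<name>caf\xc3\xa9</name>'
```

The characters are the same before and after; only their byte representation
changes.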

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)

Received on Friday, 23 June 2000 16:56:19 UTC