RE: Request for clarification on Canonical XML

From: Martin Duerst <duerst@w3.org>
Date: Thu, 24 Jul 2003 17:58:26 -0400
To: "John Boyer" <JBoyer@PureEdge.com>, <w3c-ietf-xmldsig@w3.org>
Cc: <w3c-i18n-ig@w3.org>, <w3c-rdfcore-wg@w3.org>, "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>

Hello John,

Many thanks for your quick reply. Your proposed tweak would be
fine by me.

Regards,   Martin.

At 13:43 03/07/24 -0700, John Boyer wrote:

>Hi Martin,
>The wording you gave for the note below seems fine to me, except the last 
>sentence, which begins "In such cases, users of Canonical XML should...", 
>should begin "In such cases, users of Canonical XML may...".
>John Boyer, Ph.D.
>Senior Product Architect and Research Scientist
>PureEdge Solutions Inc.
>-----Original Message-----
>From: Martin Duerst [mailto:duerst@w3.org]
>Sent: Thursday, July 24, 2003 1:05 PM
>To: w3c-ietf-xmldsig@w3.org
>Cc: w3c-i18n-ig@w3.org; w3c-rdfcore-wg@w3.org; Peter F. "
>Subject: Request for clarification on Canonical XML
>Hello Joseph, dear XML Signature specialists,
>I would like to request a clarification on Canonical XML.
>At http://www.w3.org/TR/2001/REC-xml-c14n-20010315#Terminology,
>Canonical XML says:
>  >>>>
>The canonical form of an XML document is physical representation of the
>document produced by the method described in this specification. The
>changes are summarized in the following list:
>* The document is encoded in UTF-8
>  >>>>
>There are numerous applications (parser testing, digital signatures,
>encryption) where it is important to have an actual physical representation
>for simple octet comparison or for input to cryptographic algorithms
>that usually take octet streams as inputs.
>However, it has recently come to my attention that there are also some
>attempts to use Canonical XML (or Exclusive XML Canonicalisation, which
>inherits this aspect of its definition from Canonical XML) in other
>situations, such as purely abstract modeling and comparison of XML
>documents or XML fragments, or for API definitions.
>For abstract modeling, the encoding in UTF-8 is irrelevant and confusing.
>For API definitions, specifying the encoding is crucial, but it may
>be counterproductive to use UTF-8 in a context where UTF-16 is
>widely used (e.g. Java, Python).
>Although UTF-8 is not appropriate in these cases, it seems to be difficult
>for users of Canonical XML to abstract away from it where necessary.
>I therefore propose to add some clarification. As a first actual
>text proposal (may need some additional work), I propose to
>add a note at the end of Section 1.1, Terminology:
>Note: Canonical XML is defined here in terms of a physical (octet-based)
>representation. This is appropriate for many applications, ranging from
>digital signatures to parser testing. However, there are cases where a
>physical representation is not needed, and there are cases where another
>physical representation is appropriate. As an example, it may be
>appropriate to choose UTF-16 rather than UTF-8 as the encoding of an
>API in a programming language using UTF-16 to represent Unicode strings,
>such as Java or Python. In such cases, users of Canonical XML should
>abstract from the physical character encoding if they note this
>I'm sure there is a better way to word this.
>Regards,    Martin.
Received on Thursday, 24 July 2003 18:08:57 UTC