RE: Request for clarification on Canonical XML from Martin Duerst on 2003-07-24 (w3c-rdfcore-wg@w3.org from July 2003)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 24 Jul 2003 17:58:26 -0400
To: "John Boyer" <JBoyer@PureEdge.com>, <w3c-ietf-xmldsig@w3.org>
Cc: <w3c-i18n-ig@w3.org>, <w3c-rdfcore-wg@w3.org>, "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
Message-Id: <4.2.0.58.J.20030724175755.052793d0@localhost>

Hello John,

Many thanks for your quick reply. Your proposed tweak would be
fine by me.

Regards,   Martin.

At 13:43 03/07/24 -0700, John Boyer wrote:

>Hi Martin,
>
>The wording you gave for the note below seems fine to me, except the last 
>sentence, which begins "In such cases, users of Canonical XML should...", 
>should begin "In such cases, users of Canonical XML may...".
>
>Thanks,
>John Boyer, Ph.D.
>Senior Product Architect and Research Scientist
>PureEdge Solutions Inc.
>
>
>-----Original Message-----
>From: Martin Duerst [mailto:duerst@w3.org]
>Sent: Thursday, July 24, 2003 1:05 PM
>To: w3c-ietf-xmldsig@w3.org
>Cc: w3c-i18n-ig@w3.org; w3c-rdfcore-wg@w3.org; Peter F. Patel-Schneider
>Subject: Request for clarification on Canonical XML
>
>
>
>Hello Joseph, dear XML Signature specialists,
>
>I would like to request a clarification on Canonical XML
>(http://www.w3.org/TR/2001/REC-xml-c14n-20010315).
>
>At http://www.w3.org/TR/2001/REC-xml-c14n-20010315#Terminology,
>Canonical XML says:
>
>  >>>>
>The canonical form of an XML document is physical representation of the
>document produced by the method described in this specification. The
>changes are summarized in the following list:
>
>* The document is encoded in UTF-8
>  >>>>
>
>There are numerous applications (parser testing, digital signatures,
>encryption) where it is important to have an actual physical representation
>for simple octet comparison or for input to cryptographic algorithms
>that usually take octet streams as inputs.
>
>However, it has recently come to my attention that there are also some
>attempts to use Canonical XML (or Exclusive XML Canonicalisation, which
>inherits this aspect of its definition from Canonical XML) in other
>situations, such as purely abstract modeling and comparison of XML
>documents or XML fragments, or for API definitions.
>
>For abstract modeling, the encoding in UTF-8 is irrelevant and confusing.
>For API definitions, specifying the encoding is crucial, but it may
>be counterproductive to use UTF-8 in a context where UTF-16 is
>widely used (e.g. Java, Python).
>
>Although UTF-8 is not appropriate in these cases, it seems to be difficult
>for users of Canonical XML to abstract away from UTF-8 where necessary.
>I therefore propose to add some clarification. As a first actual
>text proposal (may need some additional work), I propose to
>add a note at the end of Section 1.1, Terminology:
>
>Note: Canonical XML is defined here in terms of a physical (octet-based)
>representation. This is appropriate for many applications, ranging from
>digital signatures to parser testing. However, there are cases where a
>physical representation is not needed, and there are cases where another
>physical representation is appropriate. As an example, it may be
>appropriate to choose UTF-16 rather than UTF-8 as the encoding of an
>API in a programming language using UTF-16 to represent Unicode strings,
>such as Java or Python. In such cases, users of Canonical XML should
>abstract from the physical character encoding if they note this
>appropriately.
>
>I'm sure there is a better way to word this.
>
>
>Regards,    Martin.
>

Received on Thursday, 24 July 2003 18:08:57 UTC