- From: Graham Klyne <gk@ninebynine.org>
- Date: Fri, 25 Jul 2003 10:07:35 +0100
- To: Martin Duerst <duerst@w3.org>, w3c-ietf-xmldsig@w3.org
- Cc: w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org, "Peter F. " Patel-Schneider <pfps@research.bell-labs.com>
I think what would help here would be to define a (referencable) canonical character sequence form, and use that to define the canonical octet sequence representation. #g -- At 16:04 24/07/03 -0400, Martin Duerst wrote: >Hello Joseph, dear XML Signature specialists, > >I would like to request a clarification on Canonical XML >(http://www.w3.org/TR/2001/REC-xml-c14n-20010315). > >At http://www.w3.org/TR/2001/REC-xml-c14n-20010315#Terminology, >Canonical XML says: > > >>>> >The canonical form of an XML document is physical representation of the >document produced by the method described in this specification. The >changes are summarized in the following list: > >* The document is encoded in UTF-8 > >>>> > >There are numerous applications (parser testing, digital signatures, >encryption) where it is important to have an actual physical representation >for simple octet comparison or for input to cryptographic algorithms >that usually take octet streams as inputs. > >However, it has recently come to my attention that there are also some >attempts to use Canonical XML (or Exclusive XML Canonicalisation, which >inherits this aspect of its definition from Canonical XML) in other >situations, such as purely abstract modeling and comparison of XML >documents or XML fragments, or for API definitions. > >For abstract modeling, the encoding in UTF-8 irrelevant and confusing. >For API definitions, specifying the encoding is crucial, but it may >be counterproductive to use UTF-8 in a context where UTF-16 is >widely used (e.g. Java, Python). > >Although not appropriate in these cases, it seems to be difficult for >users of Canonical XML to abstract them from UTF-8 where necessary. >I therefore propose to add some clarification. As a first actual >text proposal (may need some additional work), I propose to >add a note at the end of Section 1.1, Terminology: > >Note: Canonical XML is defined here in terms of a physical (octet-based) >representation. This is appropriate for many applications, ranging from >digital signatures to parser testing. However, there are cases where a >physical representation is not needed, and there are cases where another >physical representation is appropriate. As an example, it may be >aproriate to choose UTF-16 rather than UTF-8 as the encoding of an >API in a programming language using UTF-16 to represent Unicode strings, >such as Java or Python. In such cases, users of Canonical XML should >abstract from the physical character encoding if they note this >appropriately. > >I'm sure there is a better way to word this. > > >Regards, Martin. ------------------- Graham Klyne <GK@NineByNine.org> PGP: 0FAA 69FF C083 000B A2E9 A131 01B9 1C7A DBCA CB5E
Received on Friday, 25 July 2003 05:12:48 UTC