- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 24 Jul 2003 16:04:57 -0400
- To: w3c-ietf-xmldsig@w3.org
- Cc: w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org, "Peter F. " Patel-Schneider <pfps@research.bell-labs.com>
Hello Joseph, dear XML Signature specialists, I would like to request a clarification on Canonical XML (http://www.w3.org/TR/2001/REC-xml-c14n-20010315). At http://www.w3.org/TR/2001/REC-xml-c14n-20010315#Terminology, Canonical XML says: >>>> The canonical form of an XML document is physical representation of the document produced by the method described in this specification. The changes are summarized in the following list: * The document is encoded in UTF-8 >>>> There are numerous applications (parser testing, digital signatures, encryption) where it is important to have an actual physical representation for simple octet comparison or for input to cryptographic algorithms that usually take octet streams as inputs. However, it has recently come to my attention that there are also some attempts to use Canonical XML (or Exclusive XML Canonicalisation, which inherits this aspect of its definition from Canonical XML) in other situations, such as purely abstract modeling and comparison of XML documents or XML fragments, or for API definitions. For abstract modeling, the encoding in UTF-8 irrelevant and confusing. For API definitions, specifying the encoding is crucial, but it may be counterproductive to use UTF-8 in a context where UTF-16 is widely used (e.g. Java, Python). Although not appropriate in these cases, it seems to be difficult for users of Canonical XML to abstract them from UTF-8 where necessary. I therefore propose to add some clarification. As a first actual text proposal (may need some additional work), I propose to add a note at the end of Section 1.1, Terminology: Note: Canonical XML is defined here in terms of a physical (octet-based) representation. This is appropriate for many applications, ranging from digital signatures to parser testing. However, there are cases where a physical representation is not needed, and there are cases where another physical representation is appropriate. As an example, it may be aproriate to choose UTF-16 rather than UTF-8 as the encoding of an API in a programming language using UTF-16 to represent Unicode strings, such as Java or Python. In such cases, users of Canonical XML should abstract from the physical character encoding if they note this appropriately. I'm sure there is a better way to word this. Regards, Martin.
Received on Thursday, 24 July 2003 16:06:27 UTC