- From: Tom Gindin <tgindin@us.ibm.com>
- Date: Wed, 30 Jul 2003 11:27:27 -0400
- To: Joseph Reagle <reagle@w3.org>
- Cc: Martin Duerst <duerst@w3.org>, w3c-i18n-ig@w3.org, w3c-ietf-xmldsig@w3.org, w3c-ietf-xmldsig-request@w3.org
Joseph:

Here's my try at the wording:

Note: Canonical XML is an octet sequence resulting from characters, from the UCS character domain, encoded in UTF-8. Creating a deterministic octet sequence is necessary for XML Signature and other applications. However, some applications might want a canonical form of XML in a different encoding, or one that is simply a sequence of characters, without concern for its encoding. The "canonical character form" of Canonical XML consists of the sequence of characters resulting when the UTF-8 format defined in this document is converted to characters. The "canonical UCS-4 form" consists of the sequence of octets produced by the conversion of the canonical character form to UCS-4. The "canonical UTF-16 form" consists of the sequence of octets produced by the conversion of the canonical character form to UTF-16.

I have one substantive question, however. Is there any need to produce a canonical form with less escaping than the current ones? If we define canonical forms in other encodings, do those canonicalizations need their own tags?

Tom Gindin

Joseph Reagle <reagle@w3.org>
Sent by: w3c-ietf-xmldsig-request@w3.org
07/28/2003 04:39 PM

To: Martin Duerst <duerst@w3.org>, w3c-ietf-xmldsig@w3.org
cc: w3c-i18n-ig@w3.org
Subject: Re: Request for clarification on Canonical XML

On Monday 28 July 2003 13:53, Martin Duerst wrote:
> The current text is slightly problematic because it says 'without concern
> for its encoding' and then goes straight on to mention UTF-16. UTF-16
> indeed does not deal with octets, but it is still an encoding.

So your point is that the UTF-8 encoding can be restrictive because (1) one may not want to use any encoding (what you mean by "abstract modeling"?) or (2) one may want to use a different encoding.

> Also, this version of the text doesn't mention abstract modeling anymore.
> It might also be better to replace 'may require' with 'may be better
> served with'.

I tweaked it to "might want" so as to avoid the "MAY", but be terse and not presume to tell them what they are better served with. <smile/>

Ok, how about.

[[[
Note: Canonical XML is an octet sequence resulting from characters, from the UCS character domain, encoded in UTF-8. Creating a deterministic octet sequence is necessary for XML Signature and other applications. However, some applications might want a canonical form of XML in a different encoding, or one that is simply a sequence of characters, without concern for its encoding. For example, it may be appropriate to choose UTF-16 rather than UTF-8 as the encoding of an API in a programming language using UTF-16 to represent Unicode strings, such as Java or Python. Or, one might want to abstractly describe an XML document as an Infoset that includes sequences of characters. In such cases, applications are not prohibited from defining and using a canonical character sequence that corresponds to the characters of a Canonical XML instance.
]]]
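
For illustration only, the three derived forms in the wording proposed at the top of this message could be sketched in Python as below. The function names are hypothetical. The big-endian byte order and the absence of a byte order mark are assumptions, since the proposed wording specifies neither, and UCS-4 is approximated here with Python's UTF-32 codec, which yields the same octets for valid Unicode characters.

```python
def canonical_character_form(canonical_xml: bytes) -> str:
    """Convert the UTF-8 octets of a Canonical XML instance to characters.

    Decoding is strict, so malformed UTF-8 raises UnicodeDecodeError.
    """
    return canonical_xml.decode("utf-8")


def canonical_utf16_form(canonical_xml: bytes) -> bytes:
    """Octets of the canonical character form converted to UTF-16.

    Assumption: big-endian, no byte order mark.
    """
    return canonical_character_form(canonical_xml).encode("utf-16-be")


def canonical_ucs4_form(canonical_xml: bytes) -> bytes:
    """Octets of the canonical character form converted to UCS-4.

    Assumption: big-endian, no byte order mark, via the UTF-32 codec.
    """
    return canonical_character_form(canonical_xml).encode("utf-32-be")


if __name__ == "__main__":
    # Stand-in for the output of a real canonicalizer.
    octets = "<doc>\u00e9</doc>".encode("utf-8")
    print(len(octets))
    print(len(canonical_utf16_form(octets)))
    print(len(canonical_ucs4_form(octets)))
```

Because each form is derived from the same character sequence, converting any of them back to UTF-8 should reproduce the original Canonical XML octets, provided both sides agree on byte order.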
Received on Wednesday, 30 July 2003 11:28:11 UTC