- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 30 Jul 2003 11:51:17 -0400
- To: Tom Gindin <tgindin@us.ibm.com>, Joseph Reagle <reagle@w3.org>
- Cc: w3c-i18n-ig@w3.org, w3c-ietf-xmldsig@w3.org, w3c-ietf-xmldsig-request@w3.org
Hello Tom,

My idea was explicitly not to add a new definition to the spec,
because this would be inappropriate for a clarification.
Others have expressed similar concerns.

Regards, Martin.

At 11:27 03/07/30 -0400, Tom Gindin wrote:
> Joseph:
>
> Here's my try at the wording:
>
> Note: Canonical XML is an octet sequence resulting from characters, from
> the UCS character domain, encoded in UTF-8. Creating a deterministic octet
> sequence is necessary for XML Signature and other applications. However,
> some applications might want a canonical form of XML in a different
> encoding, or one that is simply a sequence of characters, without concern
> for its encoding. The "canonical character form" of Canonical XML consists
> of the sequence of characters resulting when the UTF-8 format defined in
> this document is converted to characters. The "canonical UCS-4 form"
> consists of the sequence of octets produced by the conversion of the
> canonical character form to UCS-4. The "canonical UTF-16 form" consists of
> the sequence of octets produced by the conversion of the canonical
> character form to UTF-16.
>
> I have one substantive question, however. Is there any need to produce a
> canonical form with less escaping than the current ones?
> If we define canonical forms in other encodings, do those
> canonicalizations need their own tags?
>
>         Tom Gindin
>
>
> Joseph Reagle <reagle@w3.org>
> Sent by: w3c-ietf-xmldsig-request@w3.org
> 07/28/2003 04:39 PM
>
>         To:      Martin Duerst <duerst@w3.org>, w3c-ietf-xmldsig@w3.org
>         cc:      w3c-i18n-ig@w3.org
>         Subject: Re: Request for clarification on Canonical XML
>
> On Monday 28 July 2003 13:53, Martin Duerst wrote:
> > The current text is slightly problematic because it says 'without concern
> > for its encoding' and then goes straight on to mention UTF-16. UTF-16
> > indeed does not deal with octets, but it is still an encoding.
>
> So your point is that the UTF-8 encoding can be restrictive because (1) one
> may not want to use any encoding (what you mean by "abstract modeling"?) or
> (2) one may want to use a different encoding.
>
> > Also, this version of the text doesn't mention abstract modeling anymore.
> > It might also be better to replace 'may require' with 'may be better
> > served with'.
>
> I tweaked it to "might want" so as to avoid the "MAY", but be terse and not
> presume to tell them what they are better served with. <smile/> Ok, how
> about.
>
> [[[
> Note: Canonical XML is an octet sequence resulting from characters, from
> the UCS character domain, encoded in UTF-8. Creating a deterministic octet
> sequence is necessary for XML Signature and other applications. However,
> some applications might want a canonical form of XML in a different
> encoding, or one that is simply a sequence of characters, without concern
> for its encoding. For example, it may be appropriate to choose UTF-16
> rather than UTF-8 as the encoding of an API in a programming language using
> UTF-16 to represent Unicode strings, such as Java or Python. Or, one might
> want to abstractly describe an XML document as an Infoset that includes
> sequences of characters. In such cases, applications are not prohibited
> from defining and using a canonical character sequence that corresponds to
> the characters of a Canonical XML instance.
> ]]]
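
For readers following the thread, the conversions described in the quoted wording can be sketched in a few lines of Python. This is only an illustration of the proposal under discussion, not text from any specification. The function names are hypothetical, and the byte order (big-endian, no byte order mark) is an assumption: the quoted wording does not say which serialization of UTF-16 or UCS-4 is meant, and without fixing one the octet sequence would not be deterministic. UCS-4 is approximated here with Python's `utf-32-be` codec.

```python
# Sketch only: hypothetical helper names, not part of Canonical XML.

def canonical_character_form(canonical_xml: bytes) -> str:
    """Decode the UTF-8 octets of a Canonical XML instance to characters."""
    # Strict decoding: malformed UTF-8 raises UnicodeDecodeError.
    return canonical_xml.decode("utf-8")


def canonical_utf16_form(canonical_xml: bytes) -> bytes:
    """Re-encode the character form as UTF-16.

    Assumption: big-endian, no BOM. The quoted wording leaves this open.
    """
    return canonical_character_form(canonical_xml).encode("utf-16-be")


def canonical_ucs4_form(canonical_xml: bytes) -> bytes:
    """Re-encode the character form as UCS-4 (here: UTF-32, big-endian, no BOM)."""
    return canonical_character_form(canonical_xml).encode("utf-32-be")


# A stand-in for the output of a real canonicalizer (15 characters, 16 octets).
c14n_octets = "<doc>caf\u00e9</doc>".encode("utf-8")

chars = canonical_character_form(c14n_octets)
utf16 = canonical_utf16_form(c14n_octets)
ucs4 = canonical_ucs4_form(c14n_octets)

# The character form round-trips to the original UTF-8 octets, so any of the
# derived forms can be mapped back to the signed Canonical XML instance.
round_trip = utf16.decode("utf-16-be").encode("utf-8")
```

Because each derived form is a pure re-encoding of the same character sequence, it carries no information beyond the UTF-8 form, which is one way to read the objection that a separate definition is unnecessary for a clarification.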
Received on Wednesday, 30 July 2003 13:05:14 UTC