Re: Request for clarification on Canonical XML

Hello Tom,

My idea was explicitly not to add a new definition to the spec,
because this would be inappropriate for a clarification. Others
have expressed similar concerns.

Regards,    Martin.

At 11:27 03/07/30 -0400, Tom Gindin wrote:
>         Joseph:
>
>         Here's my try at the wording:
>Note: Canonical XML is an octet sequence resulting from characters, from
>the UCS character domain, encoded in UTF-8. Creating a deterministic octet
>sequence is necessary for XML Signature and other applications. However,
>some applications might want a canonical form of XML in a different
>encoding, or one that is simply a sequence of characters, without concern
>for its encoding. The "canonical character form" of Canonical XML consists
>of the sequence of characters resulting when the UTF-8 format defined in
>this document is converted to characters.  The "canonical UCS-4 form"
>consists of the sequence of octets produced by the conversion of the
>canonical character form to UCS-4.  The "canonical UTF-16 form" consists
>of the sequence of octets produced by the conversion of the canonical
>character form to UTF-16.
>
>         I have one substantive question, however.  Is there any need to
>produce a canonical form with less escaping than the current ones?
>         If we define canonical forms in other encodings, do those
>canonicalizations need their own tags?
>
>
>                 Tom Gindin
>
>
>
>
>
>Joseph Reagle <reagle@w3.org>
>Sent by: w3c-ietf-xmldsig-request@w3.org
>07/28/2003 04:39 PM
>
>
>         To:     Martin Duerst <duerst@w3.org>, w3c-ietf-xmldsig@w3.org
>         cc:     w3c-i18n-ig@w3.org
>         Subject:        Re: Request for clarification on Canonical XML
>
>
>
>
>On Monday 28 July 2003 13:53, Martin Duerst wrote:
> > The current text is slightly problematic because it says 'without
> > concern for its encoding' and then goes straight on to mention UTF-16.
> > UTF-16 indeed does not deal with octets, but it is still an encoding.
>
>So your point is that the UTF-8 encoding can be restrictive because (1)
>one may not want to use any encoding (what you mean by "abstract
>modeling"?) or (2) one may want to use a different encoding.
>
> > Also, this version of the text doesn't mention abstract modeling
> > anymore. It might also be better to replace 'may require' with 'may be
> > better served with'.
>
>I tweaked it to "might want" so as to avoid the "MAY", but be terse and
>not presume to tell them what they are better served with. <smile/> Ok,
>how about.
>
>[[[
>Note: Canonical XML is an octet sequence resulting from characters, from
>the UCS character domain, encoded in UTF-8. Creating a deterministic octet
>sequence is necessary for XML Signature and other applications. However,
>some applications might want a canonical form of XML in a different
>encoding, or one that is simply a sequence of characters, without concern
>for its encoding. For example, it may be appropriate to choose UTF-16
>rather than UTF-8 as the encoding of an API in a programming language
>using UTF-16 to represent Unicode strings, such as Java or Python. Or, one
>might want to abstractly describe an XML document as an Infoset that
>includes sequences of characters. In such cases, applications are not
>prohibited from defining and using a canonical character sequence that
>corresponds to the characters of a Canonical XML instance.
>]]]
>
>

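For illustration only, the conversions described in Tom's proposed wording
quoted above might be sketched in Python 3 as follows. The function names
and the sample document are hypothetical, and the choice of big-endian byte
order with no byte order mark is an assumption: the quoted wording does not
settle either point. UCS-4 is approximated here with Python's UTF-32 codec.

```python
# Sketch only: helper names are hypothetical, not taken from any spec.
# Assumption: big-endian output without a BOM for the non-UTF-8 forms.

def canonical_character_form(c14n_octets: bytes) -> str:
    """Decode the UTF-8 Canonical XML octets into a character sequence."""
    return c14n_octets.decode("utf-8")


def canonical_ucs4_form(c14n_octets: bytes) -> bytes:
    """Re-encode the character form as UCS-4 (via the UTF-32 codec)."""
    return canonical_character_form(c14n_octets).encode("utf-32-be")


def canonical_utf16_form(c14n_octets: bytes) -> bytes:
    """Re-encode the character form as UTF-16."""
    return canonical_character_form(c14n_octets).encode("utf-16-be")


# Hypothetical stand-in for the output of a real canonicalizer.
sample = "<doc>\u00e9\U0001D11E</doc>".encode("utf-8")

chars = canonical_character_form(sample)
ucs4 = canonical_ucs4_form(sample)
utf16 = canonical_utf16_form(sample)

# Converting back to UTF-8 should recover the original octet sequence.
assert utf16.decode("utf-16-be").encode("utf-8") == sample
assert ucs4.decode("utf-32-be").encode("utf-8") == sample
```

Because each form is derived from the same character sequence, any of them
can be mapped back to the UTF-8 octets that XML Signature operates on,
provided the byte order convention is agreed on in advance.
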
Received on Wednesday, 30 July 2003 13:05:14 UTC