Re: Request for clarification on Canonical XML from Tom Gindin on 2003-07-30 (w3c-ietf-xmldsig@w3.org from July to September 2003)

From: Tom Gindin <tgindin@us.ibm.com>
Date: Wed, 30 Jul 2003 11:27:27 -0400
To: Joseph Reagle <reagle@w3.org>
Cc: Martin Duerst <duerst@w3.org>, w3c-i18n-ig@w3.org, w3c-ietf-xmldsig@w3.org, w3c-ietf-xmldsig-request@w3.org
Message-ID: <OF571CB8DF.209F4198-ON85256D72.0051FEF6-85256D73.0054E600@us.ibm.com>

        Joseph:

        Here's my try at the wording:
Note: Canonical XML is an octet sequence resulting from characters, from the
UCS character domain, encoded in UTF-8. Creating a deterministic octet
sequence is necessary for XML Signature and other applications. However,
some applications might want a canonical form of XML in a different
encoding, or one that is simply a sequence of characters, without concern
for its encoding. The "canonical character form" of Canonical XML consists
of the sequence of characters resulting when the UTF-8 format defined in
this document is converted to characters.  The "canonical UCS-4 form"
consists of the sequence of octets produced by the conversion of the
canonical character form to UCS-4.  The "canonical UTF-16 form" consists of
the sequence of octets produced by the conversion of the canonical
character form to UTF-16.
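
To make the three proposed forms concrete, here is a minimal Python sketch
of how they might be derived from a Canonical XML octet sequence. The
function name, the sample document, and the byte order (big-endian, no byte
order mark) are assumptions for illustration only; the wording above does
not fix any of them, and Python's "utf-32-be" codec stands in for UCS-4.

```python
def canonical_forms(c14n_octets: bytes) -> dict:
    """Derive the proposed forms from Canonical XML octets (UTF-8).

    Hypothetical helper: name and return shape are illustrative only.
    """
    # Canonical character form: the characters obtained by decoding
    # the UTF-8 octet sequence.
    chars = c14n_octets.decode("utf-8")
    return {
        "character": chars,
        # Assumption: big-endian, no byte order mark, for both forms.
        "ucs4": chars.encode("utf-32-be"),
        "utf16": chars.encode("utf-16-be"),
    }


# Hypothetical canonical octets: one non-ASCII BMP character (U+00E9)
# and one supplementary character (U+1D11E).
sample = '<doc attr="\u00e9">\U0001D11E</doc>'.encode("utf-8")
forms = canonical_forms(sample)
```

The character form is the common starting point; the UCS-4 and UTF-16
forms differ only in how those same characters are serialized to octets,
which is why a byte order would have to be pinned down before either
could be deterministic.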

        I have one substantive question, however.  Is there any need to
produce a canonical form with less escaping than the current ones?
        If we define canonical forms in other encodings, do those
canonicalizations need their own tags?

                Tom Gindin

Joseph Reagle <reagle@w3.org>
Sent by: w3c-ietf-xmldsig-request@w3.org
07/28/2003 04:39 PM

        To:     Martin Duerst <duerst@w3.org>, w3c-ietf-xmldsig@w3.org
        cc:     w3c-i18n-ig@w3.org
        Subject:        Re: Request for clarification on Canonical XML

On Monday 28 July 2003 13:53, Martin Duerst wrote:
> The current text is slightly problematic because it says 'without concern
> for its encoding' and then goes straight on to mention UTF-16. UTF-16
> indeed does not deal with octets, but it is still an encoding.

So your point is that the UTF-8 encoding can be restrictive because (1) one
may not want to use any encoding (what you mean by "abstract modeling"?) or
(2) one may want to use a different encoding.

> Also, this version of the text doesn't mention abstract modeling anymore.
> It might also be better to replace 'may require' with 'may be better
> served with'.

I tweaked it to "might want" so as to avoid the "MAY", but be terse and not
presume to tell them what they are better served with. <smile/> Ok, how
about:

[[[
Note: Canonical XML is an octet sequence resulting from characters, from the
UCS character domain, encoded in UTF-8. Creating a deterministic octet
sequence is necessary for XML Signature and other applications. However,
some applications might want a canonical form of XML in a different
encoding, or one that is simply a sequence of characters, without concern
for its encoding. For example, it may be appropriate to choose UTF-16
rather than UTF-8 as the encoding of an API in a programming language using
UTF-16 to represent Unicode strings, such as Java or Python. Or, one might
want to abstractly describe an XML document as an Infoset that includes
sequences of characters. In such cases, applications are not prohibited
from defining and using a canonical character sequence that corresponds to
the characters of a Canonical XML instance.
]]]

Received on Wednesday, 30 July 2003 11:28:11 UTC