RE: Character Encoding Question from Paul Hoffman / IMC on 2000-11-30 (w3c-ietf-xmldsig@w3.org from October to December 2000)

From: Paul Hoffman / IMC <phoffman@imc.org>
Date: Thu, 30 Nov 2000 14:05:57 -0800
To: "Martin J. Duerst" <duerst@w3.org>, "John Boyer" <jboyer@PureEdge.com>, <w3c-ietf-xmldsig@w3.org>
Message-Id: <p0501040eb64c7e23206e@[165.227.249.17]>

At 4:40 AM +0900 12/1/00, Martin J. Duerst wrote:
>>I do. :-) "Non-Unicode" is not specific enough to prevent 
>>confusion, as this discussion has shown.
>
>'non-unicode' is not part of the wording suggested.

Maybe we are talking about different things. Section 6.5 of 
draft-ietf-xmldsig-core-11.txt says:

>    Various canonicalization algorithms transcode from a non-Unicode
>    encoding to Unicode. The two algorithms below perform text
>    normalization during transcoding [NFC]. We RECOMMEND that externally
>    specified canonicalization algorithms do the same. (Note, there can be
>    ambiguities in converting existing charsets to Unicode, for an example
>    see the XML Japanese Profile [XML-Japanese] NOTE.)

The terms "non-Unicode encoding" and "Unicode" are not defined. I 
believe any of the following could be the definition of non-Unicode 
encoding:

- all charsets except UTF-8, UTF-16, UTF-16BE, and UTF-16LE
- all charsets except UTF-8, UTF-16, UTF-16BE, UTF-16LE, UCS-2, and UCS-4
- all charsets that are not defined by the Unicode Consortium in some 
specific version of the Unicode Standard
- something else

Clearly, "Unicode" is the opposite of "non-Unicode", so it is just as ambiguous.
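To make the ambiguity concrete, here is a rough sketch (not taken from the draft) that encodes the first two candidate definitions as sets of charset names and lists the charsets on which they disagree. The names DEFINITION_1, DEFINITION_2, is_non_unicode, and disagreements are hypothetical, chosen only for this illustration. The third candidate is left out because its membership depends on which version of the Unicode Standard is meant.

```python
# Charsets counted as "Unicode" under the first two candidate definitions.
# Anything outside the chosen set would be a "non-Unicode encoding".
DEFINITION_1 = frozenset({"UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE"})
DEFINITION_2 = DEFINITION_1 | {"UCS-2", "UCS-4"}


def is_non_unicode(charset, unicode_charsets):
    """Return True if charset is outside the given set of 'Unicode' charsets."""
    return charset.upper() not in unicode_charsets


def disagreements(charsets):
    """Return the charsets that the two definitions classify differently."""
    return [
        c for c in charsets
        if is_non_unicode(c, DEFINITION_1) != is_non_unicode(c, DEFINITION_2)
    ]


# Illustrative sample only; not a list taken from the draft.
sample = ["UTF-8", "UCS-2", "ISO-8859-1", "UCS-4", "SHIFT_JIS"]
print(disagreements(sample))  # ['UCS-2', 'UCS-4']
```

Under the first definition a canonicalizer reading UCS-2 input would be "transcoding from a non-Unicode encoding"; under the second it would not, so the two readings give different answers about when the Section 6.5 text applies.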

--Paul Hoffman, Director
--Internet Mail Consortium

Received on Thursday, 30 November 2000 17:06:06 UTC