RE: Character Encoding Question

At 4:40 AM +0900 12/1/00, Martin J. Duerst wrote:
>>I do. :-) "Non-Unicode" is not specific enough to prevent 
>>confusion, as this discussion has shown.
>
>'non-unicode' is not part of the wording suggested.

Maybe we are talking about different things. Section 6.5 of 
draft-ietf-xmldsig-core-11.txt says:

>    Various canonicalization algorithms transcode from a non-Unicode
>    encoding to Unicode. The two algorithms below perform text
>    normalization during transcoding [NFC]. We RECOMMEND that externally
>    specified canonicalization algorithms do the same. (Note, there can be
>    ambiguities in converting existing charsets to Unicode, for an example
>    see the XML Japanese Profile [XML-Japanese] NOTE.)

The terms "non-Unicode encoding" and "Unicode" are not defined. I
believe any of the following could serve as the definition of
"non-Unicode encoding":

- all charsets except UTF-8, UTF-16, UTF-16BE, and UTF-16LE
- all charsets except UTF-8, UTF-16, UTF-16BE, UTF-16LE, UCS-2, and UCS-4
- all charsets that are not defined by the Unicode Consortium in some 
specific version of the Unicode Standard
- something else
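
To make the disagreement concrete, here is a minimal sketch that treats the first two candidate definitions as sets of charset labels and checks a few labels against each. The names UNICODE_DEF_1, UNICODE_DEF_2, and is_non_unicode are hypothetical, chosen only for illustration; they come from neither the draft nor any library, and the comparison is a plain uppercase string match rather than a real charset-alias lookup.

```python
# Hypothetical label sets for the first two candidate definitions above.
# Under each definition, "non-Unicode" means "not in this set".
UNICODE_DEF_1 = {"UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE"}
UNICODE_DEF_2 = UNICODE_DEF_1 | {"UCS-2", "UCS-4"}


def is_non_unicode(charset, unicode_set):
    """Return True if the charset label falls outside the given set.

    Assumption: labels are compared case-insensitively as plain strings;
    no alias resolution is attempted.
    """
    return charset.upper() not in unicode_set


if __name__ == "__main__":
    for label in ("UTF-8", "UCS-4", "Shift_JIS"):
        print(
            label,
            is_non_unicode(label, UNICODE_DEF_1),
            is_non_unicode(label, UNICODE_DEF_2),
        )
```

Under these assumptions the two definitions agree on UTF-8 and Shift_JIS but classify UCS-4 differently, so a reader cannot tell whether the draft's sentence applies to a UCS-4 document.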

Clearly, "Unicode" is the opposite of "non-Unicode", so whichever
definition is chosen for one determines the other.

--Paul Hoffman, Director
--Internet Mail Consortium

Received on Thursday, 30 November 2000 17:06:06 UTC