RE: Character Encoding Question from Paul Hoffman / IMC on 2000-11-30 (w3c-ietf-xmldsig@w3.org from October to December 2000)

From: Paul Hoffman / IMC <phoffman@imc.org>
Date: Thu, 30 Nov 2000 10:30:40 -0800
To: "Martin J. Duerst" <duerst@w3.org>, "John Boyer" <jboyer@PureEdge.com>, <w3c-ietf-xmldsig@w3.org>
Message-Id: <p05010409b64c4c6c71be@[165.227.249.17]>

At 2:29 AM +0900 12/1/00, Martin J. Duerst wrote:
>There is no problem with UCS-2 and UCS-4. The UCS is a set
>(in the math sense) of characters, each with a number associated.
>There is only one UCS. Just saying 'UCS', there are no assumptions whatsoever
>about representation (UCS-2 and UCS-4 are both 'charset' labels), and
>no assumptions about subsetting (UCS-2 can be used, in the right context,
>to denote a certain subset of the UCS). So I don't see any problem.

I do. :-) "Non-Unicode" is not specific enough to prevent confusion, 
as this discussion has shown. Does it mean:
- all charsets except UTF-8, UTF-16, UTF-16BE, and UTF-16LE
- all charsets except UTF-8, UTF-16, UTF-16BE, UTF-16LE, UCS-2, UCS-4
- all charsets that are not defined by the Unicode Consortium in some
  version of the Unicode Standard
- something else
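
A minimal sketch of how the first two readings diverge, assuming charset labels are compared as plain case-insensitive strings. The set names and both helper functions are hypothetical, made up for illustration only. The third reading is left out because it depends on which version of the Unicode Standard one consults.

```python
# Hypothetical label sets mirroring the first two readings listed above.
UTF_ONLY = {"UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE"}
UTF_AND_UCS = UTF_ONLY | {"UCS-2", "UCS-4"}


def non_unicode_readings(label):
    """Return, for each reading, whether `label` counts as "non-Unicode"."""
    name = label.upper()
    return {
        "all except UTF-*": name not in UTF_ONLY,
        "all except UTF-* and UCS-*": name not in UTF_AND_UCS,
    }


def is_ambiguous(label):
    """True when the two readings disagree about `label`."""
    return len(set(non_unicode_readings(label).values())) > 1


if __name__ == "__main__":
    for label in ("UTF-8", "UCS-2", "UCS-4", "ISO-8859-1"):
        print(label, non_unicode_readings(label), is_ambiguous(label))
```

Under these assumptions only the UCS-2 and UCS-4 labels are classified differently by the two readings, which is exactly where the term "Non-Unicode" stops being specific.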

--Paul Hoffman, Director
--Internet Mail Consortium

Received on Thursday, 30 November 2000 13:30:49 UTC