RE: Character Encoding Question from Martin J. Duerst on 2000-11-30 (w3c-ietf-xmldsig@w3.org from October to December 2000)

From: Martin J. Duerst <duerst@w3.org>
Date: Fri, 01 Dec 2000 04:40:35 +0900
To: Paul Hoffman / IMC <phoffman@imc.org>, "John Boyer" <jboyer@PureEdge.com>, <w3c-ietf-xmldsig@w3.org>
Message-Id: <4.2.0.58.J.20001201043845.03b413d0@sh.w3.mag.keio.ac.jp>

At 00/11/30 10:30 -0800, Paul Hoffman / IMC wrote:
>At 2:29 AM +0900 12/1/00, Martin J. Duerst wrote:
>>There is no problem with UCS-2 and UCS-4. The UCS is a set
>>(in the math sense) of characters, each with a number associated.
>>There is only one UCS. Just saying 'UCS', there are no assumptions whatsoever
>>about representation (UCS-2 and UCS-4 are both 'charset' labels), and
>>no assumptions about subsetting (UCS-2 can be used, in the right context,
>>to denote a certain subset of the UCS). So I don't see any problem.
>
>I do. :-) "Non-Unicode" is not specific enough to prevent confusion, as 
>this discussion has shown.

'Non-Unicode' is not part of the suggested wording.


>Does it mean:
>- all charsets except UTF-8, UTF-16, UTF-16BE, and UTF-16LE
>- all charsets except UTF-8, UTF-16, UTF-16BE, UTF-16LE, UCS-2, UCS-4
>- all charsets that are not defined by the Unicode Consortium in some 
>version of the Unicode Standard
>- something else

What we need is all charsets that are defined based on the UCS. That would
include any RACE/LACE/..., if they ever get defined as a charset,
and is completely independent of who defines them.
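
To make "defined based on UCS" concrete, here is a rough sketch in Python.
It is only an illustration, not part of the suggested wording. The codec
names are Python's own labels, and the list UCS_BASED and the helper
encodings_of are hypothetical names made up for this example. Python has
no separate UCS-4 codec, so utf-32-be is assumed here as a stand-in for
the four-byte form. The point is that the bytes differ per charset while
the UCS number stays the same.

```python
# Illustrative only: these labels are Python codec names (an assumption of
# this sketch), not charset names taken from the suggested wording.
UCS_BASED = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be"]


def encodings_of(ch):
    """Map each UCS-based charset label to the bytes it uses for ch."""
    return {name: ch.encode(name) for name in UCS_BASED}


ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, UCS number 0xE9
for name, raw in encodings_of(ch).items():
    # The byte sequence depends on the charset, but decoding it always
    # yields the same UCS number.
    print(name, raw.hex(), hex(ord(raw.decode(name))))
```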

Regards,  Martin.

Received on Thursday, 30 November 2000 14:54:32 UTC