Re: RFC 2279 (UTF-8) to Full Standard from Dan Oscarsson on 2002-04-15 (ietf-charsets@w3.org from April to June 2002)

From: Dan Oscarsson <Dan.Oscarsson@kiconsulting.se>
Date: Mon, 15 Apr 2002 08:30:33 +0200 (CEST)
To: kenw@sybase.com
Cc: ietf-charsets@iana.org, tony@att.com
Message-id: <200204150630.g3F6UXdU003943@valinor.malmo.trab.se>

>
>And as you can see by my just cited quotation from 10646 itself, such
>argumentation was always a kind of shell game by detractors of UTF-16
>and Unicode. The people making such arguments were not plugged in to
>the process in ISO and were apparently unaware that WG2 itself was
>keenly aware of the interoperability problems and eager to ensure that
>all UTF's for 10646 were *equally* applicable to all characters encoded
>in the standard.
>
>And the repeated concerns about the "eventual allocation" of characters
>in the 32-bit codespace that UTF-16 could not handle have reached
>the status of urban legends -- endlessly repeated among those in the
>Linux community who use repetition to define accuracy, without bothering
>to check with the source.

I am sure UTF-16 could be extended with another surrogate space to
handle all of the original UCS (all 31 bits). In general I think it is wrong
to restrict the available 31 bits of UCS to the UTF-16 space just
because Unicode made the wrong choice from the beginning by using
only 16 bits. UTF-8 can encode much more than the UTF-16 code space,
though UTF-16 programs will not be able to handle all of it.
It is no different from me using an 8-bit code space and having to encode
or discard all characters outside code values 0-255.
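
To make the size difference concrete, here is a rough sketch in Python. It assumes the RFC 2279 form of UTF-8 (sequences of up to six bytes, covering 31 bits) and the usual single surrogate-pair scheme for UTF-16. The function names are my own for illustration and are not taken from any library.

```python
def encode_utf8_rfc2279(cp):
    """Encode a code point as UTF-8 in the RFC 2279 form (1 to 6 bytes)."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS range")
    # Largest value that fits in a sequence of 1, 2, ... 6 bytes.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    n = next(i for i, limit in enumerate(limits, start=1) if cp <= limit)
    if n == 1:
        return bytes([cp])
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    lead_prefix = (0xFF << (8 - n)) & 0xFF  # n one-bits followed by a zero
    out.append(lead_prefix | cp)
    return bytes(reversed(out))


def encode_utf16_units(cp):
    """Encode a code point as a list of 16-bit UTF-16 code units."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("not reachable with one surrogate pair")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000  # 20 bits, split 10/10 across the pair
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
```

With these definitions, encode_utf8_rfc2279(0x7FFFFFFF) gives a six-byte sequence, while encode_utf16_units(0x110000) raises an error, which is the restriction I am talking about.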

   Dan

Received on Monday, 29 April 2002 05:00:16 UTC