Re: RFC 2279 (UTF-8) to Full Standard

Dan,

> >And the repeated concerns about the "eventual allocation" of characters
> >in the 32-bit codespace that UTF-16 could not handle have reached
> >the status of urban legends -- endlessly repeated among those in the
> >Linux community who use repetition to define accuracy, without bothering
> >to check with the source.
> 
> I am sure UTF-16 could be expanded with another surrogate space to
> handle all of the original UCS (all 31 bits).

But why? Where is the necessity?
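[For readers following along: the surrogate mechanism under discussion is the standard UTF-16 pair arithmetic, which already reaches every code point Unicode defines. A minimal sketch of that arithmetic, in Python purely for illustration:]

```python
# Standard UTF-16 surrogate-pair decoding (per RFC 2781):
# a high surrogate (0xD800-0xDBFF) and a low surrogate (0xDC00-0xDFFF)
# each contribute 10 bits above a 0x10000 offset.
def decode_pair(hi, lo):
    assert 0xD800 <= hi <= 0xDBFF, "not a high surrogate"
    assert 0xDC00 <= lo <= 0xDFFF, "not a low surrogate"
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

# The largest code point a surrogate pair can reach:
print(hex(decode_pair(0xDBFF, 0xDFFF)))  # 0x10ffff
```

That 0x10FFFF ceiling is exactly the boundary the "eventual allocation" worry is about; expanding beyond it would require the extra surrogate space Dan proposes.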

> In general, I think it is wrong
> to restrict the available 31 bits of UCS to the UTF-16 space just
> because Unicode made the wrong choice from the beginning by using
> only 16 bits. UTF-8 can encode much more than the UTF-16 code space.

This has the lingering quality of a religious or aesthetic argument.
Why is more better when there is no need for more?

If there are no alligators in the sewers, why spend money on
designing alligator traps and installing them in all the manholes?
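[The numbers behind the disagreement: RFC 2279 UTF-8, with sequences up to six bytes, reaches the full 31-bit UCS range, while UTF-16 tops out at U+10FFFF. A quick comparison, in Python for illustration:]

```python
# Code-space sizes under discussion:
utf8_rfc2279_max = 0x7FFFFFFF    # 6-byte UTF-8 sequences cover 31 bits
utf16_max        = 0x10FFFF      # surrogate pairs stop here
total_utf16      = utf16_max + 1 # number of UTF-16-reachable code points

print(total_utf16)                           # 1114112
print(utf8_rfc2279_max // total_utf16)       # UTF-8's space is ~1927x larger
```

Ken's point is that the extra ~2 billion values are alligator traps: capacity with no foreseeable characters to fill it.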

> Though UTF-16 programs will not be able to handle all of them.
> It is no different from me using an 8-bit code space and having to encode
> or discard all characters outside code values 0-255.

I presume you meant to write 0-127. ;-)

If there were only 35 characters, and nobody could find any more,
and you were using an 8-bit code space, and the architecture of
the encoding forms limited that to the code values 0-127,
would you feel unnecessarily constrained? Does the "wastefulness"
of throwing away unused bits bother you that much?

Why do you *need* all 31 bits of UCS? For that matter, what about
that wasteful reservation of the 32nd bit in UCS? That eliminates
2 billion+ code values. Why not rail against that restriction, too?
That is actually more reminiscent of the 7-bit/8-bit issue, which
was also a signed/unsigned byte issue.
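[The "2 billion+" figure is just the half of a 32-bit space lost by reserving the sign bit, the same arithmetic as the old signed/unsigned byte problem. Spelled out, in Python for illustration:]

```python
# Reserving the 32nd (sign) bit of a 32-bit space halves it,
# leaving the 31-bit UCS range and discarding 2**31 values.
full_32_bit = 2 ** 32   # 4,294,967,296 values in a full 32-bit space
ucs_31_bit  = 2 ** 31   # 2,147,483,648 values remain with the sign bit reserved
discarded   = full_32_bit - ucs_31_bit

print(discarded)        # 2147483648 -- the "2 billion+" code values
```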

--Ken

>    Dan
Received on Monday, 29 April 2002 15:43:56 UTC