RE: Fwd: Last Call: UTF-16, an encoding of ISO 10646 to Proposed

> From: Harald Tveit Alvestrand [mailto:Harald@Alvestrand.no]
> Date: Thursday, 16 December 1999 10:24
>
> My list of disadvantages:
>
> - No compatibility with cstrings due to NULL

Ah, this is what I had in mind with my "C string and ASCII-thinking".  It's
real, but of limited scope.  There's a corresponding advantage when you
think "Java and Unicode" instead of "C and ASCII".  In other words, it's not
an intrinsic disadvantage or advantage, it depends on the programming
environment.
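
A minimal sketch of the C-string problem, in Python.  The c_strlen helper
is hypothetical and only mimics what C's strlen() does; it is here to
illustrate the point, not taken from any real code:

```python
# Illustration only: ASCII text encoded as UTF-16 contains zero bytes,
# which a NUL-terminated string routine treats as end-of-string.
text = "Hi"
utf8 = text.encode("utf-8")        # b'Hi'
utf16 = text.encode("utf-16-be")   # b'\x00H\x00i'


def c_strlen(buf: bytes) -> int:
    """Hypothetical stand-in for C strlen(): bytes before the first zero."""
    idx = buf.find(b"\x00")
    return len(buf) if idx == -1 else idx


print(c_strlen(utf8))    # 2
print(c_strlen(utf16))   # 0: the very first byte is already zero
```

In an environment whose strings carry an explicit length and use 16-bit
units, the zero bytes are harmless, which is the "depends on the
environment" point.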

> - Inability to represent characters outside Planes 0-16

Real but very theoretical.  In addition to what he wrote, Ken Whistler
should have mentioned his calculation of when those planes will fill up if
we maintain the current allocation rate (unlikely, since most things of
import are already done or in the pipeline).  I don't remember the target
date offhand, but I think it was the 23rd century.
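
For reference, the Plane 0-16 ceiling falls straight out of the surrogate
arithmetic.  A small sketch (the constant names are mine, chosen for
illustration):

```python
# How the surrogate ranges bound what UTF-16 can address.
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1   # 1024 possible first units
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1    # 1024 possible second units
PLANE_SIZE = 0x10000                    # code points per plane

# Each (high, low) pair names one code point beyond Plane 0.
supplementary = HIGH_SURROGATES * LOW_SURROGATES
planes = 1 + supplementary // PLANE_SIZE        # Plane 0 plus the rest
max_code_point = PLANE_SIZE + supplementary - 1

print(supplementary)         # 1048576
print(planes)                # 17, i.e. Planes 0 through 16
print(hex(max_code_point))   # 0x10ffff
```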

> - VERY bad expansion factor for characters outside Plane 0
> (100% overhead)

Oops!  There's not much size difference between a 4-byte UTF-8 character and
a 4-byte UTF-16 character.
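
A quick check of the byte counts, as a sketch.  U+10400 is simply an
arbitrary code point in Plane 1 picked for the example:

```python
# Compare encoded sizes for one Plane 0 and one Plane 1 code point.
bmp_char = "\u65e5"            # a Plane 0 (BMP) character
plane1_char = "\U00010400"     # an arbitrary code point in Plane 1

for ch in (bmp_char, plane1_char):
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-be"))
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")

# U+65E5: UTF-8 3 bytes, UTF-16 2 bytes
# U+10400: UTF-8 4 bytes, UTF-16 4 bytes
```

The 100% figure only holds when measured against two bytes per character
within UTF-16 itself; against UTF-8 there is no overhead at all outside
Plane 0.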

> - No ability to mix ASCII and UTF-16 elements in a simple viewer

It's no harder to mix ASCII with UTF-16 than to mix ASCII with any other
charset.  You just transcode to your favorite encoding of Unicode and you're
done.
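
What "just transcode" amounts to, as a rough sketch.  The sample strings
and variable names are made up for the example:

```python
# Two pieces of text arriving in different charsets.
ascii_part = b"Subject: "
utf16_part = "\u00e9t\u00e9".encode("utf-16-be")   # bytes of a UTF-16 element

# Decode each piece from its own charset, then join them as characters.
combined = ascii_part.decode("ascii") + utf16_part.decode("utf-16-be")

# Re-encode the whole thing in whichever Unicode encoding the viewer prefers.
as_utf16 = combined.encode("utf-16-be")
as_utf8 = combined.encode("utf-8")

print(len(combined))    # 12 characters
print(len(as_utf16))    # 24 bytes
print(len(as_utf8))     # 14 bytes
```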

> - Two incompatible byte orders

That one is really, really real.
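
A sketch of why it hurts: the same two characters in each byte order, what
happens when the reader guesses wrong, and how a leading byte order mark
lets the decoder sort it out.  The sample text is arbitrary:

```python
import codecs

text = "A\u00e9"                   # two Plane 0 characters

be = text.encode("utf-16-be")      # b'\x00A\x00\xe9'
le = text.encode("utf-16-le")      # b'A\x00\xe9\x00'

# Reading big-endian data as little-endian raises no error here;
# it silently yields different characters.
wrong = be.decode("utf-16-le")     # '\u4100\ue900'

# With a byte order mark in front, Python's generic "utf-16" codec
# detects the order and strips the mark.
with_bom = (codecs.BOM_UTF16_BE + be).decode("utf-16")

print(be != le)            # True
print(wrong == text)       # False
print(with_bom == text)    # True
```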

> My list of advantages:
>
> - Does not require conversion between UCS-2 and UTF-16 when
>   only Plane 0 characters are used in the UTF-16

Plus the size advantage in plain text for many languages, some of them
important in practice.  As Ken pointed out, UTF-8 is denser only for ASCII.
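
A per-character comparison, as a sketch.  The four sample characters are
my own picks, one per UTF-8 length class within Plane 0:

```python
# Bytes per character in each encoding for a few Plane 0 samples.
samples = {
    "ASCII letter": "A",           # U+0041
    "accented Latin": "\u00e9",    # U+00E9
    "Cyrillic": "\u0416",          # U+0416
    "CJK ideograph": "\u65e5",     # U+65E5
}

sizes = {}
for label, ch in samples.items():
    sizes[label] = (len(ch.encode("utf-8")), len(ch.encode("utf-16-be")))
    print(f"{label}: UTF-8 {sizes[label][0]}, UTF-16 {sizes[label][1]}")

# ASCII letter: UTF-8 1, UTF-16 2
# accented Latin: UTF-8 2, UTF-16 2
# Cyrillic: UTF-8 2, UTF-16 2
# CJK ideograph: UTF-8 3, UTF-16 2
```

Real text mixes in ASCII spaces, digits and punctuation, so actual ratios
depend on the language and the content.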


That said, I will concur with Martin, Harald and Ira: the important thing at
this time is to get this out the door, not whether it's Info or PS.

--
François Yergeau

Received on Thursday, 16 December 1999 15:50:01 UTC