RE: Fwd: Last Call: UTF-16, an encoding of ISO 10646 to Proposed

At 10:25 16.12.99 -0800, Kenneth Whistler wrote:


> > - Inability to represent characters outside Planes 0-16
>
>WG2 and UTC are converging on a point of view that characters
>outside of Planes 0-16 should *never* be assigned. This may be
>formally written into 10646. The rationale here is that nearly
>all 10646 implementations are following the Unicode Standard, by
>necessity, to achieve interoperability in areas that are left
>unspecified by 10646. Formalizing this convergence by constraining
>the code space range that could ever be assigned standard characters
>would close down this nagging issue of incompatibility between
>the Unicode Standard and 10646. In that case, UTF-8, UTF-16, and
>UTF-32 would *all* have the exact same representational capability,
>and would all be completely interconvertible forms.

See http://www.unicode.org/pending/pending.html
It's entirely possible that all commonly used scripts will be encoded in 
Plane 0 (if those who fight for traditional Chinese and more precomposed 
characters give up), but I don't think it's likely that ISO will abandon 
Plane 1.
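
For reference, the "Planes 0-16" limit discussed above is exactly the reach of UTF-16 surrogate pairs: a pair carries a 20-bit offset from 0x10000, so the highest value it can express is 0x10FFFF, which lies in Plane 16. A minimal sketch of that arithmetic follows; the function names are illustrative only and are not taken from the draft.

```python
HIGH_BASE = 0xD800   # first high (leading) surrogate
LOW_BASE = 0xDC00    # first low (trailing) surrogate


def to_surrogates(cp):
    """Split a scalar value from Planes 1-16 into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("value is not encoded as a surrogate pair")
    offset = cp - 0x10000                      # 20-bit offset
    return HIGH_BASE + (offset >> 10), LOW_BASE + (offset & 0x3FF)


def from_surrogates(high, low):
    """Recombine a surrogate pair into its scalar value."""
    return 0x10000 + ((high - HIGH_BASE) << 10) + (low - LOW_BASE)


def plane(cp):
    """Plane number of a scalar value (each plane holds 0x10000 values)."""
    return cp >> 16
```

The largest possible pair, (0xDBFF, 0xDFFF), recombines to 0x10FFFF, so nothing beyond Plane 16 can be reached this way.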


> > - VERY bad expansion factor for characters outside Plane 0 (100% overhead)
>
>This claim I do not understand at all:
>
>scalar value    UTF-8   UTF-16  UTF-32
>0..7F           1       2       4
>80..7FF         2       2       4
>800..FFFD       3       2       4
>10000..10FFFD   4       4       4
>
>The only size advantage for UTF-8 is for ASCII values, and UTF-16
>has the clear size advantage for East Asian data.

Yes. My mistake; I didn't count properly.
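
The counts in the quoted table can be checked with any standard encoder. A minimal sketch using Python's built-in codecs follows; the helper name is illustrative, and the big-endian codec variants are chosen only so that no byte order mark is added to the length.

```python
def encoded_sizes(cp):
    """Return the byte lengths of one scalar value in (UTF-8, UTF-16, UTF-32)."""
    ch = chr(cp)
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),   # explicit byte order: no BOM counted
        len(ch.encode("utf-32-be")),
    )


# One sample from each end of every range in the table.
for cp in (0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFD, 0x10000, 0x10FFFD):
    print(f"{cp:06X}", encoded_sizes(cp))
```

Outside Plane 0 all three forms take 4 bytes per character, which is why there is no 100% overhead for UTF-16 relative to UTF-8 there.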

                      Harald

--
Harald Tveit Alvestrand, EDB Maxware, Norway
Harald.Alvestrand@edb.maxware.no

Received on Thursday, 16 December 1999 16:38:57 UTC