- From: Harald Tveit Alvestrand <Harald@Alvestrand.no>
- Date: Thu, 16 Dec 1999 22:32:25 +0100
- To: Kenneth Whistler <kenw@sybase.com>
- Cc: ietf-charsets@iana.org, kenw@sybase.com, mark.davis@us.ibm.com
At 10:25 16.12.99 -0800, Kenneth Whistler wrote: > > - Inability to represent characters outside Planes 0-16 > >WG2 and UTC are converging on a point of view that characters >outside of Planes 0-16 should *never* be assigned. This may be >formally written into 10646. The rationale here is that nearly >all 10646 implementations are following the Unicode Standard, by >necessity, to achieve interoperability in areas that are left >unspecified by 10646. Formalizing this convergence by constraining >the code space range that could ever be assigned standard characters >would close down this nagging issue of incompatibility between >the Unicode Standard and 10646. In that case, UTF-8, UTF-16, and >UTF-32 would *all* have the exact same representational capability, >and would all be completely interconvertible forms. See http://www.unicode.org/pending/pending.html It's entirely possible that all commonly used scripts will be encoded in Plane 0 (if those who fight for traditional Chinese and more precomposed characters give up), but I don't think it's likely that ISO will abandon Plane 1. > > - VERY bad expansion factor for characters outside Plane 0 (100% overhead) > >This claim I do not understand at all: > >scalar value UTF-8 UTF-16 UTF-32 >0..7F 1 2 4 >80..7FF 2 2 4 >800..FFFD 3 2 4 >10000..10FFFD 4 4 4 > >The only size advantage for UTF-8 is for ASCII values, and UTF-16 >has the clear size advantage for East Asian data. Yes. My mistake; I didn't count properly. Harald -- Harald Tveit Alvestrand, EDB Maxware, Norway Harald.Alvestrand@edb.maxware.no
Received on Thursday, 16 December 1999 16:38:57 UTC