- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Fri, 25 Apr 1997 14:15:39 +0200
- To: John C Klensin <klensin@mci.net>, Edward Cherlin <cherlin@newbie.net>
- Cc: uri@bunyip.com
John C Klensin writes:

> (iv) It is not hard to demonstrate that, in the medium to
> long term, there are some requirements for character set
> encoding for which Unicode will not suffice and it will be
> necessary to go to multi-plane 10646 (which is one of
> several reasons why IETF recommendation documents have
> fairly consistently pointed to 10646 and not Unicode). The
> two are not the same. In particular, while the comment in
> (iii) can easily and correctly be rewritten as a UCS-4
> statement, UTF-8 becomes, IMO, pathological (and its own
> excuse for compression) when one starts dealing with plane
> 3 or 4 much less, should we be unlucky enough to get there,
> plane 200 or so.

Well, there is already some kind of compression in 10646: the BMP is
designed to contain the most frequently used characters in the world,
so characters outside the BMP are meant to be very rarely used overall.
Thus UTF-8 is still an economical encoding of 10646.

The major advantage of UTF-8 is that it maintains the ISO 646 (ASCII)
encoding and the control characters in C0 and C1, and thus can provide
a straightforward migration path for systems that support ISO 646.

Keld Simonsen
Liaison from SC2/WG2 to IETF
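
To put rough numbers on the economy argument, here is a minimal sketch in Python. It assumes the one-to-six-octet UTF-8 layout that was current for ISO 10646 in 1997; the function name `utf8_length` and the sample code points are illustrative choices, not anything taken from the thread.

```python
# Illustrative sketch: octets needed per character under the original
# one-to-six-octet UTF-8 layout, compared with UCS-4's fixed four octets.

# Largest UCS-4 value representable in 1, 2, 3, 4, 5 and 6 octets.
UTF8_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]


def utf8_length(code_point: int) -> int:
    """Return the number of UTF-8 octets for a UCS-4 code point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    for octets, limit in enumerate(UTF8_LIMITS, start=1):
        if code_point <= limit:
            return octets
    raise ValueError("code point is outside the 31-bit UCS-4 range")


if __name__ == "__main__":
    samples = {
        "ASCII letter A": 0x41,
        "Latin small o with stroke": 0xF8,
        "CJK ideograph in the BMP": 0x4E2D,
        "first position of plane 3": 0x3 * 0x10000,
        "first position of plane 200": 200 * 0x10000,
    }
    for name, cp in samples.items():
        print(f"{name}: UTF-8 {utf8_length(cp)} octets, UCS-4 4 octets")
```

Under these assumptions ASCII stays at one octet and everything in the BMP takes at most three, so UTF-8 only becomes longer than UCS-4 for positions far beyond the BMP, such as the plane 200 case mentioned in the quoted text.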
Received on Friday, 25 April 1997 08:17:41 UTC