Re: revised "generic syntax" internet draft from Keld Jørn Simonsen on 1997-04-25 (uri@w3.org from April 1997)

From: Keld Jørn Simonsen <keld@dkuug.dk>
Date: Fri, 25 Apr 1997 14:15:39 +0200
To: John C Klensin <klensin@mci.net>, Edward Cherlin <cherlin@newbie.net>
Cc: uri@bunyip.com
Message-Id: <199704251215.OAA17664@dkuug.dk>

John C Klensin writes:

> (iv) It is not hard to demonstrate that, in the medium to 
> long term, there are some requirements for character set 
> encoding for which Unicode will not suffice and it will be 
> necessary to go to multi-plane 10646 (which is one of 
> several reasons why IETF recommendation documents have 
> fairly consistently pointed to 10646 and not Unicode).  The 
> two are not the same.  In particular, while the comment in 
> (iii) can easily and correctly be rewritten as a UCS-4 
> statement, UTF-8 becomes, IMO, pathological (and its own 
> excuse for compression) when one starts dealing with plane 
> 3 or 4 much less, should we be unlucky enough to get there, 
> plane 200 or so.
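To make the length argument concrete, here is a minimal Python sketch of the original 31-bit UTF-8 scheme, which used one to six bytes per character so that all of UCS-4 could be encoded. The function names are hypothetical. Present-day UTF-8 (RFC 3629) stops at U+10FFFF, so Python's own codec cannot encode plane 200 and the sketch encodes by hand. Under this scheme, planes 1 through 31 cost four bytes per character, the same as UCS-4, and from plane 32 upward (group 00) UTF-8 needs five, more than UCS-4's fixed four.

```python
# Sketch of the original 31-bit UTF-8 scheme (1 to 6 bytes per character).
# Hypothetical helper names; modern UTF-8 (RFC 3629) stops at U+10FFFF.

# Exclusive upper bound of code points encodable in 1, 2, ..., 6 bytes.
_LIMITS = [0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
# Lead-byte marker bits for multi-byte sequences of length 2..6.
_LEAD_MARKS = {2: 0xC0, 3: 0xE0, 4: 0xF0, 5: 0xF8, 6: 0xFC}


def utf8_len_1997(cp: int) -> int:
    """Number of bytes cp needs under the original 31-bit UTF-8 scheme."""
    for n, limit in enumerate(_LIMITS, start=1):
        if 0 <= cp < limit:
            return n
    raise ValueError("UCS-4 code points are limited to 31 bits")


def encode_utf8_1997(cp: int) -> bytes:
    """Encode one UCS-4 code point with the original 31-bit UTF-8 scheme."""
    n = utf8_len_1997(cp)
    if n == 1:
        return bytes([cp])  # ASCII range: the byte is the code point itself
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    lead = _LEAD_MARKS[n] | cp  # remaining high bits go in the lead byte
    return bytes([lead] + tail[::-1])


if __name__ == "__main__":
    for plane in (0, 1, 3, 4, 200):
        cp = (plane << 16) | 0xFFFF  # last code point of the plane (group 00)
        print(f"plane {plane}: {utf8_len_1997(cp)} bytes in UTF-8, 4 in UCS-4")
    # UTF-8 lengths printed: 3, 4, 4, 4, 5
```

For code points up to U+10FFFF the sketch produces the same bytes as Python's built-in "utf-8" codec; beyond that it follows the older, longer form.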

Well, there is a kind of compression built into 10646: the BMP is
designed to contain the most frequently used characters in the world,
so characters outside the BMP are, on the whole, expected to be rarely used.
Thus UTF-8 is still an economical encoding of 10646. The major advantage
of UTF-8 is that it preserves the ISO 646 (ASCII) encoding and
the control characters in C0 and C1, and thus can provide a
straightforward migration path for systems that support ISO 646.
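A minimal Python sketch of the two properties argued here, using Python's built-in codecs; the helper names are hypothetical, and "utf-32-be" stands in for UCS-4 (for code points up to U+10FFFF the two are byte-for-byte identical). A BMP character costs one to three bytes in UTF-8 against four in UCS-4, and every byte of a multi-byte UTF-8 sequence is 0x80 or above, so any byte below 0x80 is exactly the ISO 646 character it would be in plain ASCII text. The sketch checks only the 0x00 to 0x7F range, which includes the C0 controls.

```python
def utf8_vs_ucs4(text: str) -> tuple:
    """Return (UTF-8 byte count, UCS-4 byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-32-be"))


def ascii_bytes_unchanged(text: str) -> bool:
    """True if the bytes below 0x80 in the UTF-8 form are exactly the
    ASCII/C0 characters of text, in order (no multi-byte sequence ever
    contains such a byte)."""
    ascii_chars = [ord(c) for c in text if ord(c) < 0x80]
    low_bytes = [b for b in text.encode("utf-8") if b < 0x80]
    return ascii_chars == low_bytes


if __name__ == "__main__":
    name = "Keld J\u00f8rn Simonsen"
    print(utf8_vs_ucs4(name))  # (19, 72)
    print(ascii_bytes_unchanged(name + " \u4e2d\u6587 /path\r\n"))  # True
```

The design point is that a tool which only understands ISO 646 can still find delimiters such as "/", "@" or CR LF in UTF-8 text without decoding it.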

Keld Simonsen
Liaison from SC2/WG2 to IETF

Received on Friday, 25 April 1997 08:17:41 UTC