Re: revised "generic syntax" internet draft

Keld Jørn Simonsen (keld@dkuug.dk)
Fri, 25 Apr 1997 14:15:39 +0200


Message-Id: <199704251215.OAA17664@dkuug.dk>
From: keld@dkuug.dk (Keld Jørn Simonsen)
Date: Fri, 25 Apr 1997 14:15:39 +0200
In-Reply-To: John C Klensin <klensin@mci.net>
To: John C Klensin <klensin@mci.net>, Edward Cherlin <cherlin@newbie.net>
Subject: Re: revised "generic syntax" internet draft
Cc: uri@bunyip.com

John C Klensin writes:

> (iv) It is not hard to demonstrate that, in the medium to 
> long term, there are some requirements for character set 
> encoding for which Unicode will not suffice and it will be 
> necessary to go to multi-plane 10646 (which is one of 
> several reasons why IETF recommendation documents have 
> fairly consistently pointed to 10646 and not Unicode).  The 
> two are not the same.  In particular, while the comment in 
> (iii) can easily and correctly be rewritten as a UCS-4 
> statement, UTF-8 becomes, IMO, pathological (and its own 
> excuse for compression) when one starts dealing with plane 
> 3 or 4 much less, should we be unlucky enough to get there, 
> plane 200 or so.
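The byte counts behind the UCS-4 versus UTF-8 comparison quoted above can be sketched in Python. This is a minimal illustration that assumes the original 31-bit form of UTF-8 described in RFC 2044, which used up to six bytes per character. Present-day UTF-8 (RFC 3629) stops at U+10FFFF, so Python's built-in codec cannot encode plane 200, and the length is computed by hand here. The names `utf8_len` and `plane_range` are hypothetical helpers, not part of any standard API.

```python
UCS4_BYTES = 4  # UCS-4 spends four bytes on every character, in any plane

def utf8_len(cp: int) -> int:
    """Bytes needed for code point cp in the original 31-bit UTF-8
    (RFC 2044), which used 1 to 6 bytes per character."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit UCS-4 space")
    # Each entry: (sequence length, first code point that no longer fits).
    for length, limit in ((1, 0x80), (2, 0x800), (3, 0x10000),
                          (4, 0x200000), (5, 0x4000000)):
        if cp < limit:
            return length
    return 6

def plane_range(plane: int, group: int = 0) -> range:
    """Code positions of one 10646 plane (0x10000 positions per plane);
    in UCS-4 the group is bits 24-30 and the plane is bits 16-23."""
    start = (group << 24) | (plane << 16)
    return range(start, start + 0x10000)

if __name__ == "__main__":
    for plane in (0, 1, 3, 4, 200):
        r = plane_range(plane)
        print(f"plane {plane:3}: UTF-8 {utf8_len(r[0])}-{utf8_len(r[-1])} "
              f"bytes, UCS-4 {UCS4_BYTES} bytes")
```

Under these assumptions the BMP costs 1 to 3 bytes per character in UTF-8, planes 1 through 4 cost 4 bytes (the same as UCS-4), and plane 200 costs 5 bytes, one more than UCS-4.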

Well, there is some kind of compression built into 10646: the BMP is
designed to contain the most frequently used characters in the world,
so characters outside the BMP are, overall, meant to be used only rarely.
Thus UTF-8 is still an economical encoding of 10646. The major advantage
of UTF-8 is that it preserves the ISO 646 (ASCII) encoding, including
the C0 control characters, byte for byte, and thus can provide a
straightforward migration path for systems that support ISO 646.
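The ASCII compatibility described above can be checked with a short Python sketch. It relies only on the standard `str.encode` codecs, with `utf-32-be` standing in for UCS-4 without a byte-order mark (the two coincide for code points up to U+10FFFF). The helper names are hypothetical, and the sample string is simply the author's name.

```python
from typing import Tuple

def ascii_bytes_unchanged() -> bool:
    """True if every ISO 646 (ASCII) character, the C0 controls 0x00-0x1F
    included, keeps its single-byte value when encoded as UTF-8."""
    return all(chr(b).encode("utf-8") == bytes([b]) for b in range(0x80))

def non_ascii_uses_high_bytes_only(text: str) -> bool:
    """True if the UTF-8 bytes of every non-ASCII character in text are all
    >= 0x80, so they can never be mistaken for ASCII bytes such as '/' or NUL."""
    return all(b >= 0x80
               for ch in text if ord(ch) >= 0x80
               for b in ch.encode("utf-8"))

def utf8_vs_ucs4(text: str) -> Tuple[int, int]:
    """Sizes in bytes of text as UTF-8 and as UCS-4 (big-endian, no BOM)."""
    return len(text.encode("utf-8")), len(text.encode("utf-32-be"))

if __name__ == "__main__":
    print(ascii_bytes_unchanged())                             # True
    print(non_ascii_uses_high_bytes_only("J\u00f8rn \u4e2d"))  # True
    print(utf8_vs_ucs4("Keld J\u00f8rn Simonsen"))             # (19, 72)
```

An ISO 646 program that scans for delimiters or control characters therefore keeps working on UTF-8 data, and text that is mostly ASCII stays close to its original size, whereas UCS-4 quadruples it.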

Keld Simonsen
Liaison from SC2/WG2 to IETF