Charset policy

Jonathan Rosenne (rosenne@NetVision.net.il)
Tue, 02 Sep 1997 07:49:09 +0300


Date: Tue, 02 Sep 1997 07:49:09 +0300
From: Jonathan Rosenne <rosenne@NetVision.net.il>
Subject: Charset policy
To: ietf-charsets@INNOSOFT.COM
Cc: mduerst@ifi.unizh.ch (Martin J. Duerst), unicore@unicode.org (Unicore)
Message-id: <3.0.1.32.19970902074909.006a4074@mail.netvision.net.il>

I have two comments:

1. POSIX. There is an inherent contradiction between Unicode and POSIX.
POSIX as it stands today lets each locale set its own values for several
character attributes, while Unicode fixes them.

In the context of a universal character set, I believe the Unicode approach
is the only one feasible.

Other parts of the locale, where cultural preferences such as date format
are specified, should, of course, still remain.

A possible solution is for POSIX to specify that, for a UCS, certain
character attributes are fixed and no longer locale-dependent.
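
To illustrate what "fixed" means on the Unicode side, here is a minimal
sketch using Python's standard unicodedata module. The helper name
fixed_attributes and the choice of attributes (name, general category,
bidirectional class) are assumptions made for this example; the message
does not say which attributes it has in mind. The values come from the
Unicode Character Database and do not depend on the process locale.

```python
import unicodedata

def fixed_attributes(ch):
    """Return a few properties the Unicode Character Database fixes for ch.

    Hypothetical helper for illustration; the attribute selection is an
    assumption, not a list taken from the message.
    """
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),      # e.g. Lu, Lo, Nd
        "bidi": unicodedata.bidirectional(ch),     # e.g. L, R, EN
    }

# A Latin capital, a Hebrew letter, and a digit.
for ch in ("A", "\u05d0", "5"):
    print(repr(ch), fixed_attributes(ch))
```

Under the proposal above, a POSIX locale built on a UCS would take
attributes like these as given rather than redefine them per locale.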

2. UTF. For some reason I was under the impression that UTF was a temporary
expedient, to be used only until communication protocols became comfortable
with 16- or 32-bit characters. The charset policy has it the other way round.

I believe we should be moving faster towards simply using 16- or 32-bit
characters, now that 7-bit communications are no longer dominant.

To those worried about bandwidth, I would say that most modems today include
compression, so your 8-bit characters are in effect already compressed down
to 4, 3 and sometimes even 2 bits. 16- or 32-bit characters will not be
noticeably worse off. Modem compression schemes could perhaps later be
adapted to 16- or 32-bit characters and do even better. In any case, the
various UTF schemes are not very impressive as compression, even for
US-ASCII.
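
The bandwidth argument can be checked roughly with a short experiment.
The sketch below encodes the same text as UTF-8, UTF-16 and UTF-32 and
compares raw and compressed sizes. It uses zlib's DEFLATE purely as a
stand-in for whatever compression a modem applies, which is an
assumption; the function name encoded_sizes and the sample text are
likewise invented for the example, and results will vary with the text.

```python
import zlib

def encoded_sizes(text):
    """Return {encoding: (raw_bytes, deflated_bytes)} for the given text.

    Hypothetical helper; DEFLATE stands in for modem compression here.
    """
    sizes = {}
    for encoding in ("utf-8", "utf-16-be", "utf-32-be"):
        raw = text.encode(encoding)
        sizes[encoding] = (len(raw), len(zlib.compress(raw, 9)))
    return sizes

# Repetitive US-ASCII sample text (an assumption; try your own data).
sample = "The quick brown fox jumps over the lazy dog. " * 50

for encoding, (raw, packed) in encoded_sizes(sample).items():
    print(f"{encoding:10} raw={raw:6d} deflated={packed:5d}")
```

The raw sizes differ by factors of two and four for ASCII text. How much
of that gap survives compression is the question the paragraph above
raises, and running the sketch on representative traffic is one way to
look at it.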



--

Jonathan Rosenne
JR Consulting
P O Box 33641, Tel Aviv, Israel
Phone: +972 50 246 522 Fax: +972 9 956 7353
http://ourworld.compuserve.com/homepages/Jonathan_Rosenne/


Date: Tue, 02 Sep 1997 09:47:10 -0400 (EDT)
From: John C Klensin <klensin@mci.net>
Subject: Re: UTF-8 revision
In-reply-to: <01IN39FRJOLA94E6GE@INNOSOFT.COM>
To: Ned Freed <Ned.Freed@INNOSOFT.COM>
Cc: ietf-charsets@INNOSOFT.COM, Francois Yergeau <yergeau@alis.com>
Reply-to: John C Klensin <klensin@mci.net>
Message-id: <SIMEON.9709020910.C@muahost.mci.net>
MIME-version: 1.0
X-Mailer: Simeon for Windows Version 4.1.2 Build (32)
Content-type: TEXT/PLAIN; CHARSET=US-ASCII
Priority: NORMAL
X-Authentication: none


On Sun, 31 Aug 1997 12:42:27 -0700 (PDT) Ned Freed 
<Ned.Freed@innosoft.com> wrote:
>...
> (2) I think you're going to have a significant problem getting this through
>     the IETF process unless you take a stand on what happens should the
>     character assignments in some future Unicode version change in an
>     incompatible way. Yes, I know that promises have been made that this will
>     never happen again, but that's all they are: Promises. The IETF has a
>     policy that it must retain change control over its own standards, and
>     this is a case where someone else effectively has change control over
>     the actual technical core of this specification. I therefore think that
>     this specification needs to say that it aligns automatically with
>     all future versions of Unicode that don't make incompatible changes, but
>     the minute one is made it stays aligned with the old version until and
>     unless the IETF specifically decides otherwise.

Ned,

While I think that "aligns with future versions of 10646" might
be acceptable (and even there I see problems, as I don't know
how to organize a flag day), aligning with future versions of
Unicode is problematic: I don't think the Unicode Consortium
meets the informal IETF criteria for openness and due process
that might lead us to partially hand over change control (and
even then, doing so would be largely unprecedented).
Historically, we have referenced a particular version of an
external document and, if that document changes, we have had to
go explicitly through a new document, review, and Last Call
process to change the normative reference.

   john


