Re: Suggested character set policy for the IETF

Martin J. Duerst
Mon, 21 Jul 1997 10:31:00 +0200 (MET DST)

Date: Mon, 21 Jul 1997 10:31:00 +0200 (MET DST)
From: "Martin J. Duerst" <>
Subject: Re: Suggested character set policy for the IETF
To: Ned Freed <Ned.Freed@INNOSOFT.COM>
Cc: Chris Newman <Chris.Newman@INNOSOFT.COM>, ietf-charsets@INNOSOFT.COM,
Message-id: <Pine.SUN.3.96.970721102542.245f-100000@enoshima>

On Tue, 1 Jul 1997, Ned Freed wrote:

That was a long time ago. I already replied to this mail, also
a long time ago. I have received several other mails in this
discussion that also date back about three weeks. The headers
below suggest that there is a big delay at THOR.INNOSOFT.COM.
This seems to affect all the postings on the IETF charsets
list. If something else is the problem, please tell us what
to do to get it fixed.

Can somebody at Innosoft look into the problem?

Many thanks in advance,		Martin.

> From Mon Jul 21 10:25:33 1997
> Return-Path: <>
> Received: from SIGURD.INNOSOFT.COM by with SMTP (PP) 
>           id <>; Sun, 20 Jul 1997 23:14:58 +0200
> Received: from THOR.INNOSOFT.COM (SYSTEM@[]) 
>           by (PMDF V5.2-0 #8790) 
>           id <> (original mail 
>           from for MDUERST@IFI.UNIZH.CH;
>           Sun, 20 Jul 1997 14:14:23 PDT
> Received: from THOR.INNOSOFT.COM (SYSTEM@[]) 
>           by (PMDF V5.2-0 #8790) with ESMTP 
>           id <>;
>           Tue, 01 Jul 1997 10:39:19 -0700 (PDT)
> Received: from INNOSOFT.COM by INNOSOFT.COM (PMDF V5.1-8 #8694) 
>           id <01IKOEXQX580ANDI74@INNOSOFT.COM>;
>           Tue, 01 Jul 1997 10:38:28 -0700 (PDT)
> Date: Tue, 01 Jul 1997 10:30:16 -0700 (PDT)
> From: Ned Freed <>
> Subject: Re: Suggested character set policy for the IETF
> In-reply-to: "Your message dated Tue, 01 Jul 1997 13:07:47 +0200 (MET DST)" <Pine.SUN.3.96.970701123506.253F-100000@enoshima>
> To: "Martin J. Duerst" <>
> Cc: Ned Freed <Ned.Freed@INNOSOFT.COM>, 
>     Chris Newman <Chris.Newman@INNOSOFT.COM>, ietf-charsets@INNOSOFT.COM, 
>     IETF Languages <>
> MIME-version: 1.0
> Content-type: TEXT/PLAIN; charset=US-ASCII
>
> > > Both because of this definition as well as other interoperability issues the
> > > definition of a character set in MIME pretty much has to change.
> > > For one thing, registering UTF-8 as a charset is technically illegal right now.
> > Can you explain that? What's the problem?
> I thought it was obvious: We currently say that a charset is a mapping from a
> series of octets to a sequence of graphic characters. UTF-8 produces a lot more
> than graphic characters.
> I suppose you could argue that US-ASCII does too, but CR and LF are
> specifically dealt with as an exception in MIME, whereas no comparable prose
> exists in MIME to allow, say, directionality indicators.
> > I don't think that makes any difference. Quite to the contrary, "control
> > character" at least has a long and rather clear usage history, whereas
> > "control information" can just be about anything.
> No such history exists in the IETF. And I disagree that the history in
> other venues is all that clear. You are being highly selective here.
> > What I definitely want to avoid, and what I think also the IETF has
> > some interest to avoid (even if the danger for the IETF is smaller
> > than for Unicode) is that somebody comes and says: 1) A charset is
> > defined as containing characters and presentation information,
> > 2) presentation information XXX is vital in my application, therefore
> > 3) charsets have to contain this information.
> > Not really for fonts per se, but in the context of language tags,
> > claims along this line have been made.
> In other words, you want the definition of charset to exclude the possibility
> of language tags, or at least make it hard to get them into a charset
> definition. This is not going to happen.
> > > However, that doesn't mean it is a valid issue for the IETF. For one thing,
> > > history says otherwise. The IETF has had a largely uncontrolled charset
> > > registration process in place for well over 5 years now. And a bunch of stuff
> > > has been registered which at a minimum should be marked as "unsuitable for use
> > > in MIME text/plain". Yet in spite of this chaotic history I am unaware of anyone
> > > registering a charset that includes, say, general font-switching machinery.
> > > (And it isn't like similar machinery doesn't already exist in ANSI X3.4 under
> > > the general rubric of "control character", BTW.)
> > Well, there is iso-8859-[6|8]-[i|e], which includes bidirectionality.
> So now you're arguing that directionality indicators don't belong in a charset?
> The point I was making is that we have a fair amount of charset registration
> experience under our belts already, and while there have been many problems
> with the registration process, the problems you have constantly trotted out in
> your messages have never materialized.
> > > In other words, while you may believe that the IETF definition of "character"
> > > included "control character" all along, a fair number of other people
> > > effectively did not and worse, acted on this belief, and worse still, their
> > > actions made it into some widely used products. And the result has been serious
> > > trouble and serious interoperability problems -- so much so that I had to
> > > tighten up the prose in the last go-round on MIME to make it clear that _some_
> > > presentation information is present in plain text, when it is there it has to
> > > be acted on, and when it isn't nothing should be done. But I didn't fix the
> > > definition of "charset" to match this, so we now have a standard that says one
> > > thing in one place and another in another place, which isn't acceptable and is
> > > going to have to change.
> > Nothing against this, not at all. But it's never a bad idea to be safe
> > on both sides, i.e. to both say that a minimum of presentation information
> > is there and has to be acted upon, and say that this presentation
> > information is really only a minimum and not, or at least not necessarily,
> > more.
> It is a bad idea when your proscriptive approach guards against a fantasy of
> your own creation and in so doing causes our work not to meet the stated needs
> of the community.
