Re: Suggested character set policy for the IETF

> > Both because of this definition and because of other interoperability issues,
> > the definition of a character set in MIME pretty much has to change.
> > For one thing, registering UTF-8 as a charset is technically illegal right now.

> Can you explain that? What's the problem?

I thought it was obvious: We currently say that a charset is a mapping from a
series of octets to a sequence of graphic characters. UTF-8 produces a lot more
than graphic characters.

I suppose you could argue that US-ASCII does too, but CR and LF are
specifically dealt with as an exception in MIME, whereas no comparable prose
exists in MIME to allow, say, directionality indicators.
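A minimal sketch of the point above, in Python using the standard `unicodedata` module: decoding a UTF-8 octet stream can yield characters that are not graphic characters. As an assumption for illustration, "not graphic" is approximated here by the Unicode General Categories Cc (control) and Cf (format). The sample octets and the helper name `non_graphic` are hypothetical and come from neither MIME nor the charset registration procedure.

```python
import unicodedata

# Hypothetical octets as they might appear in a body labelled charset=UTF-8:
# "abc", then U+200F RIGHT-TO-LEFT MARK (E2 80 8F), then CR LF (0D 0A).
octets = b"abc\xe2\x80\x8f\r\n"

def non_graphic(data: bytes) -> list:
    """Decode UTF-8 and list every character whose General Category is
    Cc (control) or Cf (format), i.e. a character that is not graphic."""
    found = []
    for ch in data.decode("utf-8"):
        cat = unicodedata.category(ch)
        if cat in ("Cc", "Cf"):
            # C0 controls have no character name, so fall back to a label.
            found.append((f"U+{ord(ch):04X}", cat, unicodedata.name(ch, "<control>")))
    return found

for code, cat, name in non_graphic(octets):
    print(code, cat, name)
```

Only CR and LF among these have an explicit exception in MIME's text/plain rules; the directionality mark has none, which is the gap described above.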

> I don't think that makes any difference. Quite to the contrary, "control
> character" at least has a long and rather clear usage history, whereas
> "control information" can just be about anything.

No such history exists in the IETF. And I disagree that the history in
other venues is all that clear. You are being highly selective here.

> What I definitely want to avoid, and what I think also the IETF has
> some interest to avoid (even if the danger for the IETF is smaller
> than for Unicode) is that somebody comes and says: 1) A charset is
> defined as containing characters and presentation information,
> 2) presentation information XXX is vital in my application, therefore
> 3) charsets have to contain this information.

> Not really for fonts per se, but in the context of language tags,
> claims along this line have been made.

In other words, you want the definition of charset to exclude the possibility
of language tags, or at least make it hard to get them into a charset
definition. This is not going to happen.

> > However, that doesn't mean it is a valid issue for the IETF. For one thing,
> > history says otherwise. The IETF has had a largely uncontrolled charset
> > registration process in place for well over 5 years now. And a bunch of stuff
> > has been registered which at a minimum should be marked as "unsuitable for use
> > in MIME text/plain". Yet in spite of this chaotic history I am unaware of anyone
> > registering a charset that includes, say, general font-switching machinery.
> > (And it isn't like similar machinery doesn't already exist in ANSI X3.4 under
> > the general rubric of "control character", BTW.)

> Well, there is iso-8859-[6|8]-[i|e], which includes bidirectionality.

So now you're arguing that directionality indicators don't belong in a charset?

The point I was making is that we have a fair amount of charset registration
experience under our belts already, and while there have been many problems
with the registration process, the problems you have constantly trotted out in
your messages have never materialized.

> > In other words, while you may believe that the IETF definition of "character"
> > included "control character" all along, a fair number of other people
> > effectively did not and, worse, acted on this belief; worse still, their
> > actions made it into some widely used products. And the result has been serious
> > trouble and serious interoperability problems -- so much so that I had to
> > tighten up the prose in the last go-round on MIME to make it clear that _some_
> > presentation information is present in plain text, that when it is there it
> > has to be acted on, and that when it isn't, nothing should be done. But I didn't fix the
> > definition of "charset" to match this, so we now have a standard that says one
> > thing in one place and another in another place, which isn't acceptable and is
> > going to have to change.

> Nothing against this, not at all. But it's never a bad idea to be safe
> on both sides, i.e. to both say that a minimum of presentation information
> is there and has to be acted upon, and say that this presentation
> information is really only a minimum and not, or at least not necessarily,
> more.

It is a bad idea when your proscriptive approach guards against a fantasy of
your own creation and in so doing causes our work not to meet the stated needs
of the community.

				Ned


Received on Sunday, 20 July 1997 18:14:24 UTC