Re: Suggested character set policy for the IETF

Martin J. Duerst (
Fri, 27 Jun 1997 10:54:24 +0200 (MET DST)

Date: Fri, 27 Jun 1997 10:54:24 +0200 (MET DST)
From: "Martin J. Duerst" <>
Subject: Re: Suggested character set policy for the IETF
In-reply-to: <>
To: Chris Newman <Chris.Newman@INNOSOFT.COM>
Cc: ietf-charsets@INNOSOFT.COM, IETF Languages <>
Message-id: <Pine.SUN.3.96.970627103830.286w-100000@enoshima>

On Thu, 26 Jun 1997, Chris Newman wrote:

> On Thu, 26 Jun 1997, Mark Crispin wrote:
> > I am, however, sympathetic to Martin's position.  I agree that "charset"
> > should be the commonly used term, leading to wording such as:
> > 	In this document, the term "character set" (commonly called a
> > 	"charset") refers to the combination of coded character set and
> > 	character encoding scheme.  Non-IETF specifications use the term
> > 	"character set" to refer to the "coded character set", so the
> > 	term "charset" is preferred for the IETF definition.
> > (both CCS and CES should be defined earlier).
> I tend to agree with Mark on this issue, although I don't believe this
> definition is correct.
> A charset in the MIME sense is a mapping from octets to characters and
> related presentation information.

The first part of your definition, "mapping from octets to characters",
is very widely known and used. The second part of the definition, "related
presentation information", is new to me. Is this your own definition,
or where did you find it? What exactly does the term "presentation
information" mean for you? How do you ensure that it means the same
thing for others?

> One way of constructing a MIME charset
> is to combine a CCS with an invertible CES (note that a CCS and
> non-invertible CES is *not* a MIME charset).

I don't think the concept of invertibility has to be mentioned
explicitly. It's an obvious detail. What's more important is that
many "charset"s contain more than one CCS. In Mark's definition,
this is not explicit, but due to the absence of an article before
"coded character set", it looks like it's at least not ruled out.
Maybe changing it to "coded character set(s)" would be better.
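
To make the multiple-CCS point concrete, here is a small sketch in
Python. The choice of language and of ISO-2022-JP as the example are
assumptions for illustration; neither is named in Mark's or Chris's
text. It encodes a string that mixes ASCII letters and kanji under a
single "charset" label, checks that the octets decode back to the same
characters, and looks for the escape sequences that switch between
coded character sets inside the octet stream.

```python
# Illustration only: ISO-2022-JP is picked here as an example of one
# "charset" label that covers several coded character sets.
text = "abc\u65e5\u672c"  # three ASCII letters, then two kanji

octets = text.encode("iso2022_jp")       # characters -> octets
roundtrip = octets.decode("iso2022_jp")  # octets -> characters

# Escape sequences designate the coded character set in use.
uses_jis_x_0208 = b"\x1b$B" in octets   # ESC $ B designates JIS X 0208
returns_to_ascii = b"\x1b(B" in octets  # ESC ( B designates ASCII

print(octets)
print(roundtrip == text, uses_jis_x_0208, returns_to_ascii)
```

A definition that allows only one coded character set per "charset"
would not cover a label like this one, which is why "coded character
set(s)" seems the safer wording.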

> While CCS and CES are useful
> concepts for people who build character sets; the MIME charset concept is
> the useful concept when presenting plain text. 

I agree that a definition in terms of "overall effect" is shorter
and better suited to the level of Harald's document, and that
the reader can be referred to RFC 2130 for the decomposing
definition and the other details. But "related presentation
information" hasn't been part of the definition up to now,
and is extremely ambiguous.

Regards,	Martin.
