RE: General policy from John C Klensin on 1993-08-07 (ietf-charsets@w3.org from July to September 1993)

From: John C Klensin <KLENSIN@INFOODS.UNU.EDU>
Date: Sat, 07 Aug 1993 04:10:26 -0400 (EDT)
To: luc@OPUS.SPC.NL
Cc: mohta@NECOM830.CC.TITECH.AC.JP, harald.t.alvestrand@DELAB.SINTEF.NO, ietf-charsets@INNOSOFT.COM, luc@SPC.NL
Message-id: <744711026.956742.KLENSIN@INFOODS.UNU.EDU>

>I agree with Ohta here. Most current protocols do *not* support labeling
>(MIME is an exception here, and its designers didn't like it, witness
>the excerpt I posted earlier). It would be better to design an encoding
>that has *internal* room for extension, in an upward compatible way,
>rather than extending the number of encodings.

Luc,

Be a little bit careful here about "didn't like" and the reasons for the
choices.  Some of the MIME designers, while recognizing the importance
of character set issues, "didn't like" the problems associated with them
and simply wished they would go away.  As someone who struggled with
those issues in the MIME context, I can't say that I blame them.

But an important MIME design principle might be stated as "make sure
that information needed to decode a message [body part] is available in
the right place so that a user or UA can dispatch the body part to the
right component/decoder."  In the case of text/plain and character sets,
that implied external-to-the-text-body labeling so that one could route
the body part to something that would "know" those codings and support
either the right graphics for the codes or a sensible representation
(possibly mnemonic or some extension on quoted-printable) for the
unknown ones.
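
To make the dispatch idea concrete, here is a rough sketch in Python of a
UA routing a text/plain body part on its external label.  It is only an
illustration under assumptions: render_text_part is a hypothetical name,
the set of "known" codings is simply whatever the local codec table
happens to contain, and the fallback is a loose quoted-printable-style
rendering rather than anything MIME prescribes.

```python
import codecs
from email import message_from_bytes


def render_text_part(raw: bytes) -> str:
    """Route a text/plain part by its external charset label (illustrative only)."""
    msg = message_from_bytes(raw)
    # The label lives in the header, outside the text body itself.
    charset = msg.get_content_charset("us-ascii")
    body = msg.get_payload(decode=True)  # raw octets of the body
    try:
        # Assumption: "known" means the local codec table has an entry.
        codecs.lookup(charset)
    except LookupError:
        # Unknown label: show printable ASCII and newlines as-is, and
        # everything else as =XX, loosely in the style of quoted-printable.
        return "".join(
            chr(b) if (0x20 <= b < 0x7F and b != 0x3D) or b == 0x0A else "=%02X" % b
            for b in body
        )
    # Known label: hand the octets to the decoder that "knows" that coding.
    return body.decode(charset, errors="replace")
```

The point of the sketch is that the routing decision is made before any
octet of the body is interpreted.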

The important "2022 is not a character set" principle, in the MIME
context, derives primarily from the fact that one cannot deduce from
knowing that a text body part is "in 2022" what graphics will be needed
and what registrations will occur.  "iso-2022-jp" arises precisely
because it provides a body-part dispatcher information about the actual
formats (escape and switching sequences) used and the
codes/registration/graphics that they will require.
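
As a sketch of what the label buys a dispatcher, the following Python
fragment scans a body for 2022 escape sequences and separates those that
"iso-2022-jp" permits (the four designations listed in RFC 1468) from
anything else.  The function name and the simplified parsing rule (ESC,
then intermediate octets 0x20-0x2F, then one final octet) are my
assumptions for illustration, not a full 2022 parser.

```python
# The four designations permitted by the "iso-2022-jp" label (RFC 1468).
ISO_2022_JP_DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def scan_designations(data: bytes):
    """Return (known, unknown) escape sequences found in data (hypothetical helper)."""
    known, unknown = [], []
    i = data.find(b"\x1b")
    while i != -1:
        j = i + 1
        # Skip intermediate octets; the octet after them is taken as final.
        while j < len(data) and 0x20 <= data[j] <= 0x2F:
            j += 1
        seq = data[i:j + 1]
        if seq in ISO_2022_JP_DESIGNATIONS:
            known.append(ISO_2022_JP_DESIGNATIONS[seq])
        else:
            # Outside the label's repertoire: the required graphics are unknown.
            unknown.append(seq)
        i = data.find(b"\x1b", j + 1)
    return known, unknown
```

With the "iso-2022-jp" label, a non-empty second list means the body
violates its label.  With a bare "2022" label, the same list could
legitimately be non-empty and the dispatcher would have no way to know in
advance what it needs.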

In this context, and for MIME, the labels that would have been
associated with 10646 DIS-1 or -- if one believes in Han unification -- 
UNICODE or IS 10646 BMP pose no particular problems (although the more
paranoid among us might insist on a date as part of the label): These
provide Standard character code tables, whose content and required
graphics are known at time of publication and revised very infrequently
and usually compatibly.  

But "unrestricted 2022" or "IS 10646 including planes and groups not yet
defined" are much harder to label in a useful way since it is difficult,
if not impossible, to write a piece of code that can guarantee sensible
representations for all of the characters in all files with that
particular label.  In particular, if such a program is written today, it
is unlikely that it will behave optimally (or even reasonably) with a
character set registered with ECMA next week or with 10646.x
standardized in 1995.

Note that I'm trying to state a problem here, not suggesting an answer. 
But I do suggest that anything that purports to be a definition for
"full 10646" in the MIME context must address this issue.

   john

Received on Saturday, 7 August 1993 01:12:21 UTC