- From: John C Klensin <KLENSIN@INFOODS.UNU.EDU>
- Date: Sat, 07 Aug 1993 04:10:26 -0400 (EDT)
- To: luc@OPUS.SPC.NL
- Cc: mohta@NECOM830.CC.TITECH.AC.JP, harald.t.alvestrand@DELAB.SINTEF.NO, ietf-charsets@INNOSOFT.COM, luc@SPC.NL
>I agree with Otha here. Most current protocols do *not* support labeling
>(MIME is an exception here, and its designers didn't like it, witness
>the excerpt I posted earlier). It would be better to design an encoding
>that has *internal* room for extension, in an upward compatible way,
>rather then extending the number of encodings.

Luc,

Be a little bit careful here about "didn't like" and the reasons for the choices. Some of the MIME designers, while recognizing the importance of character set issues, "didn't like" the problems associated with them and simply wished they would go away. As someone who struggled with those issues in the MIME context, I can't say that I blame them.

But an important MIME design principle might be stated as "make sure that information needed to decode a message [body part] is available in the right place so that a user or UA can dispatch the body part to the right component/decoder." In the case of text/plain and character sets, that implied external-to-the-text-body labeling, so that one could route the body part to something that would "know" those codings and support either the right graphics for the codes or a sensible representation (possibly mnemonic, or some extension on quoted-printable) for the unknown ones.

The important "2022 is not a character set" principle, in the MIME context, derives primarily from the fact that one cannot deduce, from knowing that a text body part is "in 2022", what graphics will be needed and what registrations will occur. "iso-2022-jp" arises precisely because it gives a body-part dispatcher information about the actual formats (escape and switching sequences) used and the codes/registrations/graphics that they will require.
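As an illustration of the dispatch principle described above, here is a minimal sketch in Python using the standard `email` package. It is not taken from any MIME implementation: the table `KNOWN_CHARSETS`, the function `dispatch_text_part`, and the choice of backslash escapes as the "sensible representation" for unrecognized labels are all assumptions made for the example.

```python
from email import message_from_string

# Hypothetical table for this sketch: charset label -> Python codec that
# "knows" the coding. A real UA would map labels to decoders and fonts.
KNOWN_CHARSETS = {
    "us-ascii": "ascii",
    "iso-8859-1": "latin-1",
    "iso-2022-jp": "iso2022_jp",
}

def dispatch_text_part(part):
    """Route a text/plain part by its external charset label.

    Returns (label, text). The label comes from the Content-Type header,
    not from inspecting the body.
    """
    # get_content_charset() returns the lowercased label, or None if absent.
    label = part.get_content_charset() or "us-ascii"
    raw = part.get_payload(decode=True)  # bytes, transfer encoding undone
    codec = KNOWN_CHARSETS.get(label)
    if codec is None:
        # Unknown label: fall back to a lossy but readable rendering.
        # Backslash escapes stand in for a mnemonic representation here.
        return label, raw.decode("ascii", errors="backslashreplace")
    return label, raw.decode(codec)
```

A label such as "iso-2022-jp" selects exactly one decoder in this table. A bare "2022" label could not, because the label alone would not say which escape sequences and registered sets the decoder must be prepared for.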
In this context, and for MIME, the labels that would have been associated with 10646 DIS-1 or -- if one believes in Han unification -- UNICODE or IS 10646 BMP pose no particular problems (although the more paranoid among us might insist on a date as part of the label). These provide standard character code tables, whose content and required graphics are known at time of publication and are revised very infrequently and usually compatibly.

But "unrestricted 2022" or "IS 10646 including planes and groups not yet defined" are much harder to label in a useful way, since it is difficult, if not impossible, to write a piece of code that can guarantee sensible representations for all of the characters in all files with that particular label. In particular, if such a program is written today, it is unlikely that it will behave optimally (or even reasonably) with a character set registered with ECMA next week or with 10646.x standardized in 1995.

Note that I'm trying to state a problem here, not to suggest an answer. But I do suggest that anything that purports to be a definition for "full 10646" in the MIME context must address this issue.

    john
Received on Saturday, 7 August 1993 01:12:21 UTC