- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Wed, 02 Jul 1997 18:19:38 +0200 (MET DST)
- To: Ned Freed <Ned.Freed@INNOSOFT.COM>
- Cc: ietf-charsets@INNOSOFT.COM, IETF Languages <ietf-languages@uninett.no>
On Tue, 1 Jul 1997, Ned Freed wrote:

> > I thought it was obvious: We currently say that a charset is a mapping from a
> > series of octets to a sequence of graphic characters. UTF-8 produces a lot more
> > than graphic characters.
> >
> > I suppose you could argue that US-ASCII does too, but CR and LF are
> > specifically dealt with as an exception in MIME, whereas no comparable prose
> > exists in MIME to allow, say, directionality indicators.
>
> A small correction here: MIME part II actually does contain an exception
> that allows for directionality indicators as well. I forgot that I added
> this at the last minute.
>
> However, given that Unicode has all sorts of control information in it besides
> directionality indicators, there is still a problem. And I don't think having
> to revise MIME every time additional sorts of control information are added to
> a character set (something the UTC is planning to do) is a good idea.

No, it's not a good idea. I think it's fair to say that MIME part II (RFC 2046) does a good job of giving examples of what is and what is not part of plain text, and that it can be left at that.

As an example, the "stacking of several characters in the same position" is allowed. This takes care of cases such as Tibetan, Hebrew and Arabic with points, Thai, and most of decomposed Latin/Greek/... Strictly speaking, it does not take care of character inversion or surrounding, such as occurs in most Brahmi-related scripts in South Asia. But these are not forbidden either, and so it's rather reasonable to assume that they are allowed, because they are just the application of the concept of plain text to these languages/scripts, and the way these scripts have been coded for years.

Similarly, zero-width non-joiner can be subsumed by this because it is a very similar concept, in this case for Persian. If the MIME specification had decided that such things are unacceptable (while stacking is allowed), it would have said so.
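To make the categories discussed above concrete, here is a minimal sketch (not part of MIME or RFC 2046) that sorts a few code points by their Unicode general category. The helper name `classify`, the sample code points, and the four labels are assumptions chosen for illustration. It relies on Python's standard `unicodedata` module, which reflects a current Unicode database rather than the 1997 one, and it treats everything that is neither a mark nor a "C" category as graphic, which is a simplification.

```python
import unicodedata

def classify(ch):
    """Roughly sort one character by its Unicode general category.

    The labels are illustrative assumptions, not terms defined by MIME.
    """
    cat = unicodedata.category(ch)
    if cat.startswith("M"):      # Mn, Mc, Me: marks that stack on or attach to a base
        return "combining mark"
    if cat == "Cf":              # format characters, e.g. ZWNJ, directional marks
        return "format character"
    if cat == "Cc":              # C0/C1 controls, e.g. CR and LF
        return "control character"
    if cat.startswith("C"):      # unassigned, private use, surrogates
        return "other non-graphic"
    return "graphic character"   # simplification: letters, digits, punctuation, spaces

# Hypothetical sample of the kinds of characters mentioned in the message.
samples = [
    "\u0065",  # LATIN SMALL LETTER E
    "\u0301",  # COMBINING ACUTE ACCENT (decomposed Latin)
    "\u05B7",  # HEBREW POINT PATAH
    "\u200C",  # ZERO WIDTH NON-JOINER
    "\u200F",  # RIGHT-TO-LEFT MARK
    "\u000A",  # LINE FEED
]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {classify(ch)}")
```

Under this rough split, the stacking marks and the zero-width non-joiner fall outside the plain "graphic character" bucket, which illustrates why the wording of the charset definition matters for Unicode.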

So as a conclusion, we can say that MIME tries to distinguish between characters useful for plain text and characters/formatting associated with rich text. It does a pretty good job giving explicit examples for both, but leaves some area open, so that phenomena unknown to its authors are not ruled out if they make sense. Given the variety of phenomena that exist in writing, this is a rather sensible approach.

Regards,
Martin.
Received on Wednesday, 2 July 1997 09:31:56 UTC