- From: Chris Newman <Chris.Newman@INNOSOFT.COM>
- Date: Fri, 15 May 1998 10:18:52 -0700 (PDT)
- To: "Martin J. Duerst" <duerst@w3.org>
- Cc: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>, ietf-charsets@ISI.EDU, murata@fxis.fujixerox.co.jp, Tatsuo_Kobayashi@justsystem.co.jp
On Fri, 15 May 1998, Martin J. Duerst wrote:
> At 12:08 98/05/14 -0700, Chris Newman wrote:
> > We might eventually define a MIME "widetext" top-level media type for
> > plaintext data using UTF-16 or UCS-4, but I don't think it's time to do
> > that yet. UTF-8 is standards track and may be freely used in text/* media
> > types.
>
> Why not? One problem is to find a good name for it, and you just gave
> one above, there may be others. For the rest, it's pretty easy. Put
> together stuff from HTTP1.1 and from the MIME RFCs.

I honestly don't think we have nearly enough experience using Unicode on
the Internet for interoperable internationalization. There are lots of
issues relating to canonicalization, whitespace, line-ending characters
and other things in Unicode which make me very nervous from an
interoperability standpoint.

I think it's premature to start sending around UTF-16 because it takes
all the Unicode-related problems and compounds them by adding a slew of
binary and endian-related problems (which have been known to cause
trouble in the past). I believe everyone should use UTF-8 for now and
once we've got the Unicode-related problems ironed out, then we can start
worrying about the binary, backwards-compatibility and endian-related
problems UTF-16 will cause later.

Ultimately, I want interoperable international characters to become
reality, but the more potholes there are on the road today, the more
likely people are to turn away.

There are other things we could do when we deploy a widetext/* top-level
media type. We might want to also deploy a compressing
content-transfer-encoding at the same time and prefer UCS-4 over UTF-16
-- we might even be able to skip the UTF-16 step altogether at least for
transmission over the Internet. That would be one less interoperability
problem.
Unicode's "Line Separator" and "Paragraph Separator" codepoints might
just work, so in widetext/* we might want to mandate their use instead of
CRLF so we really have a canonical cross-platform plain text format. I
have no idea if any of this will work and I don't think we have the
experience we need to do it right.

> Make it so that
> widetext/* in the HTTP MIME derivative is equivalent to text/*. The
> sooner we do it, the sooner we get rid of the problems with interchange
> between HTTP-delivered content and other protocols, and the sooner we
> can have full internationalization in email. Email UA implementors
> won't have much work on this one, but they have to know what to do.

I think it's best just to use UTF-8 in email for now. There is going to
be *a lot* of real world opposition to deploying ISO-10646/Unicode in
email. I expect UTF-7 to be reviled as much as quoted-printable or RFC
2047, and it was a mistake to promote it. UTF-16 will be an unreadable
blob to most email recipients; anyone sending it would rightfully be
flamed. I don't want UTF-8/ISO-10646 to lose to our current plethora of
character sets in the flurry of opposition which UTF-7 and UTF-16 will
create.

We have one standards track character set, UTF-8, which will cause the
least pain to deploy. Let's promote that until it works, then worry about
saving bytes in the encoding. I'm aware this is a bit harsh for our
friends with ideographic characters, but I think the outcome will be
better in the long term.

		- Chris
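
The endian-related concern in the message above can be illustrated with
a minimal Python sketch. The sample string is an assumption chosen for
illustration (one ASCII letter plus one ideograph); it does not come from
the original message.

```python
# Hypothetical sample text: ASCII "A" followed by the ideograph U+65E5.
text = "A\u65e5"

# UTF-8 has a single byte serialization, and ASCII stays one byte per character.
utf8 = text.encode("utf-8")

# UTF-16 has two serializations that differ only in byte order.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

print(utf8.hex())      # 41e697a5
print(utf16_be.hex())  # 004165e5
print(utf16_le.hex())  # 4100e565

# Reading big-endian bytes as little-endian raises no error here;
# it silently produces different characters.
wrong = utf16_be.decode("utf-16-le")
print([hex(ord(c)) for c in wrong])  # ['0x4100', '0xe565']
```

The sketch also shows the trade-off mentioned at the end of the message:
the ideograph costs three bytes in UTF-8 against two in UTF-16.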
Received on Friday, 15 May 1998 11:04:35 UTC