- From: John C Klensin <john-ietf@jck.com>
- Date: Mon, 20 Aug 2007 07:22:30 +0000
- To: Mark Nottingham <mnot@mnot.net>, Martin Duerst <duerst@it.aoyama.ac.jp>
- Cc: Richard Ishida <ishida@w3.org>, Apps Discuss <discuss@apps.ietf.org>, Felix Sasaki <fsasaki@w3.org>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Paul Hoffman <phoffman@imc.org>
--On Monday, 20 August, 2007 13:40 +1000 Mark Nottingham <mnot@mnot.net> wrote:

> On 10/06/2007, at 6:05 PM, Martin Duerst wrote:
>> - RFC 2616 prescribes that headers containing non-ASCII have
>> to use either iso-8859-1 or RFC 2047. This is unnecessarily
>> complex and not necessarily followed. At the least, new
>> extensions should be allowed to specify that UTF-8 is used.
>
> My .02;
>
> I'm concerned about allowing UTF-8; it may break existing
> implementations.

And whatever is done about it should be consistent with the EAI
work. Otherwise, we are likely to find ourselves in big trouble
going down the line.

> I'd like to see the text just require that the actual
> character set be 8859-1, but to allow individual extensions to
> nominate encodings *like* 2047, without being restricted to it.
> For example, the encoding specified in 3987 is appropriate for
> URIs. However, it *has* to be explicit; I've heard some people
> read this requirement and think that they need to check
> *every* header for 2047 encoding.

Sigh. My own sense is that, going forward, we need to lose
8859-N, not make it the default (or only) character set for more
protocols. It is, to put it mildly, a little Euro-centric (and
not even completely suitable for Europe). Much of the advantage
of Unicode is that one does not need to designate/nominate a
particular CCS or encoding and then maintain state for it... and
that is a fairly large advantage.

See also draft-klensin-unicode-escapes-03.txt (probably expired,
but you should be able to find a copy somewhere -- I'll get back
to it sometime soon) for a discussion of issues in ASCII encoding
of multioctet character sets. The IRI spec may constrain things
to encoding of octets, but that doesn't make it a good idea.

If we are going to consider changes in this area, let's make them
improvements. Locking in 8859-1 is not an improvement: it would,
IMO, be better to deprecate its use and require explicit charset
designation always if that is the only choice.

    john
Received on Monday, 20 August 2007 15:38:53 UTC