Re: PROPOSAL: i74: Encoding for non-ASCII headers from Jamie Lokier on 2008-03-28 (ietf-http-wg@w3.org from January to March 2008)

From: Jamie Lokier <jamie@shareable.org>
Date: Fri, 28 Mar 2008 09:21:39 +0000
To: Mark Nottingham <mnot@mnot.net>
Cc: Robert Brewer <fumanchu@aminus.org>, Martin Duerst <duerst@it.aoyama.ac.jp>, "Roy T. Fielding" <fielding@gbiv.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20080328092138.GC9323@shareable.org>

Mark Nottingham wrote:
> Concretely, our options at this point are:
> 
> 1) Change the character encoding on the wire to UTF-8
> 2) Leave the character encoding on the wire at ISO-8859-1, document  
> existing TEXT instances' encoding requirements on top of that, and
>    a) Require new headers that need i18n content to specify RFC2047, or
>    b) Require new headers that need i18n content to specify *some*  
> encoding into ISO-8859-1 using character escapes (which explicitly MAY  
> be RFC2047).

An issue I have with RFC2047 is that it seems to imply every "proper"
implementation of an HTTP receiver which does something with received
TEXT (such as display it) needs to have a _large_ table of known
character set names and conversion routines.

With email this is unavoidable due to history, but it seems silly for
an HTTP receiver to need it.

Since RFC2047 isn't (currently) seen in practice in HTTP, if RFC2047
continues to be recommended for TEXT, may I suggest that it be
recommended to designate _only_ the "utf-8", "iso-8859-1" and
"us-ascii" character sets in RFC2047 encodings in HTTP?

That way, at least, HTTP receivers which aim for a complete,
conformant implementation and expect to do something as simple as,
e.g. decode and show received TEXT, will be complete by just decoding
those character sets.
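
To illustrate how small such a receiver could be, here is a rough
sketch (not a complete RFC2047 implementation) of decoding
encoded-words when only the three character sets above are accepted.
The names decode_text and ALLOWED_CHARSETS are made up for this
example, and it assumes anything with another character set, or with
a malformed payload, is simply passed through undecoded. It also
skips the RFC2047 rule about dropping whitespace between adjacent
encoded-words.

```python
import base64
import binascii
import re

# The only charsets suggested for RFC2047 encoded-words in HTTP.
ALLOWED_CHARSETS = {"utf-8", "iso-8859-1", "us-ascii"}

# RFC2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def _decode_q(text):
    # "Q" encoding: "_" stands for a space, "=XX" is a hex-encoded octet.
    raw = text.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw)


def decode_text(value):
    """Decode encoded-words in a header value (hypothetical helper)."""
    def replace(match):
        charset, encoding, payload = match.groups()
        charset = charset.lower()
        if charset not in ALLOWED_CHARSETS:
            return match.group(0)  # unknown charset: leave as received
        try:
            if encoding.lower() == "b":
                octets = base64.b64decode(payload, validate=True)
            else:
                octets = _decode_q(payload)
            return octets.decode(charset)
        except ValueError:
            # Covers binascii.Error and UnicodeError: leave as received.
            return match.group(0)

    return ENCODED_WORD.sub(replace, value)
```

No table of character set names is needed beyond the three entries in
ALLOWED_CHARSETS, since the language runtime already knows how to
decode all of them.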

-- Jamie

Received on Friday, 28 March 2008 09:22:16 UTC