Re: Character encodings in headers [i74][was: Straw-man charter for http-bis] from Keith Moore on 2007-08-20 (ietf-http-wg@w3.org from July to September 2007)

From: Keith Moore <moore@cs.utk.edu>
Date: Mon, 20 Aug 2007 02:56:54 -0400
To: Mark Nottingham <mnot@mnot.net>
CC: Martin Duerst <duerst@it.aoyama.ac.jp>, Richard Ishida <ishida@w3.org>, Apps Discuss <discuss@apps.ietf.org>, Felix Sasaki <fsasaki@w3.org>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Paul Hoffman <phoffman@imc.org>
Message-ID: <46C93B36.7070503@cs.utk.edu>

Mark Nottingham wrote:
> On 10/06/2007, at 6:05 PM, Martin Duerst wrote:
>> - RFC 2616 prescribes that headers containing non-ASCII have to use
>>   either iso-8859-1 or RFC 2047. This is unnecessarily complex and
>>   not necessarily followed. At the least, new extensions should be
>>   allowed to specify that UTF-8 is used.
>
> My .02;
>
> I'm concerned about allowing UTF-8; it may break existing
> implementations.
concur, though at least it is possible to distinguish utf-8 from 8859-1.
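
A minimal sketch of one way a receiver could tell the two apart, assuming it has the raw header bytes: try a strict UTF-8 decode and fall back to ISO-8859-1. The function name `guess_header_charset` is hypothetical. This is a heuristic, not a guarantee: some 8859-1 byte sequences (e.g. 0xC3 0xA9) are also valid UTF-8, and pure ASCII decodes identically either way.

```python
def guess_header_charset(raw: bytes) -> tuple[str, str]:
    """Return (decoded text, charset guessed) for a raw header value."""
    try:
        # Strict UTF-8 decoding rejects most non-ASCII 8859-1 text,
        # because a lone high byte is not a valid UTF-8 sequence.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails.
        return raw.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    print(guess_header_charset("café".encode("utf-8")))
    print(guess_header_charset("café".encode("iso-8859-1")))
```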

also, I'll note that supporting utf-8 in a way that is backward
compatible with existing implementations is almost certainly more
complex (and thus more costly, error-prone, etc.) than supporting rfc 2047.
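
For a rough sense of what rfc 2047 support involves on the receiving side, here is a sketch using Python's standard `email.header.decode_header`. The wrapper name `decode_rfc2047` is hypothetical, and the sketch says nothing about the relative cost of the utf-8 alternative; it only illustrates decoding of encoded-words.

```python
from email.header import decode_header


def decode_rfc2047(value: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value to text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus the declared charset;
            # unencoded runs in a mixed value come back as bytes with None.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # A value with no encoded-words is returned unchanged as str.
            parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    print(decode_rfc2047("=?iso-8859-1?q?caf=E9?="))
    print(decode_rfc2047("=?UTF-8?B?Y2Fmw6k=?="))
```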
>
> I'd like to see the text just require that the actual character set be
> 8859-1, but to allow individual extensions to nominate encodings
> *like* 2047, without being restricted to it. For example, the encoding
> specified in 3987 is appropriate for URIs. However, it *has* to be
> explicit; I've heard some people read this requirement and think that
> they need to check *every* header for 2047 encoding.
2047 was specifically not intended for use with protocol elements that
have meaning to protocol engines.  how many HTTP headers contain text
that is intended solely for human use?
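
The explicit, per-header opt-in discussed in the quoted text could be sketched roughly as follows. The header names `x-example-title` and `x-example-iri` and all function names are hypothetical, chosen only for illustration; the point is that a header is decoded as 8859-1 unless its own definition nominates something else, so a receiver never scans every header for 2047 encoded-words.

```python
from email.header import decode_header
from urllib.parse import unquote


def _latin1(raw: bytes) -> str:
    # Default: treat the field value as ISO-8859-1.
    return raw.decode("iso-8859-1")


def _rfc2047(raw: bytes) -> str:
    # Opt-in: decode RFC 2047 encoded-words in the field value.
    parts = []
    for chunk, charset in decode_header(raw.decode("iso-8859-1")):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(chunk)
    return "".join(parts)


def _percent_utf8(raw: bytes) -> str:
    # Opt-in: percent-encoded UTF-8 octets, as in the RFC 3987 IRI-to-URI mapping.
    return unquote(raw.decode("iso-8859-1"), encoding="utf-8")


# Hypothetical registry: only headers listed here get a non-default decoder.
EXPLICIT_DECODERS = {
    "x-example-title": _rfc2047,
    "x-example-iri": _percent_utf8,
}


def decode_field(name: str, raw: bytes) -> str:
    """Decode a field value using the decoder its header explicitly nominates."""
    return EXPLICIT_DECODERS.get(name.lower(), _latin1)(raw)
```

With this arrangement, a value that merely looks like an encoded-word in an unlisted header is passed through untouched, which is the behaviour the "it *has* to be explicit" requirement asks for.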

Received on Monday, 20 August 2007 06:57:38 UTC