Re: Character encodings in headers [i74][was: Straw-man charter for http-bis] from Mark Nottingham on 2007-08-20 (ietf-http-wg@w3.org from July to September 2007)

From: Mark Nottingham <mnot@mnot.net>
Date: Mon, 20 Aug 2007 18:55:10 +1000
To: John C Klensin <john-ietf@jck.com>
Cc: Martin Duerst <duerst@it.aoyama.ac.jp>, Richard Ishida <ishida@w3.org>, Apps Discuss <discuss@apps.ietf.org>, Felix Sasaki <fsasaki@w3.org>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Paul Hoffman <phoffman@imc.org>
Message-Id: <6B8E3D7A-71B8-4B8D-9625-2AB3C74A9072@mnot.net>

The (potential) problem is that an intermediary (for example) needs
to be able to handle headers that it doesn't understand. If it's been
built to store headers as ISO-8859-1 strings as they pass through (a
reasonable assumption, considering RFC 2616), an unknown header with
another encoding, no matter how it is specified or flagged, may break it.
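A minimal Python sketch of one way this can go wrong. It assumes a hypothetical intermediary; the helper names store_header, forward_latin1 and forward_transcoded are invented for illustration. ISO-8859-1 maps every octet to exactly one character, so storing a UTF-8 value that way never fails, and the octets survive if they are re-encoded as ISO-8859-1. The damage appears once the stored string is interpreted or re-encoded under another charset. That transcoding step is an assumed behaviour, not something RFC 2616 prescribes.

```python
# Hypothetical intermediary that stores header values as ISO-8859-1
# strings as they pass through, then forwards them.

def store_header(raw: bytes) -> str:
    # ISO-8859-1 maps every byte 0x00-0xFF to one code point, so this
    # decode never fails, whatever charset the sender actually used.
    return raw.decode("iso-8859-1")

def forward_latin1(stored: str) -> bytes:
    # Re-encoding with the same charset restores the original octets.
    return stored.encode("iso-8859-1")

def forward_transcoded(stored: str) -> bytes:
    # ASSUMPTION: an intermediary that normalises stored strings to UTF-8
    # on output; this rewrites every octet >= 0x80.
    return stored.encode("utf-8")

raw = "Müller".encode("utf-8")            # b'M\xc3\xbcller' from a UTF-8 sender
stored = store_header(raw)
print(stored)                             # MÃ¼ller (mojibake if inspected)
print(forward_latin1(stored) == raw)      # True
print(forward_transcoded(stored) == raw)  # False
```

The pass-through case only works because the intermediary never looks inside the value. Any charset-sensitive processing of an unknown header breaks that assumption.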

So, going forward, I completely agree with you, but in the case of
HTTP, I think the horse has already bolted; it is effectively fixed
to ISO-8859-1, and we can't fix this in the right way without
versioning the protocol.

Or am I missing something?

On 20/08/2007, at 5:22 PM, John C Klensin wrote:

> Sigh. My own sense is that, going forward, we need to lose
> 8859-N, not make it the default (or only) character set for more
> protocols. It is, to put it mildly, a little Euro-centric (and
> not even completely suitable for Europe). Much of the advantage
> of Unicode is that one does not need to designate/nominate a
> particular CCS or encoding and then maintain state for it... and
> that is a fairly large advantage. See also
> draft-klensin-unicode-escapes-03.txt (probably expired, but you
> should be able to find a copy somewhere; I'll get back to it
> sometime soon) for a discussion of issues in ASCII encoding of
> multi-octet character sets. The IRI spec may constrain things
> to encoding of octets, but that doesn't make it a good idea.
>
> If we are going to consider changes in this area, let's make
> them improvements.  Locking in 8859-1 is not an improvement: it
> would, IMO, be better to deprecate its use and require explicit
> charset designation always if that is the only choice.
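For comparison, RFC 2616 (section 2.2) already points at RFC 2047 encoded-words for header text outside ISO-8859-1. That is one existing form of explicit charset designation, with the label carried inside the value itself. Below is a sketch using Python's standard email.header module. decode_value is a hypothetical helper, and treating unlabelled text as ISO-8859-1 is an assumption that mirrors RFC 2616's default.

```python
from email.header import Header, decode_header

def encode_value(text: str) -> str:
    # Produce an RFC 2047 encoded-word carrying an explicit "utf-8" label.
    return Header(text, "utf-8").encode()

def decode_value(header_value: str) -> str:
    # Hypothetical helper: join the decoded pieces back into one string.
    parts = []
    for chunk, charset in decode_header(header_value):
        if isinstance(chunk, bytes):
            # Unlabelled chunks fall back to ISO-8859-1 (assumed default,
            # mirroring RFC 2616).
            parts.append(chunk.decode(charset or "iso-8859-1"))
        else:
            # decode_header returns the input str unchanged when it
            # contains no encoded-words.
            parts.append(chunk)
    return "".join(parts)

encoded = encode_value("日本語")
print(encoded)                # pure ASCII, with the charset label inside
print(decode_value(encoded))  # 日本語
```

Because the encoded form is pure ASCII, an intermediary that stores values as ISO-8859-1 passes it through unchanged, although it cannot interpret it without RFC 2047 support.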

--
Mark Nottingham     http://www.mnot.net/

Received on Monday, 20 August 2007 08:55:59 UTC