Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]

The (potential) problem is that an intermediary, for example, needs
to be able to handle headers that it doesn't understand. If it's been
built to store headers as ISO-8859-1 strings as they pass through (a
reasonable assumption, considering RFC 2616), an unknown header with
another encoding -- no matter how specified or flagged -- may break it.
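
To make that concrete, here is a rough Python sketch of what such an
intermediary would see. The header value and the variable names are
made up for illustration; this is not taken from any real
implementation, and it assumes the sender put UTF-8 on the wire.

```python
# Hypothetical header value: "café", sent as UTF-8 bytes (an assumption).
raw = "café".encode("utf-8")

# An intermediary built on the RFC 2616 assumption stores the value
# as an ISO-8859-1 string as it passes through.
stored = raw.decode("iso-8859-1")

# Re-encoding with the same charset gives back the original octets,
# because ISO-8859-1 maps every byte value to exactly one character.
forwarded = stored.encode("iso-8859-1")

print(repr(stored))      # 'cafÃ©' -- not the text the sender meant
print(forwarded == raw)  # True
```

So a pure pass-through keeps the octets intact, but anything that
treats the stored string as text (logging, comparing, rewriting) is
working with the wrong characters.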

So, going forward, I completely agree with you, but in the case of
HTTP, I think the horse has already bolted; header encoding is
effectively fixed to ISO-8859-1, and we can't fix this in the right
way without versioning the protocol.

Or am I missing something?

On 20/08/2007, at 5:22 PM, John C Klensin wrote:

> Sigh.  My own sense is that, going forward, we need to lose
> 8859-N, not make it the default (or only) character set for more
> protocols.  It is, to put it mildly, a little Euro-centric (and
> not even completely suitable for Europe).  Much of the advantage
> of Unicode is that one does not need to designate/nominate a
> particular CCS or encoding and then maintain state for it... and
> that is a fairly large advantage.  See also
> draft-klensin-unicode-escapes-03.txt (probably expired, but you
> should be able to find a copy somewhere -- I'll get back to it
> sometime soon) for a discussion of issues in ASCII encoding of
> multioctet character sets.   The IRI spec may constrain things
> to encoding of octets, but that doesn't make it a good idea.
>
> If we are going to consider changes in this area, let's make
> them improvements.  Locking in 8859-1 is not an improvement: it
> would, IMO, be better to deprecate its use and require explicit
> charset designation always if that is the only choice.


--
Mark Nottingham     http://www.mnot.net/

Received on Monday, 20 August 2007 08:55:59 UTC