- From: Stefan Eissing <stefan.eissing@greenbytes.de>
- Date: Fri, 28 Mar 2008 11:48:17 +0100
- To: Jamie Lokier <jamie@shareable.org>
- Cc: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
On 28.03.2008, at 10:45, Jamie Lokier wrote:
> Stefan Eissing wrote:
>>> 1) Change the character encoding on the wire to UTF-8
>>
>> -1
>> [...]
>
> So, in the case of receiving RFC2047 _or_ binary UTF-8, HTTP
> implementations using character strings internally will actually pass
> character sequences which aren't the intended "meaningful" characters,
> except for those in the US-ASCII subset.
>
> In that respect, binary UTF-8 on the wire doesn't change anything from
> the present situation with RFC2047 :-)

You are correct that the information would still be there, and it is
tempting to shoot for UTF-8. My personal feeling remains, however, that
there is not enough to be gained here to justify introducing heuristics
for character-encoding detection.

In your scenario, users of an HTTP client API would have to check
heuristically whether the characters received as header values form a
valid UTF-8 sequence once converted back to octets as ISO-8859-1. Next
comes a release of the HTTP client library, which would need to perform
the same check on the received octets. And everyone keeps their fingers
crossed that they never encounter an intermediary that applies some
"security filtering" to HTTP headers and mangles them.

The Atom "Slug" header approach makes more sense to me. It keeps HTTP
header handling less complicated at the cost of a few more octets on
the wire.

And yes, I am all for deprecating anything but ASCII in header values.

//Stefan

--
<green/>bytes GmbH, Hafenweg 16, D-48155 Münster, Germany
Amtsgericht Münster: HRB5782
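
A minimal sketch of the detection heuristic described above, assuming
the client API hands out header values already decoded as ISO-8859-1
(the traditional HTTP default); the function name is hypothetical:

```python
def guess_header_text(value: str) -> str:
    """Heuristically recover the intended characters of a header value.

    `value` is the header as decoded under ISO-8859-1. If the underlying
    octets also form a valid UTF-8 sequence, prefer that reading;
    otherwise keep the Latin-1 interpretation.
    """
    try:
        octets = value.encode("iso-8859-1")  # recover the wire octets
    except UnicodeEncodeError:
        return value  # cannot round-trip to octets; leave untouched
    try:
        return octets.decode("utf-8")  # decodes cleanly: assume UTF-8
    except UnicodeDecodeError:
        return value  # not valid UTF-8: keep the Latin-1 reading
```

The ambiguity is exactly the problem pointed at here: a value such as
"GrÃ¼ÃŸe" decodes cleanly as UTF-8 ("Grüße") but is also perfectly
legal ISO-8859-1 text, so the receiver can only guess.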
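
For contrast, a sketch of the Slug-style approach (RFC 5023, Section
9.7): the header value stays pure ASCII because the UTF-8 octets are
percent-encoded, and no heuristics are needed on the receiving side.
The variable names here are illustrative:

```python
from urllib.parse import quote, unquote

title = "Grüße"
wire_value = quote(title.encode("utf-8"))   # 'Gr%C3%BC%C3%9Fe' (ASCII only)
original = unquote(wire_value, encoding="utf-8")  # back to 'Grüße'
```

Each non-ASCII octet costs three characters on the wire instead of one,
which is the "few more octets" trade-off mentioned above.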
Received on Friday, 28 March 2008 10:49:02 UTC