- From: Stefan Eissing <stefan.eissing@greenbytes.de>
- Date: Fri, 28 Mar 2008 11:48:17 +0100
- To: Jamie Lokier <jamie@shareable.org>
- Cc: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
On 28.03.2008, at 10:45, Jamie Lokier wrote:
> Stefan Eissing wrote:
>>> 1) Change the character encoding on the wire to UTF-8
>>
>> -1
>> [...]
>
> So, in the case of receiving RFC2047 _or_ binary UTF-8, HTTP
> implementations using character strings internally will actually pass
> character sequences which aren't the intended "meaningful" characters,
> except for those in the US-ASCII subset.
>
> In that respect, binary UTF-8 on the wire doesn't change anything from
> the present situation with RFC2047 :-)

You are correct that the information would still be there, and it is
tempting to shoot for UTF-8. My personal feeling remains, however, that
there is not enough to be gained here to justify introducing heuristics
for character-encoding detection.

In your scenario, users of an HTTP client API would have to check
heuristically whether the characters received as header values form a
valid UTF-8 sequence once converted back to octets as ISO-8859-1. Next
comes a release of the HTTP client library, which would need to perform
the same check on the received octets. And everyone keeps their fingers
crossed that they never encounter an intermediary that applies some
"security filtering" to HTTP headers and mangles them.

The Atom "Slug" header approach makes more sense to me. It keeps HTTP
header handling less complicated at the cost of a few more octets on
the wire.

And yes, I am all for deprecating anything but ASCII in header values.

//Stefan

--
<green/>bytes GmbH, Hafenweg 16, D-48155 Münster, Germany
Amtsgericht Münster: HRB5782
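
A minimal sketch of the detection heuristic described above, assuming
the client API hands out header values already decoded as ISO-8859-1
(the traditional HTTP default); the function name is hypothetical:

```python
def guess_header_text(value: str) -> str:
    """Heuristically recover the intended characters of a header value.

    `value` is the header as decoded under ISO-8859-1. If the underlying
    octets also form a valid UTF-8 sequence, prefer that reading;
    otherwise keep the Latin-1 interpretation.
    """
    try:
        octets = value.encode("iso-8859-1")  # recover the wire octets
    except UnicodeEncodeError:
        return value  # cannot round-trip to octets; leave untouched
    try:
        return octets.decode("utf-8")  # decodes cleanly: assume UTF-8
    except UnicodeDecodeError:
        return value  # not valid UTF-8: keep the Latin-1 reading
```

The ambiguity is exactly the problem pointed at here: a value such as
"GrÃ¼ÃŸe" decodes cleanly as UTF-8 ("Grüße") but is also perfectly
legal ISO-8859-1 text, so the receiver can only guess.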
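
For contrast, a sketch of the Slug-style approach (RFC 5023, Section
9.7): the header value stays pure ASCII because the UTF-8 octets are
percent-encoded, and no heuristics are needed on the receiving side.
The variable names here are illustrative:

```python
from urllib.parse import quote, unquote

title = "Grüße"
wire_value = quote(title.encode("utf-8"))   # 'Gr%C3%BC%C3%9Fe' (ASCII only)
original = unquote(wire_value, encoding="utf-8")  # back to 'Grüße'
```

Each non-ASCII octet costs three characters on the wire instead of one,
which is the "few more octets" trade-off mentioned above.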
Received on Friday, 28 March 2008 10:49:02 UTC