- From: Jamie Lokier <jamie@shareable.org>
- Date: Fri, 28 Mar 2008 14:17:18 +0000
- To: Stefan Eissing <stefan.eissing@greenbytes.de>
- Cc: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Stefan Eissing wrote:
>
> On 28.03.2008, at 10:45, Jamie Lokier wrote:
>
> >Stefan Eissing wrote:
> >>>1) Change the character encoding on the wire to UTF-8
> >>
> >>-1
> >>[...]
> >So, in the case of receiving RFC2047 _or_ binary UTF-8, HTTP
> >implementations using character strings internally will actually pass
> >character sequences which aren't the intended "meaningful" characters,
> >except for those in the US-ASCII subset.
> >
> >In that respect, binary UTF-8 on the wire doesn't change anything from
> >the present situation with RFC2047 :-)
>
> You are correct that the information would still be there. And it is
> tempting to shoot for UTF-8.
...
> And everyone will keep their fingers crossed
> that they do not encounter an intermediary that makes some "security
> filtering" on HTTP headers and screws it up.

I'm thinking the same applies to RFC2047, if that becomes actually
implemented in practice (it currently isn't).

Surely the security issues with RFC2047 decoding among different
implementations are _much_ more likely than those of binary UTF-8?

E.g. =?iso-8859-1?q?=00?= will be a string terminator for some
components of some implementations, rejected by others, and passed
through as ASCII by many. Expect "security filtering" to have opinions
about such sequences for good reasons, as soon as HTTP recipients start
decoding RFC2047 in the headers and reacting badly to such sequences.

UTF-8 has similar issues, but they are relatively well defined. With
RFC2047, it's more open-ended.

-- 
Jamie
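The contrast drawn above can be sketched briefly. Assuming a recipient that decodes encoded words the way Python's standard-library RFC 2047 decoder does (used here purely for illustration, not as anything the thread prescribes), the encoded word =?iso-8859-1?q?=00?= yields a bare NUL, while a malformed UTF-8 sequence such as the overlong encoding of NUL is uniformly rejected:

```python
from email.header import decode_header

# The encoded word discussed above: a NUL smuggled in as ISO-8859-1.
raw = "=?iso-8859-1?q?=00?="

# decode_header() returns a list of (bytes, charset) pairs for encoded words.
parts = decode_header(raw)
decoded = "".join(
    b.decode(cs or "us-ascii") if isinstance(b, bytes) else b
    for b, cs in parts
)
assert decoded == "\x00"  # a string terminator for C-based components

# By contrast, malformed UTF-8 has one well-defined outcome: the overlong
# two-byte encoding of NUL (0xC0 0x80) is simply invalid and is rejected.
try:
    b"\xc0\x80".decode("utf-8")
    utf8_rejected = False
except UnicodeDecodeError:
    utf8_rejected = True
assert utf8_rejected
```

The point of the sketch: the RFC 2047 path hands each implementation a decoded string whose handling (truncate at NUL, reject, pass through) varies, whereas the UTF-8 failure mode is pinned down by the encoding's own validity rules.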
Received on Friday, 28 March 2008 14:18:00 UTC