#341: whitespace in request-lines and status-lines from Mark Nottingham on 2012-02-07 (ietf-http-wg@w3.org from January to March 2012)

From: Mark Nottingham <mnot@mnot.net>
Date: Tue, 7 Feb 2012 14:21:37 +1100
To: Amos Jeffries <squid3@treenet.co.nz>
Cc: ietf-http-wg@w3.org
Message-Id: <F09A9DB5-C189-4383-A227-598F533AA82D@mnot.net>

Ticket: <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/341>

Cheers,

On 30/01/2012, at 11:17 AM, Amos Jeffries wrote:

> On 27/01/2012 8:36 a.m., Willy Tarreau wrote:
>> 
>> (...)
>>>>>   When a server listening only for HTTP request messages, or processing
>>>>>   what appears from the start-line to be an HTTP request message,
>>>>>   receives a sequence of octets that does not match the HTTP-message
>>>> Wouldn't "does not *exactly* match" be better? I'm used to finding
>>>> crappy requests in my logs which are blocked but which some not-so-lazy
>>>> implementations would let pass (e.g., multiple SP).
>>> "match" means "match"; I don't think there's any ambiguity here...
>> There's no ambiguity; it's just to emphasize the need to perform
>> strict matching. A large number of HTTP parsers are much too lazy,
>> causing nightmares when trying to filter undesired communications,
>> or even to define new protocol extensions. For instance, on my old
>> Apache 1.3 here:
>> 
>>     $ telnet www 60080
>>     Connected to www.
>>     Escape character is '^]'.
>>     HEAD     /           HTTP/1.1     ergeargoaejgoiejgaoeg
>>     Host:   ,,,,
>>     Invalid/header name: blah
>> 
>>     HTTP/1.1 200 OK
>>     Date: Thu, 26 Jan 2012 19:07:02 GMT
>>     Server: Apache
>>     Last-Modified: Mon, 01 Jun 2009 16:47:12 GMT
>>     ETag: "47038-3ad7-46b4c2d81a400"
>>     Accept-Ranges: bytes
>>     Content-Length: 15063
>>     Connection: close
>>     Content-Type: text/html
>> 
>>     Connection closed by foreign host.
>> 
>> "SP" is *one* SP, yet multiple SPs are accepted in the request
>> line. The same goes for forbidden chars in the header name. And I'm not
>> specifically targeting Apache here; I just took the first example
>> I had handy, and it's far from being alone. It looks like strchr(),
>> strtok(), sscanf() or split(), depending on the language and
>> implementation, are common ways to parse requests. This is part
>> of what caused all the mess in the hybi WG, delaying it by a
>> year while it tried to find workarounds for various implementations.
> 
> 
> FWIW: we argued this out in Squid a while back.
> The conclusion was to accept, and ignore, any run of non-wrapping BWS before/after the method and URL. All other formats and garbage are treated as HTTP/0.9 mess, with a 400 response if the suspected URL(+garbage) fails to parse as a usable URI in its entirety.
> 
> A few vendors have run into it with their SP-padding practices so far, but by and large it works.
> 
> AYJ
> 
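For illustration only (this code is not from the thread): a minimal Python sketch of the three parsing behaviours discussed above, applied to a request-line whose CRLF has already been stripped. `parse_strict` matches the grammar exactly (method SP request-target SP HTTP-version), `parse_lazy` mimics the split()-style parsing Willy describes, and `classify_tolerant` is one possible reading of the Squid policy Amos outlines, not Squid's actual code. All function names are hypothetical, and the request-target and "usable URI" checks are deliberately simplified assumptions.

```python
import re
from urllib.parse import urlsplit

# tchar from the HTTP/1.1 token grammar: ALPHA / DIGIT / "!#$%&'*+-.^_`|~"
TCHAR = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]"
WS = r"[ \t]"  # SP or HTAB only; CR/LF (line folding) is never accepted

# Strict: exactly one SP between the three fields and nothing else.
# The request-target is simplified to "visible ASCII, no spaces".
STRICT = re.compile(rf"({TCHAR}+) ([\x21-\x7e]+) (HTTP/[0-9]\.[0-9])")

# Tolerant (our reading of the Squid policy): runs of SP/HTAB are allowed,
# and discarded, before the method and around the URL.
TOLERANT = re.compile(rf"{WS}*({TCHAR}+){WS}+(\S+){WS}+(HTTP/[0-9]\.[0-9])")

# Fallback for anything else: HTTP/0.9-style "method URL", where everything
# after the method is treated as the suspected URL(+garbage).
HTTP09 = re.compile(rf"{WS}*({TCHAR}+){WS}+(.+)")


def parse_strict(line):
    """Return (method, target, version) if the line matches exactly, else None."""
    m = STRICT.fullmatch(line)
    return m.groups() if m else None


def parse_lazy(line):
    """split()-based parsing: whitespace runs collapse, trailing junk is dropped."""
    parts = line.split()
    return tuple(parts[:3]) if len(parts) >= 3 else None


def usable_uri(s):
    """Hypothetical 'usable URI' test: no whitespace or control characters,
    and either origin-form ("/...") or absolute with a scheme and a host."""
    if any(ord(c) <= 0x20 or ord(c) == 0x7F for c in s):
        return False
    if s.startswith("/"):
        return True
    parts = urlsplit(s)
    return bool(parts.scheme and parts.netloc)


def classify_tolerant(line):
    """Return ('ok', method, target, version), ('http/0.9', method, url) or ('400',)."""
    m = TOLERANT.fullmatch(line)
    if m:
        return ("ok",) + m.groups()
    m = HTTP09.fullmatch(line)
    if m and usable_uri(m.group(2)):
        return ("http/0.9",) + m.groups()
    return ("400",)
```

With the request-line from Willy's telnet session, `parse_lazy` quietly returns `('HEAD', '/', 'HTTP/1.1')`, while `parse_strict` returns `None`. Under the tolerant sketch the same line yields `('400',)`: the trailing garbage after the version means the whole remainder is taken as the suspected URL, and it contains spaces. Dropping the garbage (`"HEAD     /           HTTP/1.1"`) makes the tolerant version accept it, whereas the strict one still rejects it.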

--
Mark Nottingham   http://www.mnot.net/

Received on Tuesday, 7 February 2012 03:27:17 UTC