Re: Misc review notes for draft-18 p1 from Amos Jeffries on 2012-01-30 (ietf-http-wg@w3.org from January to March 2012)

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Mon, 30 Jan 2012 13:17:32 +1300
To: ietf-http-wg@w3.org
Message-ID: <4F25E19C.9090700@treenet.co.nz>

On 27/01/2012 8:36 a.m., Willy Tarreau wrote:
>
> (...)
>>>>    When a server listening only for HTTP request messages, or processing
>>>>    what appears from the start-line to be an HTTP request message,
>>>>    receives a sequence of octets that does not match the HTTP-message
>>> Wouldn't "does not *exactly* match" be better? I'm used to finding
>>> crappy requests in my logs which are blocked but which some not-so-lazy
>>> implementations would let pass (e.g. multiple SP).
>> "match" means "match"; I don't think there's any ambiguity here...
> There's no ambiguity; it's just to emphasize the need to perform
> strict matching. A large number of HTTP parsers are much too lazy,
> causing nightmares when trying to filter undesired communications,
> or even to define new protocol extensions. For instance on my old
> Apache 1.3 here :
>
>      $ telnet www 60080
>      Connected to www.
>      Escape character is '^]'.
>      HEAD     /           HTTP/1.1     ergeargoaejgoiejgaoeg
>      Host:   ,,,,
>      Invalid/header name: blah
>
>      HTTP/1.1 200 OK
>      Date: Thu, 26 Jan 2012 19:07:02 GMT
>      Server: Apache
>      Last-Modified: Mon, 01 Jun 2009 16:47:12 GMT
>      ETag: "47038-3ad7-46b4c2d81a400"
>      Accept-Ranges: bytes
>      Content-Length: 15063
>      Connection: close
>      Content-Type: text/html
>
>      Connection closed by foreign host.
>
> "SP" is *one* SP, still multiple SPs are accepted in the request
> line. Same for forbidden chars in the header name. And I'm not
> specifically targeting Apache here, I just took the first example
> I had handy, it's far from being alone. It looks like strchr(),
> strtok(), sscanf() or split() depending on the language and
> implementation are common ways to parse requests. This is part
> of what caused all the mess in the hybi WG, delaying it by one
> year trying to find solutions against various implementations.
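
For illustration, the strict matching Willy is asking for could be sketched
roughly as follows. This is only a sketch under assumptions, not any server's
actual parser: the names TOKEN, STRICT_REQUEST_LINE, parse_request_line_strict
and is_valid_field_name are made up here, the request-target is simplified to
"a run of visible ASCII", and the line is assumed to arrive with its CRLF
already removed.

```python
import re

# tchar set used by the HTTP grammar for methods and header field names.
TOKEN = rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# Exactly one SP between the three parts and nothing after the version.
# Assumption: the request-target is reduced to visible ASCII octets.
STRICT_REQUEST_LINE = re.compile(
    rb"(" + TOKEN + rb") ([\x21-\x7e]+) (HTTP/[0-9]+\.[0-9]+)"
)


def parse_request_line_strict(line: bytes):
    """Return (method, target, version), or None if the line does not match."""
    m = STRICT_REQUEST_LINE.fullmatch(line)
    return m.groups() if m else None


def is_valid_field_name(name: bytes) -> bool:
    """A header field name must be a token: no SP, no '/', and so on."""
    return re.fullmatch(TOKEN, name) is not None
```

With this, the request line and the "Invalid/header name" field from the
telnet session above are both rejected, while "HEAD / HTTP/1.1" and "Host"
are accepted.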


FWIW: we argued this out in Squid a while back.
The conclusion was to accept any series of non-wrapping BWS before/after
the method and URL, ignoring the BWS. All other formats and garbage are
treated as HTTP/0.9 mess, and the result is a 400 if the suspected
URL(+garbage) fails to parse as a usable URI in its entirety.

A few vendors have hit it with their SP padding practices so far. But by 
and large it works.
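
A rough Python sketch of that policy, as described in this message only. It
is not Squid's actual code: the function name parse_request_line_lenient, the
"visible ASCII only" stand-in for "usable URI", and the choice to return None
where a 400 would be sent are all assumptions made for illustration. The line
is assumed to arrive without its CRLF.

```python
import re

_VERSION_TAIL = re.compile(rb"(.*?)[ \t]+(HTTP/[0-9]+\.[0-9]+)")
_METHOD_REST = re.compile(rb"([^ \t]+)[ \t]+(.+)")
# Hypothetical stand-in for "parses as a usable URI in its entirety".
_USABLE_URI = re.compile(rb"[\x21-\x7e]+")


def parse_request_line_lenient(line: bytes):
    """Return (method, target, version), or None where a 400 would be sent."""
    # Ignore any run of SP/HTAB before the method and after the last part.
    stripped = line.strip(b" \t")
    m = _METHOD_REST.fullmatch(stripped)
    if m is None:
        return None
    method, rest = m.groups()

    tail = _VERSION_TAIL.fullmatch(rest)
    if tail:
        # Padding between URL and version is tolerated and dropped.
        target, version = tail.groups()
    else:
        # Anything else is treated as an HTTP/0.9 style request,
        # so the whole remainder is the suspected URL (+garbage).
        target, version = rest, b"HTTP/0.9"

    if _USABLE_URI.fullmatch(target) is None:
        return None
    return method, target, version
```

Under these assumptions a padded "GET   /index.html    HTTP/1.1" is accepted,
while Willy's example line falls through to the HTTP/0.9 branch and is
rejected, because the suspected URL plus trailing garbage contains spaces.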

AYJ

Received on Monday, 30 January 2012 00:18:06 UTC