Re: #341: whitespace in request-lines and status-lines from Mark Nottingham on 2012-02-13 (ietf-http-wg@w3.org from January to March 2012)

From: Mark Nottingham <mnot@mnot.net>
Date: Mon, 13 Feb 2012 15:10:06 +1100
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <9FEEA58E-C211-4F0A-B74E-847D2784914C@mnot.net>

Looking at this a bit more.

We can't use OWS or BWS here, because they both include obs-fold.

So, proposal:


Add a new construct:

SSP = SP / 1*BSP         ; preferred single space
BSP = ( HTAB / SP )      ; "bad" space

And change Request-Line and Status-Line to:

     Request-Line = Method SSP request-target SSP HTTP-Version BSP CRLF
     Status-Line  = HTTP-Version SSP Status-Code SSP Reason-Phrase BSP CRLF

Along with appropriate text cautioning senders against generating BSP, but advising recipients to accept it.
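A minimal Python sketch of how a recipient might consume request-lines under the proposed SSP/BSP rules, with a separate check for what a sender should generate. The function names, the simplified method and request-target patterns, and the reading of the trailing BSP as optional (*BSP) are assumptions made for illustration. As written, the proposed rule requires exactly one SP or HTAB before CRLF.

```python
import re

# Sketch of the proposed rules (names mirror the ABNF; patterns are simplified):
#   BSP = ( HTAB / SP )   ; "bad" space
#   SSP = SP / 1*BSP      ; preferred single space
BSP = r"[ \t]"
SSP = BSP + "+"  # when consuming, SP / 1*BSP accepts any run of SP/HTAB

# HTTP token characters, used here for the method.
TCHAR = r"[!#$%&'*+.^_`|~0-9A-Za-z-]"

# ASSUMPTION: the trailing BSP before CRLF is treated as optional (*BSP);
# the proposal as written has a single, mandatory BSP there.
REQUEST_LINE = re.compile(
    rf"(?P<method>{TCHAR}+){SSP}(?P<target>\S+){SSP}"
    rf"(?P<version>HTTP/[0-9]\.[0-9]){BSP}*\r\n"
)


def parse_request_line(line: str):
    """Lenient consumer: accept SP/HTAB runs; return None on mismatch."""
    m = REQUEST_LINE.fullmatch(line)
    if m is None:
        return None
    return m.group("method"), m.group("target"), m.group("version")


def is_preferred(line: str) -> bool:
    """Generator-side check: exactly one SP between fields, none trailing."""
    return re.fullmatch(r"\S+ \S+ HTTP/[0-9]\.[0-9]\r\n", line) is not None
```

The split mirrors the "generate SP, consume BSP" intent. A sender would aim to satisfy is_preferred(), while a recipient would use parse_request_line(), which still rejects trailing garbage after the version.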

Thoughts?


On 07/02/2012, at 2:21 PM, Mark Nottingham wrote:

> Ticket: <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/341>
> 
> Cheers,
> 
> On 30/01/2012, at 11:17 AM, Amos Jeffries wrote:
> 
>> On 27/01/2012 8:36 a.m., Willy Tarreau wrote:
>>> 
>>> (...)
>>>>>>  When a server listening only for HTTP request messages, or processing
>>>>>>  what appears from the start-line to be an HTTP request message,
>>>>>>  receives a sequence of octets that does not match the HTTP-message
>>>>> Wouldn't "does not *exactly* match" be better? I'm used to finding
>>>>> crappy requests in my logs which are blocked but which some not-so-lazy
>>>>> implementations would let pass (e.g. multiple SP).
>>>> "match" means "match"; I don't think there's any ambiguity here...
>>> There's no ambiguity; it's just to emphasize the need to perform
>>> strict matching. A large number of HTTP parsers are much too lazy,
>>> causing nightmares when trying to filter undesired communications,
>>> or even to define new protocol extensions. For instance, on my old
>>> Apache 1.3 here:
>>> 
>>>    $ telnet www 60080
>>>    Connected to www.
>>>    Escape character is '^]'.
>>>    HEAD     /           HTTP/1.1     ergeargoaejgoiejgaoeg
>>>    Host:   ,,,,
>>>    Invalid/header name: blah
>>> 
>>>    HTTP/1.1 200 OK
>>>    Date: Thu, 26 Jan 2012 19:07:02 GMT
>>>    Server: Apache
>>>    Last-Modified: Mon, 01 Jun 2009 16:47:12 GMT
>>>    ETag: "47038-3ad7-46b4c2d81a400"
>>>    Accept-Ranges: bytes
>>>    Content-Length: 15063
>>>    Connection: close
>>>    Content-Type: text/html
>>> 
>>>    Connection closed by foreign host.
>>> 
>>> "SP" is *one* SP, yet multiple SPs are accepted in the request
>>> line. The same goes for forbidden chars in the header name. And I'm
>>> not specifically targeting Apache here; I just took the first example
>>> I had handy, and it's far from alone. Depending on the language and
>>> implementation, strchr(), strtok(), sscanf() or split() seem to be
>>> common ways to parse requests. This is part of what caused all the
>>> mess in the hybi WG, delaying it by a year while it looked for ways
>>> to cope with various implementations.
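To illustrate the difference between the lazy parsing Willy describes and strict matching, here is a hypothetical Python sketch. The helper names and the simplified request-line pattern are assumptions for illustration; the inputs are taken from the Apache session above.

```python
import re

# HTTP token characters (method and header-field names).
TOKEN = re.compile(r"[!#$%&'*+.^_`|~0-9A-Za-z-]+")


# Hypothetical "lazy" parsers of the split()/strchr() kind.
def lazy_request_line(line: str):
    parts = line.split()  # any run of whitespace separates fields
    return tuple(parts[:3]) if len(parts) >= 3 else None  # extras ignored


def lazy_header_name(line: str) -> str:
    return line.partition(":")[0]  # anything before the first colon


# Strict counterparts: exactly one SP between fields, CRLF at the end,
# and header names restricted to token characters.
STRICT_LINE = re.compile(r"([A-Z]+) (\S+) (HTTP/[0-9]\.[0-9])\r\n")


def strict_request_line(line: str):
    m = STRICT_LINE.fullmatch(line)
    return m.groups() if m else None


def strict_header_name(line: str):
    name, sep, _ = line.partition(":")
    return name if sep and TOKEN.fullmatch(name) else None
```

With the request-line from the telnet session, lazy_request_line() happily returns ("HEAD", "/", "HTTP/1.1") and silently drops the trailing garbage, while strict_request_line() returns None. Likewise, "Invalid/header name" passes the lazy header parser but not the strict one.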
>> 
>> 
>> FWIW: we argued this out in Squid a while back.
>> The conclusion was to accept, and ignore, any run of non-wrapping BWS before or after the method and URL. All other formats and garbage are treated as an HTTP/0.9 mess, and the result is a 400 if the suspected URL (plus garbage) fails to parse in its entirety as a usable URI.
>> 
>> A few vendors have run into this with their SP-padding practices so far, but by and large it works.
>> 
>> AYJ
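The Squid policy Amos describes can be sketched roughly as follows. This is a hypothetical Python illustration, not Squid's code. The classify() and usable_uri() names, the return tuples, and the test for a "usable URI" (an origin-form path, or an absolute URI with scheme and host, containing no whitespace or control characters) are all assumptions.

```python
import re
from urllib.parse import urlsplit

WS = r"[ \t]"  # non-wrapping whitespace only (no CR/LF, hence no obs-fold)

# Full form: optional WS, method, WS run, URL, WS run, version, optional WS.
FULL = re.compile(rf"{WS}*([A-Z]+){WS}+(\S+){WS}+(HTTP/[0-9]\.[0-9]){WS}*")
# Fallback: method, then everything else is the "suspected URL".
LOOSE = re.compile(rf"{WS}*([A-Z]+){WS}+(.*?){WS}*")


def usable_uri(candidate: str) -> bool:
    """Hypothetical check: no whitespace/controls; origin- or absolute-form."""
    if not candidate:
        return False
    if any(c.isspace() or ord(c) < 0x20 or ord(c) == 0x7F for c in candidate):
        return False
    if candidate.startswith("/"):
        return True
    parts = urlsplit(candidate)
    return bool(parts.scheme and parts.netloc)


def classify(line: str):
    """Classify a request-line whose CRLF has already been stripped."""
    m = FULL.fullmatch(line)
    if m:
        return ("HTTP/1.x", *m.groups())
    m = LOOSE.fullmatch(line)
    if m and usable_uri(m.group(2)):
        return ("HTTP/0.9", m.group(1), m.group(2), None)
    return ("400",)
```

Because the whitespace class excludes CR and LF, padding is tolerated only within the line itself. A request-line with trailing garbage falls through to the HTTP/0.9 path, where the embedded spaces make the suspected URL unusable and produce a 400.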
>> 
> 
> --
> Mark Nottingham   http://www.mnot.net/

--
Mark Nottingham   http://www.mnot.net/

Received on Monday, 13 February 2012 04:10:34 UTC