Re: p1: whitespace in request-target

Hi,

On Thu, Apr 18, 2013 at 10:49:10AM +1000, Mark Nottingham wrote:
> p1 3.1.1 says:
> 
> > Unfortunately, some user agents fail to properly encode hypertext
> > references that have embedded whitespace, sending the characters directly
> > instead of properly encoding or excluding the disallowed characters.
> > Recipients of an invalid request-line SHOULD respond with either a 400 (Bad
> > Request) error or a 301 (Moved Permanently) redirect with the
> > request-target properly encoded. Recipients SHOULD NOT attempt to
> > autocorrect and then process the request without a redirect, since the
> > invalid request-line might be deliberately crafted to bypass security
> > filters along the request chain.
> 
>   http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.1.1
> 
> I note that the practice of correcting this is fairly widespread; e.g., in
> Squid, the default is to strip the whitespace, and IIRC has been for some
> time:
> 
>   http://www.squid-cache.org/Doc/config/uri_whitespace/

Does Squid log anything that would help to see when it fixes this?

> I think that the Squid documentation needs to be corrected, because the text
> in RFC2396 (and later in 3986) is about URIs in contexts like books, e-mail
> and so forth, not protocol elements:
> 
>   http://tools.ietf.org/html/rfc3986#appendix-C
> 
> My question is why this is a SHOULD / SHOULD NOT. We say that SHOULD-level
> requirements affect conformance unless there's a documented exception here:
> 
>   http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-2.5
> 
> ... but these requirements don't mention any exceptions. Is the security risk
> here high enough to justify a MUST / MUST NOT? If not, they probably need to
> be downgraded to ought (or an exception needs to be highlighted).

Well, FWIW, haproxy is strict on this and rejects requests that don't
exactly match the expected format, which means that requests with
embedded spaces are rejected with a 400 Bad Request. When such a bad
request happens, the whole request is captured. I must say that all
the ones I have seen to date (which are extremely rare) were made by
poorly written attack scripts, or by stupid web-scraping tools that
decode the URL-encoding in links found on web pages before making the
request. These are the same tools that append '">' at the end of a URL
because they failed to properly parse the "<a href=..." tag.
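
To make this concrete, here is a minimal sketch of the strict check
(written for this mail, in no way haproxy's actual parser): a
request-line must be exactly "METHOD SP request-target SP HTTP-version",
so a third space means raw whitespace leaked into the target and the
request gets a 400:

  #include <ctype.h>
  #include <stdio.h>
  #include <string.h>

  /* Return 0 if the request-line is well-formed, -1 if it must be
   * rejected with a 400.
   */
  static int check_request_line(const char *line)
  {
      const char *sp1 = strchr(line, ' ');        /* ends the method */
      const char *sp2 = sp1 ? strchr(sp1 + 1, ' ') : NULL; /* ends the target */

      if (!sp1 || !sp2 || sp1 == line || sp2 == sp1 + 1)
          return -1;                              /* missing or empty fields */

      /* no HTAB or control characters inside the request-target */
      for (const char *p = sp1 + 1; p < sp2; p++)
          if (isspace((unsigned char)*p) || iscntrl((unsigned char)*p))
              return -1;

      /* a third space means raw whitespace leaked into the target */
      if (strchr(sp2 + 1, ' '))
          return -1;

      return strncmp(sp2 + 1, "HTTP/", 5) == 0 ? 0 : -1;
  }

  int main(void)
  {
      const char *good = "GET /index.html HTTP/1.1";
      const char *bad  = "GET /a file.html HTTP/1.1"; /* embedded space */

      printf("%s -> %s\n", good, check_request_line(good) ? "400" : "ok");
      printf("%s -> %s\n", bad,  check_request_line(bad)  ? "400" : "ok");
      return 0;
  }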

So in my opinion, we could reasonably use a MUST here. Suggesting that
recipients fix this significantly increases the risk that they do it
wrong and become vulnerable to certain classes of attacks. And we're
clearly not contributing to cleaning up the web by accepting such
erroneous behaviour, especially since browsers have managed to get it
right.
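
For completeness, the other path the draft allows (a 301 with a properly
encoded request-target) boils down to percent-encoding the raw
whitespace before echoing the target back in a Location header. The
sketch below only illustrates that path, it is not a recommendation;
anything such code misses beyond SP and HTAB is exactly where an
intermediary gets it wrong:

  #include <stdio.h>

  /* Percent-encode SP and HTAB from <in> into <out>, which must hold
   * at least 3 * strlen(in) + 1 bytes.
   */
  static void encode_ws(const char *in, char *out)
  {
      while (*in) {
          if (*in == ' ' || *in == '\t')
              out += sprintf(out, "%%%02X", (unsigned char)*in);
          else
              *out++ = *in;
          in++;
      }
      *out = '\0';
  }

  int main(void)
  {
      char fixed[3 * 64 + 1];

      encode_ws("/a file.html", fixed);
      printf("Location: %s\n", fixed); /* -> Location: /a%20file.html */
      return 0;
  }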

Regards,
Willy
