Re: Request Routing Information [was: Do we kill the from Amos Jeffries on 2013-02-23 (ietf-http-wg@w3.org from January to March 2013)

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Sat, 23 Feb 2013 23:50:40 +1300
To: ietf-http-wg@w3.org
Message-ID: <51289F00.4070605@treenet.co.nz>
On 23/02/2013 7:49 a.m., Nicolas Mailhot wrote:
> Amos Jeffries <squid3@...> writes:
>
>> Client, middleware, and routing infrastructure do not need to care about
>> the path+query portion for their operations other than as an opaque
>> blob.
> Unfortunately not true. We had cases where misbehaving users (that *knew* they
> were misbehaving) changed dynamically the name of the accessed host, and the
> only way to stop the damage was a path match (which fortunately was
> discriminating).

Please explain in more detail. How did they dynamically change the 
accessed host?

And why did your HTTP middleware allow the change?

If you are talking about fiddling the Host: header versus absolute-URL 
versus TCP destination and similar vulnerabilities in the middleware 
there is no excuse for it being vulnerable. I point you at Squid-3.2 and 
the way we prohibit Host and TCP address differing - to the point of 
marking players like Google and Akamai regularly as "forgers".
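
As a rough illustration of the Host versus absolute-URL half of that 
check, here is a minimal Python sketch. It is not Squid's code; the 
function name, the default-port table and the accept/reject rules are 
assumptions made up for this example. The TCP destination comparison 
needs the socket address and DNS, so it is left out.

```python
from urllib.parse import urlsplit

# Assumed default ports, used when either side omits the port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_matches_target(request_target: str, host_header: str) -> bool:
    """Return True when the Host header agrees with an absolute-URL target.

    Hypothetical helper: origin-form targets (starting with "/") carry no
    authority of their own, so there is nothing to compare and we accept.
    """
    if request_target.startswith("/"):
        return True
    try:
        target = urlsplit(request_target)
        # Parse the Host header as an authority by giving it a "//" prefix.
        host = urlsplit("//" + host_header.strip())
        default_port = DEFAULT_PORTS.get(target.scheme.lower())
        target_port = target.port or default_port
        host_port = host.port or default_port
    except ValueError:
        # Malformed authority or port on either side: treat as a mismatch.
        return False
    return (
        target.hostname is not None
        and target.hostname == host.hostname
        and target_port == host_port
    )
```

With this, a request for http://example.com/a carrying "Host: evil.test" 
is rejected, while "Host: example.com" passes.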


> And a lot of botnet attacks can be identified by the access to a special path,
> which is the same on all infected servers users access to.

You seem to be misunderstanding the meaning of "opaque". It has nothing 
to do with obscuring anything.

Botnet requests with a consistent ETag prefix, for example, would be 
equally detectable and preventable using the same method: a pattern 
match against the relevant field.

I posit that what you are doing there is that _you_ (the human) are 
reading the blob following a URL hostname, _you_ are understanding it, 
and writing a tool that detects a pattern in that field-value. The tool 
itself only needs to determine if the field as a whole matches the 
pattern you gave it.
  If those same botnets were sending urn: with hostname and a path 
segment you would just as easily identify the pattern and have tools 
matching it - even though the "path" segment of URN is an opaque blob 
everywhere except the origin.
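
To make "opaque" concrete, here is a minimal Python sketch of such a 
tool. The pattern and the request-targets are invented for illustration. 
The point is only that the matcher treats everything after the authority 
as one string and never splits it into path and query.

```python
import re

# Hypothetical signature, written by a human who read the botnet's URLs.
BOTNET_PATTERN = re.compile(r"^/gate\.php\?id=[0-9a-f]{8}$")

# Matches "scheme://authority" at the start of an absolute-form target.
AUTHORITY_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://[^/?#]*")


def opaque_blob(request_target: str) -> str:
    """Return everything after the authority as one uninterpreted string."""
    # Origin-form targets have no such prefix and come back unchanged.
    return AUTHORITY_PREFIX.sub("", request_target, count=1)


def is_blocked(request_target: str) -> bool:
    """Match the blob as a whole; no path/query split is performed."""
    return BOTNET_PATTERN.search(opaque_blob(request_target)) is not None
```

The same is_blocked() catches the request no matter which hostname the 
infected client happens to be talking to.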

There are cases where middleware does need to manipulate the path. But 
these are also the cases where you would be parsing it completely 
anyway, to gain full understanding of all the pieces inside it right 
down to the byte level. That would always be done with a parser which 
re-assembled the pieces and assigned specific meaning to each byte - 
including the query portion.
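
For contrast, the manipulation case might look like the following 
sketch, which uses Python's urllib.parse to take the URL apart, change 
one piece, and put it back together. The helper name and the parameter 
being stripped are assumptions chosen for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_param(url: str, name: str) -> str:
    """Remove every occurrence of one query parameter and rebuild the URL."""
    parts = urlsplit(url)
    # Decode the query into (key, value) pairs, keeping empty values.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key != name]
    # Re-encode the remaining pairs and re-assemble all five components.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Note that the round trip can change the byte-level encoding of the 
remaining pairs, which is why middleware doing this has to understand 
every piece rather than treat the field as a blob.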

>
> In all those cases the query portion is just garbage to be ignored, the path –
> not.

This tells me you have not encountered (or noticed) the Spam attacks 
involving query-injection last decade. Lucky you.

/history/
Now-dead versions of Outlook used to make magic hyperlinks out of any 
http:// text they detected in plain text, hiding all the text they decided 
was URL and showing only the domain name. The attacker would carefully 
craft login credentials containing encoded @, / and ? in ways which 
Outlook would mistake for delimiters but the browser would decode before 
parsing the URL. The user, and anyone not clued up enough to notice, would 
see a link to a victim website, example.com (or in some cases localhost!), 
which when clicked would go completely sideways to a phishing or 
virus-infected URL on a host somewhere else.

> 'Do not need to care' is another word for 'no creative users'

No. 'Do not need to care' is another word for 'I already have a better 
way to detect those creative users'.

In particular, as has been mentioned already, splitting those two fields 
will simply give those creative users another tool to play with while 
making the middleware do more work to prevent them using it, um, creatively.

Amos
Received on Saturday, 23 February 2013 10:51:20 UTC