Re: HTTPbis -10 drafts published : Connection header

From: Willy Tarreau <w@1wt.eu>
Date: Fri, 16 Jul 2010 06:55:49 +0200
To: "Roy T. Fielding" <fielding@gbiv.com>
Cc: Adrien de Croy <adrien@qbik.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20100716045549.GB22518@1wt.eu>

On Thu, Jul 15, 2010 at 04:28:04PM -0700, Roy T. Fielding wrote:
> On Jul 15, 2010, at 12:33 PM, Willy Tarreau wrote:
> > It's not that simple. I have an example of a customer who uses front
> > Apache reverse-proxies to perform security controls and to only
> > let "clean" requests pass through. Those Apaches also add some
> > headers to the requests being forwarded to the servers for logging
> > purposes, and it is the only way to reach the servers. Due to that
> > implementation mistake, it is possible (and I've tested) for the
> > client to make the reverse proxy remove those headers it had just
> > added, so that the end point finally does not get the information
> > it should have unconditionally got.
> 
> Yes, but why is that a problem?  First, the process adding headers
> should have already removed the Connection header received -- otherwise
> it isn't doing its job.  Second, even without fixing that bug, the
> result is fail safe -- the proxy won't be able to forward what it
> generated.

It's not a problem from an HTTP point of view: the request is valid. It's
a problem because some mandatory information which the proxy sets
unconditionally, regardless of the client's request, can still be removed
by the client (eg: X-Forwarded-For, Host, X-Forwarded-Host, ...). When the
next hop server makes a decision based on this information, this can become
dangerous (here the main issue was not getting the client's IP in the
server logs). We could for instance imagine an HTTP/1.0 compatible
server which does virtual hosting and which gives access to the base dir
of the virtual servers when no Host was specified. This is just an example,
I'm not saying this is what happens. The issue is mostly that the client
can control some aspects of the other side connection in an unexpected
manner.
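
To illustrate the ordering issue described above, here is a minimal
Python sketch. The function names, the host name and the address
192.0.2.10 are made up for illustration, and this is not how Apache or
mod_proxy is actually implemented. It only models a proxy that adds
X-Forwarded-For before it processes the client's Connection header,
compared with one that processes Connection first.

```python
def strip_connection_listed(headers):
    """Drop Connection and every header named in it (hop-by-hop rule)."""
    listed = {
        token.strip().lower()
        for name, value in headers
        if name.lower() == "connection"
        for token in value.split(",")
        if token.strip()
    }
    listed.add("connection")
    return [(n, v) for n, v in headers if n.lower() not in listed]


def add_proxy_headers(headers, client_ip):
    """Append the header the proxy is supposed to set unconditionally."""
    return headers + [("X-Forwarded-For", client_ip)]


def forward_add_then_strip(headers, client_ip):
    # Assumed problematic order: the proxy adds its header first, then
    # honours the client's Connection header, which names that header.
    return strip_connection_listed(add_proxy_headers(headers, client_ip))


def forward_strip_then_add(headers, client_ip):
    # Safer order: consume the client's Connection header first, then add.
    return add_proxy_headers(strip_connection_listed(headers), client_ip)


# Hypothetical client request that lists the proxy's own header.
request = [("Host", "www.example.com"), ("Connection", "X-Forwarded-For")]

bad = forward_add_then_strip(request, "192.0.2.10")
good = forward_strip_then_add(request, "192.0.2.10")

print(bad)   # X-Forwarded-For is gone
print(good)  # X-Forwarded-For reaches the server
```

With the first ordering the server never sees the client's address,
which matches the missing-IP-in-logs symptom. With the second, the
client's Connection header can no longer touch what the proxy adds.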

Someone else (Adrien ?) suggested it might be problematic with caches.
I'm wondering what can happen if the client does that on Accept-Language
for instance. The cache might index the response using the headers from
the client's original request, but since the server won't receive that
header, it may return a different version of the document. That version
would then be cached in association with request headers other than the
ones that actually generated the response. Same for the Host field, even
though it's less likely that the request will be accepted.
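
A minimal Python sketch of that caching concern follows. Everything in
it is hypothetical: the toy origin() server, its "en" default, and above
all the assumption that the cache builds its key from the client's
Accept-Language before the Connection-listed headers are stripped.
Whether any real cache behaves this way is exactly the open question.

```python
def strip_listed(headers):
    """Drop Connection and every header it names before forwarding."""
    listed = {
        token.strip().lower()
        for token in headers.get("Connection", "").split(",")
        if token.strip()
    }
    listed.add("connection")
    return {n: v for n, v in headers.items() if n.lower() not in listed}


def origin(headers):
    """Toy origin server: picks a variant, defaulting to English."""
    return "document in " + headers.get("Accept-Language", "en")


cache = {}


def proxy_get(url, client_headers):
    # Assumption: the key uses the header as the client sent it ...
    key = (url, client_headers.get("Accept-Language"))
    if key not in cache:
        # ... while the origin only sees the stripped request.
        cache[key] = origin(strip_listed(client_headers))
    return cache[key]


# First client asks for French but lists Accept-Language in Connection.
proxy_get("/doc", {"Accept-Language": "fr", "Connection": "Accept-Language"})

# A later, ordinary client asking for French gets the cached entry.
honest = proxy_get("/doc", {"Accept-Language": "fr"})
print(honest)
```

Under these assumptions the second client receives the English variant
stored under the French key, which is the mismatch described above.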

> > Now, if we want to be fair, there are two points here which are
> > causing that issue :
> > 
> >  - apache's header removal does not happen in the appropriate
> >    order.
> 
> The order depends on when the module does its stuff, not on
> something inherent in Apache.  It is the security-checking module's
> responsibility to do the removal earlier (or schedule its additions
> later) if that is desired.

I *think* that the X-Forwarded-* headers are managed by the mod_proxy
module itself, though I may be wrong since I don't know Apache well
enough. Thus, I'm not sure we could fix this in the config by just
moving LoadModule directives around.

> >  - apache is used as a reverse-proxy (and is often used that
> >    way) but it follows a proxy behaviour instead of a gateway
> >    behaviour. But I suspect that when they began, the difference
> >    was not clear yet between the two.
> 
> It depends on which module is used for that purpose, but yes
> the mod_proxy stuff makes for a poor gateway.  The difference
> was well known at the time -- I should have vetoed the reverse
> proxy features when they were added (they belong in a separate
> module).

I agree.

> It wouldn't be confusing at all if it were not for all the
> extra requirements that gave types to headers (like hop-by-hop).
> They make it sound like there is some header-aware engine going
> through and checking the types.

Yes, indeed, that's how I parse it. My understanding is that as an
implementer, I should read the whole doc, write down every header
name seen at least once, then eliminate from that list the ones
listed as hop-by-hop.

> There is no such thing in a well
> written intermediary -- every decision should be based on a user
> config or the self-descriptive message.

In fact, in my opinion on a real proxy (I mean an outgoing proxy),
it should not be much of a problem if the user can control what
goes out (except maybe for what goes into the cache). I tried to
imagine what could happen if the user sent a GET+content-length+
and got it removed to unveil a second request, etc... so that it
could bypass some mandatory filtering, but I don't see how it
could use that to gain unwanted capabilities.

On the reverse proxy side it's different because what we expect
from a gateway is to perform strong checks before feeding the
servers with safe requests. But here the issue is not caused by
what the spec says (since gateways are not concerned by the rule
on the Connection header), but rather by the use of a component
for the wrong job.

I still have a few doubts about the side effects on caching proxies,
but I'm not skilled in this area and don't know well which parts of the
request may have an effect on what is really cached (eg: Accept-Language,
Byte-Range, ...)

Best regards,
Willy
Received on Friday, 16 July 2010 04:56:32 GMT