Re: protocol support for intercepting proxies from Travis Snoozy on 2007-06-18 (ietf-http-wg@w3.org from April to June 2007)

From: Travis Snoozy <ai2097@users.sourceforge.net>
Date: Sun, 17 Jun 2007 19:29:03 -0700
To: Adrien de Croy <adrien@qbik.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20070617192903.12823d09@localhost>

On Mon, 18 Jun 2007 12:03:19 +1200, Adrien de Croy <adrien@qbik.com>
wrote:
<snip>
> > What I'm getting from the first part is: admins don't want to
> > configure the browser. The second part, though, seems to say admins
> > want to configure the browser. Isn't this a little conflicted?
> >
> Admins don't want to individually configure thousands of individual 
> browsers.
> 
> They wish to configure browsers centrally with a single central 
> configuration setting.

And users wish to have control over their own browsers. Unfortunately,
users on average are also fairly un-savvy when it comes to making good
security decisions. Providing a pop-up that lets the user decide is not
really an answer -- most of the time, the poor user just wants it to
work, and will click "yes" to anything that gets in the way between
them and that goal.

> largely, but there are some significant warts on it.
> 
> WinGate for example does the hideous juggling act of intercepting 
> connections, running NTLM auth over them, then allowing the same
> request to go through to an origin server that then also requires
> NTLM auth.
>
> But it's not pretty, or particularly robust.  There are browser
> variations.

I'm not even going to touch that; this e-mail is already too long. :)
 
> > Also,
> > one should speak about authenticating to the _network_, not the
> > proxy -- the proxy simply provides the service for authentication.
> > It's not an "HTTP proxy" per se, so much as a TCP/IP filtering
> > application that happens to speak HTTP + HTML, because most users
> > have something that can speak HTTP + HTML. Actual HTTP-level
> > filtering and/or caching is another story altogether, but I'm
> > sticking with simple authentication for now.
> lost me there.

Alice hops onto the network. Bob, the network operator, needs to verify
that the person who hopped on the network (Alice) is actually a valid,
paid-up user, and not some cad trying to bum bandwidth (Mallory). So,
Bob denies nearly all services to folks who hop on the network until
they've authenticated somehow. The "somehow" could be anything, but it
should be done at the lowest protocol level available (e.g., dial-up
connections use PPP for link-layer authentication; wireless connections
are also authenticated at the link-layer via WPA+RADIUS; SOCKS, SSH, IP
over IPSec, and other VPN-like technologies can be used for
network-layer authentication).
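
To make Bob's policy concrete, here is a minimal sketch of such a gate
in Python. Everything in it is hypothetical: the Gate class, keying on a
client identifier, and the choice of DNS and DHCP as the only services
reachable before authentication are illustrative assumptions, not a
description of any real product.

```python
from dataclasses import dataclass, field

# Assumption: the services a client may reach before authenticating.
# 53 = DNS, 67 = DHCP server; a real deployment would pick its own list.
PRE_AUTH_PORTS = {53, 67}


@dataclass
class Gate:
    """Tracks which clients have authenticated to the network."""
    authenticated: set = field(default_factory=set)

    def authenticate(self, client: str, paid_up: bool) -> bool:
        # The "somehow" is out of scope here; paid_up stands in for
        # whatever check the operator actually performs.
        if paid_up:
            self.authenticated.add(client)
        return paid_up

    def allows(self, client: str, dest_port: int) -> bool:
        # Authenticated clients get everything; everyone else gets
        # only the pre-auth services.
        return client in self.authenticated or dest_port in PRE_AUTH_PORTS
```

Note that nothing in this gate inspects HTTP; it only decides whether
traffic may pass at all.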

HTTP is a bad way to do this, but it's used because anybody with a
browser can understand HTTP + HTML. Network authentication, however, is
a *separate* problem from providing caches. Caches introduced here
would be tantamount to trying to cache TCP/IP, which is utterly silly.
The fact that a single application can combine the two functions is
understandably confusing, and it's easy to get things mixed and jumbled
in such a way as to cause havoc.

> >
> > <snip>
> >> Given that the problem is not going to go away because people are
> >> not going to want to stop using intercepting proxies, wouldn't it
> >> be better if there was some proper protocol support for the
> >> concept?
> >
> > Yes, but what about backwards compatibility? The proxy still needs a
> > way to let browsers that *don't* implement such extensions to
> > authenticate and work properly. 
> if a 400 series code came back from an intercepting proxy, with a
> page saying "you need to configure your browser to use a proxy", plus
> a header field with the URI of the proxy, if the client trusted the
> source of this message, a compliant client could automatically retry
> the request to the proxy, even ask the user if they wish to set their 
> browser to use this proxy for future requests. A non-compliant
> browser would show the message.

Yes, but the automatic configuration is just ripe for abuse. I mean,
there's a reason why nobody uses 305, and it's because you'd have to be
nuts to let some arbitrary server on the Internet tell you to route
yourself through some other arbitrary server.

DHCP is relatively safe -- it's broadcast at the link-layer, and isn't
generally routed (and when it is, it's done in a conscious manner).
That makes it great for telling you which proxy to use, granted that
you trust your local network. Browsers use HTTP over TCP/IP, which is
Internet-routable by definition -- because of this, allowing proxy
redirects would be a Very Bad Thing. All it takes is a DNS hijack of one
domain you visit (or even unintentionally visiting one bad domain) along
with an uninformed "yes" click to have all your traffic routed through a
proxy.

Easy configuration is good, but easy/auto configuration should _always_
be limited to the local network ("local" being somewhat relative, here).
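
As a rough sketch of what "limited to the local network" could mean in
a client, the check below accepts a proxy hint only if it names a
private, loopback, or link-local IP address. The function name is
hypothetical, and treating "private address range" as a stand-in for
"local" is an assumption; it does nothing about a hostile machine that
is already on the local network.

```python
import ipaddress


def is_local_proxy(address: str) -> bool:
    """Return True if address is a literal private, loopback or link-local IP."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Not a literal IP (e.g. a host name): refuse rather than guess.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local
```

A client using this would ignore a redirect to, say, 8.8.8.8 or to a
bare host name, and would consider one to 10.1.2.3.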

> > We wind up with a chicken/egg problem,
> > and we still have to solve the original issue with the existing
> > infrastructure, regardless.
> >
> >> UAs at the moment don't generally know if their connections are
> >> being intercepted.  If they knew, then they could:
> >> * let the user know connections were being intercepted
> >>     - ameliorates issues relating to privacy
> >
> > So long as the proxy-operator wants them to know, and is a decent
> > human being, and the software supports it.
> 
> or is forced by privacy legislation to do so.  Several countries I
> know of have quite advanced privacy regulations concerning internet
> traffic, i.e. Italy.

Yes, but law does not protection make. Fraud is illegal, but
perpetrated daily. Compromising computers, e.g., for botnets, is
similarly illegal. Using XSS attacks for phishing is illegal. Omitting
a proper Via: entry could be made illegal too, but that won't stop
the badness (hijacking sessions by re-routing browsers through
arbitrary proxies) from actually happening.

> >
> >>     - helps users decipher errors better (i.e. upstream connection
> >> failure)
> >
> > A good error message from the proxy should be adequate for this
> > ("500 failed to connect to server at example.org"). Alternately,
> > one could pass the TCP/IP issues directly through (e.g., if the
> > connection timed out, let it time out on the client; 
> 
> You can't time out at the connection phase if you already moved past 
> that phase by accepting the connection.

We could if we were tunneling TCP/IP over HTTP, but we're not (that's
part of the problem with trying to use this method for network
authentication).

> The whole advantage of caching intercepting proxies is to avoid 
> connecting to the upstream origin server if possible.  So proxies
> have to accept the connection before the client will send the
> request, which the proxy needs to check the cache, by which stage it's
> too late if the upstream connection fails.  All that can be done at
> that stage is present an error page to the client, at which stage
> they know immediately that their connection is being intercepted (if
> they are aware of the significance of such things).

The benefit you stated here was that a UA knowing it is being
intercepted "helps users decipher errors better." That presupposes
that there is no secrecy, or attempt at secrecy. My question was, how
does the UA knowing interception is happening have any bearing on the
quality of the error message returned? Unless the UA and proxy are
trying to conspire to keep the user from figuring out there's a proxy
(which I don't think is what you're looking for), I don't see why an
appropriate/descriptive 5xx code wouldn't do.

> 
> > if it was reset, reset it, etc.).
> > What about identifying the proxy with the client would help?
> 
> You'd definitely want some mechanism to assist the human to make the 
> decision about whether or not to use the proxy.

... who's going to provide that information? The proxy? I'm supposed to
make a security decision based on the word of the proxy that's trying
to hijack my connection? That's just about as bad as Vista's UAC, or
(more topically) SSL certificate popups. The average user will scratch
his head for a moment, figure that he has to click "yes" to continue,
and be on his merry way. Security freaks like myself enjoy the
opportunity to stop badness from happening, but the average user just
wants the darned thing to connect.

> 
> >
> >>     - leads towards possible user-control over whether their
> >> traffic may be intercepted or not
> >
> > See prior comment about proxy op being a decent human being. Also,
> > the options in this scenario are go through the forced proxy, or
> > don't get Internet access -- a security policy wouldn't be very
> > helpful. Users should *always* assume their traffic is monitored
> > (esp. on business & school networks, where this type of proxy is
> > likely to occur), and vary their browsing habits based on that
> > assumption. Users interested in getting around eavesdroppers should
> > already be using technologies like Tor, VPNs, anonymizing SOCKS
> > proxies, etc.
> Actually I can think of scenarios where this would be useful.
> 
> For instance our ISP intercepts all connections, but allows customers
> to opt-out of this (which we had to do in order to perform useful
> testing of WinGate).
> 
> The opt-out process of the ISP is something that consumes their 
> resources.  A mechanism to allow a customer to opt-out would save the 
> ISP resources.
> 
> Obviously this would be a configuration option, whether or not to
> allow users to opt out (off by default for corporate proxies,
> possibly on for ISPs).
> 
> Most corporate gateways won't allow such simple bypassing of HTTP 
> policy, and will block VPN, SOCKS etc.

That sounds like an IP/routing level change, not an HTTP-level one. The
flow would have to be something like...

User: I want to connect to example.com!
Proxy: Uh... you're going through me. That all right?
User: No!
Proxy: All right; I'll poke the backend so that your packets aren't
routed through me anymore.
User: ... I want to connect to example.com!
Example.com: Welcome!
...

Now, granted, the proxy could switch to transparent mode instead, but
that wouldn't _really_ be taking the proxy out of the equation. So,
yes, it would provide the proxy with some mechanism to make decisions,
but that type of thing could already be available in the form of other
standard technologies[1].

> >
> >> * cooperate better with the proxy.
> >>     - move to a proxy-oriented protocol operation (can fix many
> >> issues, such as auth)
> >
> > Yet another proxy-discovery technology -- but why? How are legacy
> > browsers going to cope?
> 
> why: because current ones rely on out-of-service solutions (i.e.
> DHCP / DNS).  It's possible to solve the problem completely in HTTP.
> One stop shop.
> 
> Legacy browsers will cope as per mechanisms above.  I'm not proposing 
> deprecating these other methods, although I think they would fall
> from grace with a decent HTTP implementation.
> 
> >
> >>     - deal with errors differently
> >
> > Examples?
> 
> classic one being the upstream connection failure.

500: Your request was routed through a proxy, and that proxy could not
contact the server. Please contact <ISP@example.org> if you have
further questions.

What's wrong with this, unless you're trying to _hide_ the fact that
there's a proxy?
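
For illustration, a proxy could assemble such a response along these
lines. This is only a sketch: the function name and parameters are
hypothetical, and the status line sticks to the plain 500 used above
(HTTP/1.1 also defines 502 and 504 for gateway failures, which could be
substituted). The Via header is what announces the proxy.

```python
def upstream_failure_response(proxy_name: str, origin: str, contact: str) -> bytes:
    """Build a plain-text HTTP/1.1 error response for a failed upstream connect."""
    body = (
        f"Your request was routed through a proxy ({proxy_name}), and that "
        f"proxy could not contact the server at {origin}. "
        f"Please contact {contact} if you have further questions.\n"
    ).encode("utf-8")
    head = (
        "HTTP/1.1 500 Internal Server Error\r\n"
        f"Via: 1.1 {proxy_name}\r\n"  # announces the proxy instead of hiding it
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```

Nothing here requires the UA to know in advance that it was
intercepted; the message itself says so.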

> >
> >> I believe that this could be achieved with either a status code,
> >> or a header, where an intercepting proxy could signal to a client
> >> that it intercepted the connection.  The proxy could even intercept
> >> connections and enforce the UAs to use a proxy method, provide a
> >> proxy URI in the response that the UA must use.
> >
> > So, instead of the admins putting the infrastructure in to allow
> > auto-config, we'll force the end-users to do it themselves? That
> > seems kind of backwards.
> 
> not end-users, UAs.
> 
> Don't forget that it's a configuration option on a browser whether or 
> not to use proxy auto detection as well.

Yes it is, but that's a single checkbox. All the legwork is done by the
admins (as it _ought_ to be), and the technology is fairly well locked
to the local (assumed trusted) network (see comments on DHCP).

> >
> >> This is another case where 305 would/could have been useful.
> >> Another option would be a warning code to indicate the connection
> >> had been intercepted.  I believe system administrators would wish
> >> to be able to configure how to deal with the case from a number of
> >> options including
> >>
> >> 1. Allow the clients to operate through the intercepting proxy
> >>     - with notification
> >>     - silently
> >> 2. Force the clients to re-connect to the proxy and issue requests
> >> with proxy semantics.
> >
> > Why bother? What would either of these points gain us?
> 
> reduced burden on admins and users.
> 
> the links in the chain that can break for WPAD.
> 
> 1. Browser not configured to use proxy auto detect

- Browser doesn't support these extensions, and needs to be configured
manually to use a proxy

> 2. Client DNS issues

- If your client has DNS issues, they're kinda screwed net-wise
anyways. DHCP (another requisite for WPAD) should take care of this.

> 3. DNS server not providing decent records for WPAD lookups

- If you have bad proxies, transparently routing to them won't solve
anything over using DNS to point at them.

> 4. WPAD URL not working (web server serving WPAD files not configured 
> properly)

- Proxies and routers can be likewise misconfigured.

> 5. Some clients use DHCP option 252 for WPAD, not DNS.

- No clients support the proposed new mechanism, and (given the track
record on *other* HTTP features) support will be spotty and of
questionable interoperability.

> 6. DHCP implementation and configuration issues

- Examples?
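
For reference, the DNS side of that chain (points 2 through 4) amounts
to the client guessing a few URLs and trying each in turn. The sketch
below reflects my understanding of the DNS variant of WPAD; the
function name is hypothetical, and the rule of stopping at two labels
is an assumption, since implementations differ on where to stop.

```python
from typing import List


def wpad_candidates(fqdn: str, min_labels: int = 2) -> List[str]:
    """List the wpad.dat URLs a client would try for a given host name."""
    labels = fqdn.strip(".").split(".")
    urls = []
    # Drop the host label, then walk up the domain one label at a time.
    for i in range(1, len(labels)):
        suffix = labels[i:]
        if len(suffix) < min_labels:
            # Assumption: never probe a bare top-level domain.
            break
        urls.append("http://wpad." + ".".join(suffix) + "/wpad.dat")
    return urls
```

For a host named pc1.dept.example.com this yields the wpad.dat URLs
under wpad.dept.example.com and wpad.example.com, in that order. Each
one needs a working DNS record and a working web server behind it.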

> There are quite a few links that can break, several of which are 
> client-side. On a large network, supporting all this can be a huge 
> burden on admins.
> 
> So, to solve the issue within HTTP would seem sensible?

Except that there will still be all the client-side issues with the
"old clients". I mean, the best thing to do would seem to be to put up
a page that instructs users on how to configure their web browser to
use a proxy.

> >
> > Some solid examples/scenarios would be really handy for illustrating
> > the issues you're coming up against. I can vaguely see where you're
> > going (drop a browser on the network, have it auto-configure by
> > virtue of simply getting routed through the proxy, with no extra
> > setup), but I don't see any really big win, especially when other
> > technologies (zeroconf, UPnP, etc.) have been explicitly written to
> > solve the problem generically.
> >
> UPnP is often seen as a security nightmare and turned off.  I don't
> know much about zeroconf.

UPnP is a security nightmare only insofar as allowing users to
automagically poke holes in "gateway devices" (read: NAT router
and/or firewall). It's all broadcast-based, so everything stays on the
local network, like DHCP.

> the main win comes in sys admin work load, see above

So build it all into your product and put an easy-to-use interface on
it. It's value-add. In any case, all this configuration should be more
or less one-time-only. Once it's set up, everything should just keep
working -- until you need to add a new proxy, or reconfigure the setup.
And even then, given that the procedure is well-documented, it
shouldn't be that much added work.

My biggest issue with using existing proxy auto-detection is that WPAD
is just an I-D; there is no standard. Implementation-wise, it's a
de facto standard -- but I can't comment on the interoperability of
the implementations. So, yeah, the problem needs to be solved, but
browsers technically don't have a standard way of doing it. That said,
any solution that _does_ come up is likely going to be DHCP and/or DNS
based, and locked to the local network (a la zeroconf DNS Service
Discovery).

My second biggest issue is the implied OSI mashing. HTTP is an
application-level protocol, and it should stay that way. Trying to use
it for network authentication is just plain wrong, unless you're going
to actually tunnel a real network protocol over it (in which case, you
can't do it in-browser anymore and may as well use IPSec instead). It
works well enough in basic cases, but it is NOT the solution that an
ISP should be using. AOL is a perfect example of the very scenario you
describe -- they distribute their own CD, with a browser tweaked with
the required proxy server settings. Now, AOL is not the shining model
of good business practices, but they do seem to have surmounted this
technical problem with relative ease. It's not like there's any
shortage of those damn CDs, so they can't be _that_ hard to make. ;)


-- 
Travis

[1] http://www.w3.org/P3P/
Received on Monday, 18 June 2007 02:29:12 UTC