RE: CONNECT message including tunneled data from Adrien de Croy on 2008-02-01 (ietf-http-wg@w3.org from January to March 2008)

From: Adrien de Croy <adrien@qbik.com>
Date: Fri, 1 Feb 2008 16:35:44 +1300
To: "'Jamie Lokier'" <jamie@shareable.org>
Cc: "'Robert Siemer'" <Robert.Siemer-httpwg@backsla.sh>, <ietf-http-wg@w3.org>
Message-ID: <000001c86483$86b76b40$27d637d2@qbik.local>
> -----Original Message-----
> From: Jamie Lokier [mailto:jamie@shareable.org]
> Sent: Friday, 1 February 2008 11:50 a.m.
> To: Adrien de Croy
> Cc: Robert Siemer; ietf-http-wg@w3.org
> Subject: Re: CONNECT message including tunneled data
> 
> Adrien de Croy wrote:
> > What other HTTP method allows you to send any amount of data back
> > and forth not delineated into HTTP messages?
> 
> POST :-)

back and forth and back and forth and back and forth etc etc with a
single command and response?

> 
> However, since overlapping request/response doesn't work with many
> agents, it is more common to issue two POSTs (or 1 POST, 1 GET) on
> separate connections, one for the upstream data, and one for the
> downstream data.  People do actually use this method already.
> 

I would have thought it would be easier to use CONNECT :)
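
For comparison, a CONNECT exchange is one request and one response,
after which the connection is just a pipe.  A rough Python sketch of
that boundary (the function names are made up for illustration and are
not taken from any proxy's code):

```python
def build_connect_request(host: str, port: int) -> bytes:
    """Return the bytes of a minimal HTTP/1.1 CONNECT request."""
    authority = f"{host}:{port}"
    return (
        f"CONNECT {authority} HTTP/1.1\r\n"
        f"Host: {authority}\r\n"
        "\r\n"
    ).encode("ascii")


def split_connect_response(buffer: bytes):
    """Split received bytes into (status_code, tunnel_bytes).

    Returns None until the complete response header has arrived.
    Anything after the blank line belongs to the tunnel, not to the
    CONNECT response itself.
    """
    head, sep, rest = buffer.partition(b"\r\n\r\n")
    if not sep:
        return None  # header still incomplete, keep reading
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    status_code = int(status_line.split(" ", 2)[1])
    return status_code, rest
```

A client following this sketch would only start relaying application
data once split_connect_response() has returned a 2xx status.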


> > In the context of what the command is, it's "connect", purely and
> > simply to make a connection and wire it up.  It's not "connect and
> > then pipe this data through".  The data on the connection is not in
> > the context of the CONNECT message or response.  That data cannot be
> > processed until the CONNECT command has been completed, it does not
> > form part of that command - therefore it is subsequent data.
> >
> > I've just got a feeling that if you start allowing pipelined data to
> > be piggy-backed onto a CONNECT message or its response, bad things
> > will happen.
> >
> > Sure, you might save an RTT in some cases, but we need to ensure it
> > doesn't break things.
> 
> Oh, I'm not advocating changing CONNECT itself.  The method name is
> hard-coded into every proxy; the semantics cannot be changed.  Any new
> strategy would need to use a new method name, at least.

OK, which raises the obvious question - does this belong in HTTP?  We
could re-implement SOCKS5 over HTTP, but why would we?

Sounds like you're proposing a fairly radical departure from the current
goal of HTTP.  Whilst I'd love to see HTTP move to a more state-driven,
multi-transactional, multiplexed command protocol, it would no longer
be, or bear any resemblance to, HTTP.  You'd need to assign a new port
number for it :)

Some of these discussions should be kept in mind when looking to design
HTTP 2.0.

> 
> > >   - Cannot re-use the HTTP connection after the application
> > >     protocol has finished with it.
> > >
> > that would be impossible anyway - if you wanted to do that you would
> > need to know a priori exactly how much data was going to be sent in
> > both directions so that you could do proper HTTP message
> > delineation.  In some hypothetical cases that might be conceivable,
> > but in the real world I don't think it's that useful.
> 
> Oh, but you _can_ do that already with standard HTTP.  It's not
> complicated.
> 
> Just use chunked encoding over POST.
> 
> I'm not promising it will work with every agent out there, mind. :-)
> But you see the principle is old already.
> 
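
To illustrate the principle: chunked transfer coding delimits a body of
unknown total length by prefixing each piece with its size in hex and
ending with a zero-size chunk.  A minimal Python sketch (helper names
are illustrative only; chunk extensions are skipped and trailers are
not handled):

```python
LAST_CHUNK = b"0\r\n\r\n"  # zero-size chunk that terminates the body


def encode_chunk(data: bytes) -> bytes:
    """Frame one piece of stream data as an HTTP/1.1 chunk."""
    if not data:
        # An empty chunk would look like the terminator, so refuse it.
        raise ValueError("use LAST_CHUNK to end the body")
    return f"{len(data):X}".encode("ascii") + b"\r\n" + data + b"\r\n"


def decode_chunked(body: bytes) -> bytes:
    """Reassemble the payload from a complete chunked body."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line may carry ";extension" text, which is ignored.
        size = int(body[pos:line_end].split(b";", 1)[0], 16)
        if size == 0:
            return bytes(out)
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the data
```

Because the end of the body is marked explicitly, the connection can in
principle carry another HTTP message afterwards.
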
> > In any case the other protocol server is going to close the
> > connection once its protocol is done anyway, in which case all you
> > can save here is the client connection to the proxy, which is the
> > least expensive part normally.
> 
> Only when the proxy is near the client.  

Sure - that's what I meant by normally.

> When it's near the server,
> the opposite is true.  There certainly are proxies handling CONNECT
> (or the logical equivalent using other methods) which aren't, for
> moderately good reasons.
> 
> > >   - Combination of the above: cannot pipeline multiple application
> > >     requests, if they need to use separate connections.  (See this
> > >     already with rsync-over-CONNECT).
> > >
> > pipelining in this case is surely a function of the protocol that is
> > tunneled over the connection using CONNECT?
> 
> At the moment.  When you embed a stream inside a request with chunked
> or content-length, as you can with POST (and known software), that can
> be a useful way to use HTTP's pipelining to access a non-pipelined
> service.
> 
> Again, I'm not proposing that CONNECT be changed.  Really, just noting
> that experimental protocols are playing with POST and similar
> techniques in this sort of way, that it does work, and it's logical.
> 
> > >[ However, if there's any interest in developing "next generation"
> > >HTTP (which ought to have gracefully degrading long message
> > >multiplexing, response reordering, and two-way requests), I would
> > >suggest that two-way streaming _inside_ messages would be quite a
> > >natural fit for that. ]
> >
> > OK.  You could even then go for a multi-connection multiplexed
> > connection.  I.e. allow multiple connections to be set up over a
> > single client-proxy connection with IDs, and then packets are
> > addressed according to those IDs.
> 
> Yes, that's what I have in mind when I say "long message
> multiplexing".  However, it comes with its own issues, particularly
> controlling the latency of each stream usefully.
> 
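
A sketch of what ID-addressed multiplexing could look like, in Python.
The frame layout here (32-bit stream ID, 16-bit length, payload) is
purely hypothetical and not taken from any existing or proposed
protocol:

```python
import struct

# Hypothetical frame header: stream ID (uint32) + payload length (uint16),
# both in network byte order.
HEADER = struct.Struct("!IH")


def pack_frame(stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with its stream ID and length."""
    return HEADER.pack(stream_id, len(payload)) + payload


def demux(buffer: bytes):
    """Split a byte buffer into per-stream data.

    Returns (streams, remainder), where streams maps each stream ID to
    the bytes received for it, and remainder holds any incomplete
    trailing frame that needs more data.
    """
    streams = {}
    pos = 0
    while pos + HEADER.size <= len(buffer):
        stream_id, length = HEADER.unpack_from(buffer, pos)
        end = pos + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        streams.setdefault(stream_id, bytearray()).extend(
            buffer[pos + HEADER.size:end]
        )
        pos = end
    return {sid: bytes(data) for sid, data in streams.items()}, buffer[pos:]
```

The 16-bit length caps each frame at 65535 bytes, which is one simple
way to stop a single stream from monopolising the link for long; it
does not by itself solve the latency issue mentioned above.
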
> > Do we see the CONNECT command as being something that is growing in
> > popularity though (other than for spammers?).  SOCKS for instance,
> > UPnP, various proprietary systems in general provide a much more
> > flexible firewall traversal mechanism.
> 
> Well, HTTP proxies are often available when nothing else is.

Agreed, which places HTTP in a fairly unique position.

> 
> E.g. when I visit some random corporate place, there's sometimes a
> HTTP proxy and no other access to the net.  However, they are usually
> configured only to allow access to port 443 (HTTPS), naturally.  So we
> end up using additional tunnelling layers over port 443 to a
> cooperating server, if we really need to access something else.  So, I
> guess it's not hugely used.
> 

One of the biggest problems with circuit-level proxy protocols (such as
SOCKS, HTTP CONNECT, UPnP, or our winsock redirection protocols etc) is
that defining any useful policy is very difficult.  All the policy
context data you have available is IPs and ports (apart from user/time
of day etc).  It's difficult to set and maintain a useful policy with
only that data, whereas with a gateway that understands the higher level
protocol, you can start restricting more meaningful things, such as the
URL.

What this means is that people who manage circuit-level proxies
generally only have a blunt tool with which to limit/restrict/secure
access, and therefore tend to apply rules with little finesse, such as
blocking anything except dest port 443.
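
As a rough illustration in Python of the difference in available
context (the rule values and function names below are invented for the
example, not taken from any product):

```python
from urllib.parse import urlsplit

ALLOWED_PORTS = {443}                      # hypothetical site policy
BLOCKED_PATH_PREFIXES = ("/downloads/",)   # hypothetical site policy


def allow_connect(dest_host: str, dest_port: int) -> bool:
    """Circuit-level check: only the destination host and port are known.

    With no view of the tunneled protocol, the port is about the only
    thing worth testing, hence the blunt "443 only" rule.
    """
    return dest_port in ALLOWED_PORTS


def allow_request(url: str) -> bool:
    """Application-level check: the gateway sees the URL itself."""
    parts = urlsplit(url)
    if parts.hostname is None:
        return False  # not an absolute URL, nothing to evaluate
    return not parts.path.startswith(BLOCKED_PATH_PREFIXES)
```

The second function can express a rule about what is being fetched; the
first can only express where the connection is going.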

> I guess CONNECT is sometimes used for HTTPS, but there doesn't seem to
> be much point in that nowadays.  It just pointlessly loads the proxy,
> when routing the connection over a NAT would be more sensible.

Actually it's very common for CONNECT to be used for HTTPS.  Many people
prefer to not provide NAT to all users.  

> 
> > One of the main things about the CONNECT command is its simplicity.
> > Changing this in any way I think would reduce its support.
> 
> I agree and wouldn't advocate changing CONNECT itself for many
> reasons.
> 
> -- Jamie
Received on Friday, 1 February 2008 03:34:53 UTC