Re: CONNECT message including tunneled data from Adrien de Croy on 2008-01-31 (ietf-http-wg@w3.org from January to March 2008)

From: Adrien de Croy <adrien@qbik.com>
Date: Fri, 01 Feb 2008 09:50:22 +1300
To: Jamie Lokier <jamie@shareable.org>
CC: Robert Siemer <Robert.Siemer-httpwg@backsla.sh>, ietf-http-wg@w3.org
Message-ID: <47A2348E.6050807@qbik.com>
Jamie Lokier wrote:
> Adrien de Croy wrote:
>   
>> With CONNECT, the purpose is to establish a TCP connection.  There's no 
>> guarantee what protocol will be used over the connection. 
>>
>> I don't see how having an entity body on request or response can help 
>> the goal of setting up a connection.  It is obviously subsequent data.  
>> In no other place in HTTP do we consider sending entity data for a 
>> subsequent request on an initial message,
>>     
>
> I don't agree that it's "obviously subsequent data".  I think that
> depends on the application and on your point of view.
>   
Sure.  My personal POV on it is that the CONNECT command is a shameless 
hack to allow things (often not HTTP UAs) to obtain TCP connectivity 
through an HTTP proxy.

As a hack, it breaks many rules.  The first one it breaks is that subsequent 
data on the connection need not even be (and usually isn't) HTTP.  What 
other HTTP method allows you to send any amount of data back and forth 
not delineated into HTTP messages?

In the context of what the command is, it's "connect", purely and simply 
to make a connection and wire it up.  It's not "connect and then pipe 
this data through".  The data on the connection is not in the context of 
the CONNECT message or response.  That data cannot be processed until 
the CONNECT command has been completed; it does not form part of that 
command, and therefore it is subsequent data.
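
To make that ordering concrete, here is a minimal sketch in Python of a
client that treats the CONNECT exchange as headers only, and anything
after the blank line of the response as subsequent tunnel data.  The
function names and the example host are hypothetical, not taken from any
particular proxy or client.

```python
def build_connect_request(host: str, port: int) -> bytes:
    """Build a bare CONNECT request: headers only, no entity body."""
    target = f"{host}:{port}"
    return (f"CONNECT {target} HTTP/1.1\r\n"
            f"Host: {target}\r\n"
            "\r\n").encode("ascii")


def split_connect_response(buf: bytes):
    """Split received bytes into (status_code, tunnel_bytes).

    Returns None while the header block is still incomplete.
    Bytes after the blank line are treated as tunnel data,
    not as part of the CONNECT response (assumption of this sketch).
    """
    head, sep, rest = buf.partition(b"\r\n\r\n")
    if not sep:
        return None
    # Status line looks like "HTTP/1.1 200 Connection established"
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    status_code = int(status_line.split(" ", 2)[1])
    return status_code, rest
```

A client following this model would only start writing tunnel data once
split_connect_response() has returned a 2xx status.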

I've just got a feeling that if you start allowing pipelined data to be 
piggy-backed onto a CONNECT message or its response, bad things will happen.

Sure, you might save an RTT in some cases, but we need to ensure it 
doesn't break things.

What is being proposed is like having payload data on a TCP SYN packet 
(which is allowed, and supported in some OSes, but I've never seen it 
used).  That data needs to be considered not part of the SYN command, 
but an attempt at optimisation of what would otherwise be subsequent 
data transmission.

> Consider an app which uses CONNECT to establish a temporary Telnet
> session.  Following CONNECT, the client transmits a single unix shell
> command.  The response is the output of that command, and then the
> connection closes.
>
> Similar patterns are used with other request/response protocols like
> rsync-over-CONNECT (sends a command to fetch a file, gets the file
> back, or sends a command to get a file listing, gets the listing
> back).
>   
Actually I think the biggest use of CONNECT is by spammers to send mail 
(SMTP), which is a protocol with many stages, not a simple 
request/response.

There are even products that tunnel connections of other apps through an 
HTTP proxy with CONNECT.

> From one point of view, the CONNECT is a separate operation.
>
> But it's equally reasonable to see the CONNECT as just the initial
> part of an application request, which happens to be wrapped in a
> CONNECT as its mechanism.
>   
We don't split methods in any other case with HTTP, do we? (wrt initial 
vs subsequent transmissions).  Also, many protocols start with a server 
welcome.

> The operational consequences of CONNECT that I see are:
>
>    - One extra TCP/IP round trip, because you have to wait for the 2xx
>      response before transmitting the subsequent request data.
>   
Yep, just like TCP where you wait for the TCP 3-way handshake before 
sending any data.

>    - Cannot re-use the HTTP connection after the application protocol
>      has finished with it.
>   
That would be impossible anyway - if you wanted to do that you would 
need to know a priori exactly how much data was going to be sent in both 
directions so that you could do proper HTTP message delineation.  In 
some hypothetical cases that might be conceivable, but in the real world 
I don't think it's that useful.

In any case the other protocol server is going to close the connection 
once its protocol is done anyway, in which case all you can save here is 
the client connection to the proxy, which is the least expensive part 
normally.

The only way I think you could set a proper HTTP message length on 
something back from the server would be if the server made a 
transmission and closed, and that close was seen by the proxy before it 
had processed the data.

>    - Combination of the above: cannot pipeline multiple application
>      requests, if they need to use separate connections.  (See this
>      already with rsync-over-CONNECT).
>   
Pipelining in this case is surely a function of the protocol that is 
tunneled over the connection using CONNECT?

> The logical consequence of CONNECT that I see is:
>
>    - It's one more special case in the HTTP message boundary rules,
>      which was never necessary but its history is quite understandable.
>   
Easier to deploy than a SOCKS server I think, but yes I agree - a 
shameless hack, and in the spirit of shameless hacks, breaks as many 
rules as it sees fit to achieve its goals.

> When the operational effects are significant, it's possible to use POST
> with overlapping request/response to get better performance, which
> seems to be allowed, but only works with known client/proxy/server
> combinations, and of course isn't a standard method.
>
>   
>> and I would be opposed to adding that capability now.
>>     
>
> Certainly I would be opposed to changing CONNECT now - it would break
> everything :-)
>
> [ However, if there's any interest in developing "next generation"
> HTTP (which ought to have gracefully degrading long message
> multiplexing, response reordering, and two-way requests), I would
> suggest that two-way streaming _inside_ messages would be quite a
> natural fit for that. ]
>   
OK.  You could even then go for a multi-connection multiplexed 
connection.  I.e. allow multiple connections to be set up over a single 
client-proxy connection with IDs, and then packets are addressed 
according to those IDs.
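
A rough sketch of what such ID-addressed framing could look like, in
Python.  The wire layout here (a 4-byte connection ID followed by a
2-byte payload length, both in network byte order) is purely an
assumption for illustration; nothing like it is specified anywhere in
HTTP, and the function names are hypothetical.

```python
import struct

# Assumed frame header: 4-byte connection ID, 2-byte payload length,
# network byte order, no padding (6 bytes total).
HEADER = struct.Struct("!IH")


def encode_frame(conn_id: int, payload: bytes) -> bytes:
    """Prefix a payload with its connection ID and length.

    Payloads longer than 65535 bytes do not fit the 2-byte length
    field and make struct raise an error; a real design would split them.
    """
    return HEADER.pack(conn_id, len(payload)) + payload


def decode_frames(buf: bytes):
    """Return (frames, remainder).

    frames is a list of complete (conn_id, payload) pairs; remainder
    holds trailing bytes of a frame that has not fully arrived yet.
    """
    frames = []
    offset = 0
    while len(buf) - offset >= HEADER.size:
        conn_id, length = HEADER.unpack_from(buf, offset)
        end = offset + HEADER.size + length
        if end > len(buf):
            break  # partial frame, wait for more data
        frames.append((conn_id, buf[offset + HEADER.size:end]))
        offset = end
    return frames, buf[offset:]
```

The proxy end would keep a table mapping each connection ID to an
upstream TCP connection and forward each decoded payload accordingly.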

Do we see the CONNECT command as being something that is growing in 
popularity, though (other than for spammers)?  SOCKS, UPnP and various 
proprietary systems, for instance, in general provide a much more 
flexible firewall traversal mechanism.

One of the main things about the CONNECT command is its simplicity.  
Changing this in any way I think would reduce its support.

Adrien
> -- Jamie
>
>   

-- 
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com
Received on Thursday, 31 January 2008 20:49:31 UTC