
Re: p1-message-07 S 7.1.4

From: Adrien de Croy <adrien@qbik.com>
Date: Wed, 22 Jul 2009 10:30:29 +1200
Message-ID: <4A664185.9070807@qbik.com>
To: Jamie Lokier <jamie@shareable.org>
CC: Mark Nottingham <mnot@mnot.net>, Henrik Nordstrom <henrik@henriknordstrom.net>, HTTP Working Group <ietf-http-wg@w3.org>

There are many reasons in here why it's not commercially desirable to 
open a very large number of connections.

I disagree that opening more connections = more bandwidth.  If you are 
hitting a server that allocates bandwidth on a per-connection basis then 
perhaps, but surely that's an issue with the server?  Why don't we 
instead write servers that allocate bandwidth on a per-client-IP basis, 
or on something else (a cookie?)  Then opening many connections would 
work against you.
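To make the idea concrete, here is a minimal sketch of per-client-IP 
bandwidth allocation using a token bucket keyed on the client address. 
The class name and the hook into a real server are my own invention, 
purely illustrative; the point is only that the budget is shared by all 
of a client's connections, so opening more of them gains nothing:

```python
class PerClientThrottle:
    """Illustrative sketch: allocate bandwidth per client IP rather than
    per connection, so extra connections buy a client nothing."""

    def __init__(self, bytes_per_second):
        self.rate = bytes_per_second
        self.tokens = {}   # client_ip -> remaining byte budget
        self.last = {}     # client_ip -> timestamp of last refill

    def allow(self, client_ip, nbytes, now):
        """Return True if client_ip may transfer nbytes at time `now`.
        All connections from the same IP draw on the same bucket."""
        # First sighting: start the client with one second's full budget.
        if client_ip not in self.tokens:
            self.tokens[client_ip] = float(self.rate)
            self.last[client_ip] = now
        # Refill in proportion to elapsed time, capped at a one-second burst.
        elapsed = now - self.last[client_ip]
        self.last[client_ip] = now
        self.tokens[client_ip] = min(float(self.rate),
                                     self.tokens[client_ip] + elapsed * self.rate)
        if self.tokens[client_ip] >= nbytes:
            self.tokens[client_ip] -= nbytes
            return True
        return False
```

Under a scheme like this, a second connection from the same IP just 
competes with the first for the same budget.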

There are several reasons download managers make requests in many parts, e.g.

a) improved reliability: you lose less if a single request / response 
fails for some reason
b) improved throughput, to an extent, if the server is allocating 
bandwidth per connection; otherwise reduced throughput due to the 
increased overhead.
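For reference, the splitting itself is simple: carve the resource into 
contiguous byte ranges and issue one Range request per part.  A sketch 
(the function name is mine; the header syntax is the standard HTTP 
bytes-unit range form):

```python
def split_ranges(total_size, parts):
    """Divide a resource of total_size bytes into `parts` contiguous
    byte ranges, expressed as HTTP Range header values."""
    base, extra = divmod(total_size, parts)
    ranges, start = [], 0
    for i in range(parts):
        # Spread the remainder over the first `extra` parts.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append("bytes=%d-%d" % (start, end))
        start = end + 1
    return ranges
```

Each of those parts then costs a full request/response header exchange, 
which is exactly the overhead referred to in (b).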

Fairness is a political issue. I don't think it should be addressed in a 
technical spec.  I think there are enough real technical and commercial 
reasons to make sure one's software gets along with its neighbours.  But 
I don't believe that the metric of number of connections is necessarily 
the holy grail of what would best be restricted.

We all know what Bill Gates said 20-odd years ago about how much RAM 
anyone could ever need.  I just don't see the need to make the same 
mistake now with how many connections might be considered "too many".  
Obviously in 1996 two connections was sufficient; now it's woefully 
inadequate.  So whose crystal ball do we gaze into to come up with a 
number to last the next 15 years?  I agree that, say, 100 connections 
feels like too many to me now, but I can't guarantee I'll feel the same 
way in 5 years, and I also believe that anyone opening 100 connections 
now would be disappointed with the results:

* no actual increase in throughput; in many cases less
* more problems for customers due to the impact on the network / other apps
* more support calls / higher support burden / cost

Maybe we'd be better off addressing some of the underlying issues 
another way?  If you have a web page with over 100 embedded small 
(< 2kb) images (e.g. http://www.nzherald.co.nz), what's the best way to 
get those images?  It's actually ridiculous, when you think about it, to 
spend 3kb of request/response headers to retrieve a 1kb image.
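The arithmetic is stark.  Taking the figures above (3kb of headers per 
exchange, 1kb per image, 100 images) as rough working assumptions:

```python
# Back-of-envelope cost of fetching many tiny images individually.
# The 3kb-per-exchange figure is the rough estimate from the text,
# not a measured value; adjust to taste.
HEADER_OVERHEAD = 3 * 1024   # bytes of request/response headers per exchange
IMAGE_SIZE = 1 * 1024        # one small image
N_IMAGES = 100

payload = N_IMAGES * IMAGE_SIZE
overhead = N_IMAGES * HEADER_OVERHEAD
total = payload + overhead
print("payload: %d kB, headers: %d kB, %.0f%% of traffic is headers"
      % (payload // 1024, overhead // 1024, 100.0 * overhead / total))
```

On those assumptions, three quarters of the bytes on the wire are 
headers.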

So maybe that problem is best handled in the HTML specs: allow embedded 
binary elements, still with URIs so they can be cached (and 
revalidated).  There could even be an alternative HTML page that can be 
requested if you have already downloaded the binary-embedded one.  Even 
if you downloaded the resources each time, it's still less expensive to 
get them that way than to recheck freshness with a request.
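The size tradeoff of embedding works out in its favour for small 
resources.  A sketch, assuming base64 as the inline encoding (which 
inflates the payload by about a third) and reusing the 3kb header 
estimate from above:

```python
import base64

# Stand-in for a 1kb image; only its size matters here.
image = b"\x89PNG" + b"\x00" * 1020
HEADER_OVERHEAD = 3 * 1024   # assumed bytes of headers per separate request

# Embedded inline: the image costs its base64-encoded size, no extra
# request/response exchange.
inline = len(base64.b64encode(image))

# Fetched separately: the raw image plus a full header exchange.
separate = len(image) + HEADER_OVERHEAD

print("inline: %d bytes, separate request: %d bytes" % (inline, separate))
```

Base64 costs about 33% on the body, but that is far less than the 
header overhead of a separate request for anything this small.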

Educating authors and site owners about caching could also go a very 
long way, but HTTP/1.1 caching is a complicated subject, which works 
against itself for that reason.  Of the people who could benefit, a 
small number may be motivated, may start reading, then put it in the 
too-hard basket in a very short time.  How many people on this planet 
really know how HTTP/1.1 caching works?  I'd suggest a very small 
fraction of those for whom the knowledge would be desirable.

The argument has also been raised about the relative merits of 
multiplexing multiple virtual connections over a single TCP connection 
vs just opening multiple TCP connections.  There's actually potentially 
less overhead in just opening multiple TCP connections.  It's a tradeoff 
between the framing you'd need in the multiplexing layer (per-block 
overhead) vs the connection setup / teardown overhead (per connection 
overhead) you'd need in the extra TCP connections.  I think if those 
connections were persistent, the multiple TCP connection approach would 
prove better.
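That tradeoff can be put in rough numbers.  The figures below are 
illustrative assumptions, not from any spec: an 8-byte frame header per 
multiplexed block, and a handful of 40-byte packets to set up and tear 
down each extra TCP connection:

```python
FRAME_HEADER = 8          # assumed bytes of framing per multiplexed block
BLOCK_SIZE = 1460         # one TCP segment's worth of payload
TCP_SETUP_PKTS = 3 + 4    # SYN/SYN-ACK/ACK plus teardown, roughly
PKT_OVERHEAD = 40         # IP + TCP headers per packet

def mux_overhead(total_bytes):
    """Framing overhead grows with the amount of data transferred."""
    blocks = -(-total_bytes // BLOCK_SIZE)   # ceiling division
    return blocks * FRAME_HEADER

def extra_conn_overhead(n_extra_connections):
    """Setup/teardown overhead is a one-time cost per connection."""
    return n_extra_connections * TCP_SETUP_PKTS * PKT_OVERHEAD

print(mux_overhead(1_000_000))     # framing bytes to multiplex 1 MB
print(extra_conn_overhead(5))      # handshake bytes for 5 extra connections
```

Under these assumptions the per-connection cost is fixed while the 
framing cost keeps growing with the data, which is why persistent 
multiple connections come out ahead on large transfers.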


Jamie Lokier wrote:
> Adrien de Croy wrote:
>> is this for backbone and infrastructure traffic you mean?
>> In which case, removing spam would be a good start.  Effectively at 
>> least double the width of all pipes.
> Last time I checked, the big backbones kept traffic at about 5% of the
> available bandwidth.  That means 95% unused.  It's for a reason.
> It turns out that as you get closer to filling a pipe, the average and
> peak _delays_ increase enormously and it's not possible to do things
> like internet telephony and video chat...
> In principle, QoS (quality of service) is a whole world of mechanisms
> to permit the full bandwidth of a pipe to be used, while allocating
> other factors such as latency and loss statistics to connections which
> need them.
> But QoS is really very difficult to make work.  It's easier and
> cheaper to have big pipes with 95% spare capacity.
> It's a statistical thing...
>> I would have thought slow-startup algorithms would also work against the 
>> advantage of opening too many connections.
> Not really, because TCP slow-start runs independently for each
> connection.  Slow-start lets you build up the data rate until you hit
> the congestion window where you start getting packet loss.  If you
> have many connections, they all build up until they get packet loss,
> and then the total throughput rate is a similar magnitude to using a
> single connection - except you out-compete other people with fewer
> connections.
> Two of the problems with using lots of TCP connections that I'm aware of:
>   1. As already said, there's an incentive to use more connections
>      than other people to get more bandwidth than other people.  As
>      well as being unfair, it leads to a tragedy of the commons where the
>      network performs poorly for everyone because everyone is trying to
>      compete.
>      That's more or less why TCP slow-start was invented.  Congestion
>      collapse as a result of everyone having an incentive to
>      retransmit their TCP packets too often because they outperformed
>      other people.  With slow-start, they cooperate better.
>   2. With lots of connections, you don't get any more throughput over
>      a link which is all your own.  But you do get slightly worse
>      throughput and worse average delays from the interference between
>      each connection.  The delays are the main reason not to do it
>      over your personal dedicated link, and why HTTP-over-SCTP (and
>      its poor approximations, pipelining/multiplexing over TCP) would
>      tend to be better than lots of connections.
> Finally, a reason to avoid _lots_ of connections is the same reason why
> we have the TCP slow-start algorithm:
>   3. Congestion collapse.  Same reason you don't let applications
>      "force" TCP to retry packets at high speed.
>> Also, download managers 
>> generally do multiple simultaneous range requests.  The more parts you 
>> request, the more request/response overhead reduces your throughput, so 
>> there's an incentive not to go over the top there as well.
> Roughly speaking, people use download managers which open lots of
> connections to get higher throughput with large files.
> The fact that works at all is an indication that something, somewhere
> is broken.  Under normal circumstances, a single TCP connection will
> attain close to the maximum throughput for a route.  Needing multiple
> connections to download a single file is a sign that you're working
> around artificial limits, such as policy limits or broken link bonding.
> For small things (text-only web pages! :-) where response time is more
> important than throughput, lots of connections has the opposite
> effect.
> But with web pages referencing lots of resources, because of the way
> HTTP does not overlap things, some more connections improves total
> response time.  It still makes the response time for the main HTML
> page longer though!  (But you don't see that).
> -- Jamie

Adrien de Croy - WinGate Proxy Server - http://www.wingate.com
Received on Tuesday, 21 July 2009 22:27:43 UTC
