Re: NCSA implementation of KeepAlive [server overload issues] from Beth Frank on 1995-10-23 (ietf-http-wg@w3.org from October to December 1995)

From: Beth Frank <efrank@ncsa.uiuc.edu>
Date: Mon, 23 Oct 1995 12:16:32 -0500 (CDT)
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9510231716.AA00377@void.ncsa.uiuc.edu>
>     Jeffrey Mogul suggested:
>     > If we want a mechanism that allows a server to throttle the request
>     > rate of an aggressive client, we should address that issue directly.
>     > For example (this is just an off-the-cuff idea), the server could
>     > return a "try again in N seconds" error code, close the TCP connection,
>     > and then refuse to accept a new connection from that client until
>     > the N seconds has elapsed. 

I respond:
>     Our adminstrator of the www.ncsa site adamently maintains that
>     anything the server itself has to do to throttle back requests is
>     too much overhead.  If the request gets through the TCP/IP accept
>     layer, he believes you're better off just serving the request.

Jeff responds to me: 
> This is an interesting theory, but I don't buy it.  At least not
> in the context of setting a fixed limit on the number of requests
> per connection.
> 
> The reason is that a per-connection request limit cannot protect
> the server from a client that simply reopens another connection.

Our interest is not to prevent the client from getting the data,
it is to force the client to get back in line again, so that service
is divided among those requesting it in a reasonable fashion.

Think of one client with 50 requests, and 49 other clients with
2 or 3 requests.  We server 5 (a number chosen at random) requests
per connection, so the client with 50 requests gets servered 5 of
them, and then he re-enters the queue of requests waiting to be
servered.

> Or a client (such as Netscape) that aggressively opens lots of
> TCP connections at once.  Without some explicit signal from the
> server to the client that it should slow down, I don't expect
> the client to back off.
> 
> So if server overload is indeed a problem, then what are the possible
> solutions?  Here are the ones I can think of:
>     (1) Do nothing
> 	This is not a particularly useful option!
>     (2) Per-connection limit on the number of requests
> 	As argued above, doesn't seem to work
>     (3) Return "try again later" without keeping a refusal list
>     (4) Return "try again later", keeping a refusal list
>     (5) Use TCP to flow-control the client
> If there are other possibilities, please suggest them!  But I don't
> think #1 is appropriate, and I don't think that #2 actually solves
> anything, so that's why I suggested thinking about #3, #4, or #5.
> Note that #5 is orthogonal to the other ones!
> 
> Your argument against #3 is basically that the overhead of saying
> "no" is more than the overhead of servicing the request.  If we
> believe that the client is willing to pay any attention to a
> "try again later" error (and we could make this a mandatory
> part of the HTTP spec), then I don't buy your argument.

The problem is not getting the client to pay attention  to "try
again later" but getting the USER to pay attention to it.  The 
problem is the users who site there hitting the "reload" button.

> It might be instructive to look at some of the work done by Mary
> Baker, while she was a Berkeley, on the server crash recovery
> work done for the Sprite distributed file system.  The original
> design was "let the clients do what they want", and it turned
> out to collapse miserably if the server became overloaded.  Mary's
> redesign was based on the principle that ONLY the server could
> know when it is overloaded, and thus the server MUST control the
> rate at which the clients generate requests. In other words, this
> feedback mechanism MUST be built into the protocol.
> 
>     I think that searching to see if a site is on a refuse list will
>     just further overload the server.
> 
> This depends on how carefully one implements it.  Common sense
> suggests that the server should start refusing requests before
> it has entirely run out of resources, in part so that it can
> spend some of its resources policing the congestion-control
> mechanism.
> 
> I would argue that if the clients properly implement the "try
> again later" mechanism, then (1) the server should almost never
> have to search the refuse list, and (2) relatively few clients

What?  Once a refuse list is implemented every incoming request
would have to be checked against the refuse list.  That means
the server would have to search the refuse list EVERY time.

> will be on this list at any one time.  If the list is kept in
> a simple tree structure, searches and updates should take
> log2(N) time, so one could handle hundreds or thousands of
> clients with a very rapid search.

And it would have to be regularly checked to see that clients
were expired so they could reconnect.

> A server implementor is of course free to use the "try again
> later" mechanism without keeping a refusal list, if it's too
> hard to implement.  Or we might find that all client implementations
> obey the "try again later" mechanism, and so it's not necessary
> to police things.  I'm simply suggesting this as a way for a server
> to further protect itself.

As I said before, the problem is not with the behavior of the
software, but with the people sitting behind it.

> Note that this mechanism, with or without a refusal list, is
> independent of whether persistent connections are used.
> 
> Suppose that we don't trust client implementors to pay attention
> to a "try again later" message.  Then I would argue we could still
> use TCP's flow control mechanisms to keep aggressive clients at
> bay, on the assumption that a client won't start opening extra
> TCP connections just because the response on the current one
> seems slow.  (This theory may be faulty, in that users may hit
> "stop" and "reload" when they get bored.  I do this myself.)
> 
>     > Or the server could keep the connection
>     > open, but simply delay reading from it (which means that the TCP
>     > flow-control mechanism would force the client to stop sending).
>     
>     This would also agrevate the overload problem, because the server
>     would just be that much longer in getting to the next request.
> 
> I don't understand this.  The whole point of a congestion-control
> mechanism is to defer dealing with some requests.  As long as keeping
> an open-but-not-transferring TCP connection does not cost much
> in CPU time or memory (and it should not cost that much), this should
> be a very effective way to limit the rate at which clients make
> requests.

The fallacy in this argument is assuming that CPU and memory are the
limiting factors.  The limiting factors for very busy sites with T1s
or better and our server are the # of child processes available for
servicing a request and the ability of the TCP layer to handle the
incoming request queues.  

The HPs servicing the www.ncsa site are 70-80% idle.  A slow down
in servicing requests occurs when the incoming queue of the TCP layer
starts overflowing and the TCP/IP layer needs to start refusing connections.
The server hits against operating system limits in the TCP/IP layer
and the # of filedescriptors supported by the system.

>     Think of it as if there are only n-slots for TCP connections and a
>     new request from someone else can't be served until one of the
>     n-slots frees up.  (This is a very great simplification.)
> 
> This misses the point.  If a client has opened one of the N available
> TCP connections, and the server has "stalled" that connection in
> order to have some time to service one of the other ones, the client
> isn't going to try using one of the other N-1 available connections.
> This means that other clients are free to use them.  As long as the
> server unstalls a connection whenever it has the resources to service
> another request, it will always make progress at a near-optimum
> rate.

On hoohoo we have predominately 2 kinds of requests: 1) Archie requests
that take a loooooong time, and 2) requests for server documentation 
which are relatively short.  In one early test of the pre-forking server
we found that the short requests were getting stacked up and unserviced
because the children would get stuck servicing all the long requests.
So over the course of 15-20 mins. the requests would look something like:

	Serving		In queue
	sssLsss		sLsLsssL
	ssLLsLs		sL|sLLssL
	sLLLsLL		LssL|sLsLss
	LLLLsLL		sLsLsLss|sLss
	LLLLLLL		LsLsLsssLss|LLsssL
	   ^ no more request are serviced until the inital long
request finally finishes

Keep-Alive increases the number of Long requests and agrevates this
problem.  Putting a limit on the number of requests that can be made
on a single connection mitigates that effect.  We're trying for a
balance between reducing the load of a slow start protocol, and
dividing the limited resources among our clients.  The lowest overhead
way to do this is to tell the client to go get back in line again
(close the connection).

	-Beth

> Any reasonable multi-tasking operating system will do this much
> more or less automatically.  The somewhat tricky part is to get
> the server to service these connections (clients) in a reasonably fair
> order, since the operating systems's scheduler may not have enough
> information to do this.  I'm not sure that every HTTP server
> implementation can solve this (for example, a server that simply
> forks for each new connection doesn't have enough central control),
> but I'm not arguing that the spec should require anything; simply
> that this would be a more effective approach than setting a
> per-connection limit on the number of requests.
> 
> -Jeff

-- 
		Elizabeth(Beth) Frank
		NCSA Server Development Team
		efrank@ncsa.uiuc.edu
Received on Monday, 23 October 1995 10:21:51 UTC