Re: NCSA implementation of KeepAlive [server overload issues] from Jeffrey Mogul on 1995-10-20 (ietf-http-wg@w3.org from October to December 1995)

From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Fri, 20 Oct 95 15:55:53 MDT
To: Beth Frank <efrank@ncsa.uiuc.edu>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9510202255.AA10960@acetes.pa.dec.com>
    Jeffrey Mogul suggested:
    > If we want a mechanism that allows a server to throttle the request
    > rate of an aggressive client, we should address that issue directly.
    > For example (this is just an off-the-cuff idea), the server could
    > return a "try again in N seconds" error code, close the TCP connection,
    > and then refuse to accept a new connection from that client until
    > the N seconds has elapsed. 
    
    Our adminstrator of the www.ncsa site adamently maintains that
    anything the server itself has to do to throttle back requests is
    too much overhead.  If the request gets through the TCP/IP accept
    layer, he believes you're better off just serving the request.

This is an interesting theory, but I don't buy it.  At least not
in the context of setting a fixed limit on the number of requests
per connection.

The reason is that a per-connection request limit cannot protect
the server from a client that simply reopens another connection.
Or a client (such as Netscape) that aggressively opens lots of
TCP connections at once.  Without some explicit signal from the
server to the client that it should slow down, I don't expect
the client to back off.

So if server overload is indeed a problem, then what are the possible
solutions?  Here are the ones I can think of:
    (1) Do nothing
	This is not a particularly useful option!
    (2) Per-connection limit on the number of requests
	As argued above, doesn't seem to work
    (3) Return "try again later" without keeping a refusal list
    (4) Return "try again later", keeping a refusal list
    (5) Use TCP to flow-control the client
If there are other possibilities, please suggest them!  But I don't
think #1 is appropriate, and I don't think that #2 actually solves
anything, so that's why I suggested thinking about #3, #4, or #5.
Note that #5 is orthogonal to the other ones!

Your argument against #3 is basically that the overhead of saying
"no" is more than the overhead of servicing the request.  If we
believe that the client is willing to pay any attention to a
"try again later" error (and we could make this a mandatory
part of the HTTP spec), then I don't buy your argument.

It might be instructive to look at some of the work done by Mary
Baker, while she was a Berkeley, on the server crash recovery
work done for the Sprite distributed file system.  The original
design was "let the clients do what they want", and it turned
out to collapse miserably if the server became overloaded.  Mary's
redesign was based on the principle that ONLY the server could
know when it is overloaded, and thus the server MUST control the
rate at which the clients generate requests. In other words, this
feedback mechanism MUST be built into the protocol.

    I think that searching to see if a site is on a refuse list will
    just further overload the server.

This depends on how carefully one implements it.  Common sense
suggests that the server should start refusing requests before
it has entirely run out of resources, in part so that it can
spend some of its resources policing the congestion-control
mechanism.

I would argue that if the clients properly implement the "try
again later" mechanism, then (1) the server should almost never
have to search the refuse list, and (2) relatively few clients
will be on this list at any one time.  If the list is kept in
a simple tree structure, searches and updates should take
log2(N) time, so one could handle hundreds or thousands of
clients with a very rapid search.

A server implementor is of course free to use the "try again
later" mechanism without keeping a refusal list, if it's too
hard to implement.  Or we might find that all client implementations
obey the "try again later" mechanism, and so it's not necessary
to police things.  I'm simply suggesting this as a way for a server
to further protect itself.
    
Note that this mechanism, with or without a refusal list, is
independent of whether persistent connections are used.

Suppose that we don't trust client implementors to pay attention
to a "try again later" message.  Then I would argue we could still
use TCP's flow control mechanisms to keep aggressive clients at
bay, on the assumption that a client won't start opening extra
TCP connections just because the response on the current one
seems slow.  (This theory may be faulty, in that users may hit
"stop" and "reload" when they get bored.  I do this myself.)

    > Or the server could keep the connection
    > open, but simply delay reading from it (which means that the TCP
    > flow-control mechanism would force the client to stop sending).
    
    This would also agrevate the overload problem, because the server
    would just be that much longer in getting to the next request.

I don't understand this.  The whole point of a congestion-control
mechanism is to defer dealing with some requests.  As long as keeping
an open-but-not-transferring TCP connection does not cost much
in CPU time or memory (and it should not cost that much), this should
be a very effective way to limit the rate at which clients make
requests.

    Think of it as if there are only n-slots for TCP connections and a
    new request from someone else can't be served until one of the
    n-slots frees up.  (This is a very great simplification.)

This misses the point.  If a client has opened one of the N available
TCP connections, and the server has "stalled" that connection in
order to have some time to service one of the other ones, the client
isn't going to try using one of the other N-1 available connections.
This means that other clients are free to use them.  As long as the
server unstalls a connection whenever it has the resources to service
another request, it will always make progress at a near-optimum
rate.

Any reasonable multi-tasking operating system will do this much
more or less automatically.  The somewhat tricky part is to get
the server to service these connections (clients) in a reasonably fair
order, since the operating systems's scheduler may not have enough
information to do this.  I'm not sure that every HTTP server
implementation can solve this (for example, a server that simply
forks for each new connection doesn't have enough central control),
but I'm not arguing that the spec should require anything; simply
that this would be a more effective approach than setting a
per-connection limit on the number of requests.

-Jeff
Received on Friday, 20 October 1995 16:13:57 UTC