Re: HTTP Session Extension draft from Jeffrey Mogul on 1995-07-06 (ietf-http-wg@w3.org from July to September 1995)

From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Thu, 06 Jul 95 12:03:17 MDT
To: Chuck Shotton <cshotton@biap.com>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9507061903.AA29844@acetes.pa.dec.com>
    >Not necessarily.  My simulations, using traces from several busy servers,
    >show that with certain choices of server parameters, the peak number of
    >TIME_WAIT entries is well below 1000.  That is, the use of sessions
    >can actually reduce the number of TIME_WAIT entries by an order of
    >magnitude (compared to non-session HTTP).  More details in my SIGCOMM
    >paper, or look at
    >    http://www.research.digital.com/wrl/publications/abstracts/95.4.html
    
    Now, how does that relate to the multitude of non-Unix servers on
    the Internet? Please remember that WWW <> Unix and HTTP <> Unix and
    TCP/IP <> Unix. There are LOTS of implementations of WWW software
    that have absolutely nothing to do with Unix, Unix kernel settings,
    Unix kernel performance, Unix IP stacks, or anything else to do
    with Unix. The reason this is an issue is that you cannot predicate
    protocol decisions solely on the implementation of IP stacks on
    Unix hosts.

Please tell me where I used the word "UNIX" in my message.  In fact,
absolutely nothing in my simulations is UNIX-specific.

The requirement that a TCP implementation maintain TIME_WAIT records
is part of the TCP specification.  It is a mandatory requirement on
all TCP implementations.  If you want to argue that non-UNIX servers
should not have to meet the TCP specification, that's another story.
But if that's the case, the whole process of creating standards is
rather pointless.
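A rough illustration of why the TIME_WAIT count follows from the protocol rather than the platform. By Little's law, the number of TIME_WAIT records held at steady state is approximately the rate at which connections are closed times the TIME_WAIT interval (2 MSL; RFC 793 sets MSL at 2 minutes, though many stacks use a shorter value). The sketch assumes the server is the side that closes each connection and therefore holds the record. The request rate and the figure of ten retrievals per session are illustrative assumptions, not numbers from the simulations.

```python
# Steady-state TIME_WAIT records via Little's law:
#   entries ~= (connection closes per second) * (seconds each record is held)
# Parameter values used below are illustrative assumptions, not trace data.


def time_wait_entries(requests_per_sec: float,
                      requests_per_connection: float,
                      time_wait_secs: float) -> float:
    """Approximate number of TIME_WAIT records held by the closing side."""
    if requests_per_connection < 1:
        raise ValueError("each connection carries at least one request")
    closes_per_sec = requests_per_sec / requests_per_connection
    return closes_per_sec * time_wait_secs


# RFC 793 defines MSL as 2 minutes, so TIME_WAIT lasts 2 * MSL = 240 s.
TWO_MSL = 2 * 120.0

no_sessions = time_wait_entries(50, 1, TWO_MSL)     # one retrieval per connection
with_sessions = time_wait_entries(50, 10, TWO_MSL)  # ten retrievals per connection
print(no_sessions, with_sessions)  # 12000.0 1200.0
```

The count scales inversely with retrievals per connection, whatever operating system the server runs.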
    
    You can bet that non-Unix TCP/IP stacks have radically different resource
    constraints and performance issues.
    
That is a valid point.  The HTTP specification should certainly not
require servers to keep connections open any longer than they want to.
A server implementor should keep the relevant resource constraints
in mind when setting server policies.

My simulations examined a range of server policies, including one in
which the server keeps a very small number of connections open.  Read
the paper to see the results.

True, I did not simulate a server that keeps only one connection open
at a time.  Such a server would presumably not want to use sessions,
so I didn't bother to simulate how sessions would affect it.

    >The reason is that the number of TIME_WAIT entries is directly related
    >to the number of TCP connections used.  If you use sessions (what I
    >called in my paper "persistent connections"), you need to create fewer
    >TCP connections for the same number of retrievals.  So you end up
    >with fewer TIME_WAIT entries.
    
    This is irrelevant on platforms with a limited number of TCP/IP streams
    that can be formed. People discussing this issue are right to refer to
    "irresponsible use" of TCP/IP connections.

Nonsense.  The paragraph of mine you quote there has nothing to do with
what platform the server is running on.  It is a direct consequence of the
protocol specifications.  If your goal is to minimize the number of TCP
connection records, then use persistent connections.  If your goal is
to minimize the number of open connections, then you may choose not to
use persistent connections (although this is not mandatory; a properly
implemented persistent-connection server should be able to limit the
number of open connections to any chosen limit, without affecting
correctness).
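One way a persistent-connection server might enforce such a limit, sketched under assumptions: the class and method names and the least-recently-used eviction policy are hypothetical, not taken from any draft. When admitting a new connection would exceed the limit, the server closes its longest-idle connection. If every open connection is busy, it refuses the newcomer. Closing an idle connection leaves correctness intact provided the client is prepared to reopen, and to retry a request that raced with the close.

```python
from collections import OrderedDict


class ConnectionLimiter:
    """Hypothetical cap on open persistent connections (LRU eviction)."""

    def __init__(self, limit: int):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self._idle = OrderedDict()  # conn_id -> None, least recently used first
        self._busy = set()          # connections with a request in progress

    def open_count(self) -> int:
        return len(self._idle) + len(self._busy)

    def accept(self, conn_id) -> list:
        """Admit a new connection; return ids of idle connections to close."""
        to_close = []
        while self.open_count() >= self.limit and self._idle:
            victim, _ = self._idle.popitem(last=False)  # longest-idle first
            to_close.append(victim)
        if self.open_count() >= self.limit:
            # Every open connection is busy: refuse (or defer) the newcomer.
            raise RuntimeError("connection limit reached")
        self._busy.add(conn_id)
        return to_close

    def request_started(self, conn_id) -> None:
        """A new request arrived on an idle connection."""
        self._idle.pop(conn_id)
        self._busy.add(conn_id)

    def request_done(self, conn_id) -> None:
        """Response sent; keep the connection open but mark it idle."""
        self._busy.discard(conn_id)
        self._idle[conn_id] = None
        self._idle.move_to_end(conn_id)  # now the most recently used
```

Evicting idle connections rather than busy ones means no in-progress response is ever cut off to make room.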
    
    A related, somewhat important piece to this puzzle is the need for HTTP
    clients to implement a retry scheme when servers report that they are
    resource constrained with a 50x error code. As far as I know, no clients in
    widespread use implement retries when a server reports that it is too busy
    or is unable to service a request due to resource constraints. Implementing
    this portion of the standard in WWW clients will go a long way towards
    eliminating the potential race conditions that can arise when a server
    terminates a session after a client has issued a new request but before the
    server received it. The client will simply retry.
    
On the other hand, for servers that ARE able to maintain multiple TCP
connections, the best way to signal "I'm resource constrained" is to
use the TCP flow control mechanisms.  That is, if the client has an
open connection and the server doesn't have the cycles to keep data
flowing on it, the client is inherently blocked from sending more
requests (unlike the case with current HTTP, where there is no flow
control).  So for server systems whose limiting resource is CPU cycles,
rather than TCP connections, sessions could be a big win.
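The client-side retry scheme proposed in the quoted paragraph could look something like the following sketch. `send` is a hypothetical stand-in for issuing one request. The attempt limit and the doubling back-off are assumptions, not part of any specification. A dropped connection is treated like a 5xx reply, which is only safe for idempotent requests such as GET.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(send: Callable[[], int],
                     max_attempts: int = 4,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry an idempotent request while the server reports it is busy.

    `send` (hypothetical) issues one request and returns the status code,
    or raises ConnectionError if the server closed the connection first.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status: Optional[int] = None
    for attempt in range(max_attempts):
        try:
            status = send()
        except ConnectionError:
            status = None  # server dropped the connection: try again
        if status is not None and not 500 <= status <= 599:
            return status  # any non-5xx answer is final
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # back off: 1 s, 2 s, 4 s, ...
    if status is None:
        raise ConnectionError("server kept closing the connection")
    return status  # still 5xx after max_attempts tries
```

Passing `sleep` as a parameter lets a caller (or a test) substitute a non-blocking delay.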

    >I also suspect that much of the benefit comes NOT from embedded
    >images, but from subsequent requests for HTML pages (i.e., the
    >user clicks, reads, and clicks again).
    
    The last thing I want to do with a resource-constrained server is
    re-implement the nightmare of hundreds of blinking cursors in otherwise
    idle telnet sessions.

Huh?  Nobody is asking for that.  The server does not have to commit
any CPU cycles to the idle HTTP connections (beyond timer maintenance,
which is actually cheaper on idle open connections than on TIME_WAIT
connections).

    The HTTP protocol is primarily connectionless
    (stateless) for reasons of efficiency from the server's perspective.
    
Statelessness does not imply efficiency.  Period.  Statelessness only
affects the need to maintain state.

I have never seen a quantitative argument that statelessness improves
efficiency of an HTTP server.  In fact, there is ample evidence to the
contrary.  Statelessness costs us in CPU cycles, server memory, packets,
and delay.
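A crude round-trip count illustrates the delay part of that claim when every retrieval opens its own connection. The model is an assumption, not a measurement: one RTT per TCP handshake, one RTT per request/response, requests issued one at a time, and transfer time and slow start ignored.

```python
def page_latency_rtts(inline_images: int, persistent: bool) -> int:
    """Round trips to fetch one HTML page and its inline images (toy model)."""
    retrievals = 1 + inline_images                # the page itself plus each image
    handshakes = 1 if persistent else retrievals  # each SYN/SYN-ACK exchange costs one RTT
    return handshakes + retrievals                # one RTT per request/response


for images in (0, 4, 10):
    print(images, page_latency_rtts(images, False), page_latency_rtts(images, True))
```

With four inline images the model gives 10 round trips without sessions versus 6 with them, and the gap grows with every additional retrieval.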

HTTP is stateless, as far as I can tell, because it was the simplest
way to get something going, and the original designers didn't know
any better.

    I think a persuasive argument can be made for keeping a stream open
    while all of the required parts of a single "page" are transmitted.
    Allowing an individual to monopolize a scarce resource for longer
    periods of time, on the off chance that a human might select
    another link to your site from the page he just received, IS
    irresponsible.

Nobody has ever argued that HTTP should be modified in a way that
allows an individual to monopolize a resource.  The server has
complete control.

-Jeff
Received on Thursday, 6 July 1995 12:09:39 UTC