Re: Performance and scalability of Jigsaw from Anselm Baird_Smith on 1997-02-27 (www-jigsaw@w3.org from January to February 1997)

From: Anselm Baird_Smith <abaird@www43.inria.fr>
Date: Thu, 27 Feb 1997 09:27:43 +0100 (MET)
To: francois.deza@sema.fr
Cc: www-jigsaw@w3.org
Message-Id: <199702270827.JAA16893@www43.inria.fr>

Francois Deza writes:
 > Thanks for your reply. We have done some experiments and want to ask
 > new questions.

Thanks a lot for your time...and for the good questions,

 > The logger is indeed a major bottleneck. Do you think increasing
 > the w3c.jigsaw.logger.bufferSize property (default 8192 bytes) would
 > alleviate this issue? The logger of Jeeves seems far less of an issue.

Yes, the logger is the bottleneck. I am currently looking at what could
be done to improve it. I have a couple of solutions that I am going to
test RSN (I know at least that using a StringBuffer and String
addition for creating the record is a really, really bad thing to do).

I think for your testing, you should probably disable the logger (for
the same reasons you should not use a Java client), otherwise you
might be measuring the logger speed...

 > When put under heavy stress (with the right stresser), Jigsaw refuses
 > many connections (the ServerSocket.accept call generates a "Too many
 > file handles open" runtime Exception under Solaris). There are limits
 > on the number of file handles in the kernel and per process. Is there
 > any way to parametrize it in Jigsaw so that we can tune it for heavy
 > load?

Not in Jigsaw. You can, however, use the ulimit command under UNIX. If
you're root you should raise the limit "as much as you can", say to at
least twice the number of simultaneous connections you want to handle
(see also the SocketClientFactory tuning below).

 > I have seen that the ServerSocket instantiated in SocketClientFactory
 > has a hardcoded backlog equal to 128. Could you comment on that? Why
 > is it so high?

It should really be customizable; until it is, I have used an
admittedly high value. Don't forget, though, that if you want to handle
say 100 simultaneous connections, this number is probably still
low. Ultimately the default value should be something (I guess) like
(x*numberOfSimConnectionsToHandle)
where x depends on the CPU of your machine and on the irregularity
of the connection rate (say, if you have high peaks of connections
at given times).
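
To make that rule of thumb concrete, here is a minimal Java sketch. The
class name BacklogSketch, its method names and the sample factor 1.5 are
illustrative assumptions only; the one real API used is the
java.net.ServerSocket(int port, int backlog) constructor. This is not what
SocketClientFactory does today (the value is hardcoded there), and the
operating system may silently cap whatever backlog is requested.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of Jigsaw.
class BacklogSketch {

    // backlog = x * numberOfSimConnectionsToHandle, rounded up.
    // "x" is the load-dependent factor from the text; its value is a guess.
    static int backlogFor(int simultaneousConnections, double x) {
        if (simultaneousConnections <= 0 || x <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return (int) Math.ceil(x * simultaneousConnections);
    }

    // Opens a listening socket with the computed backlog.
    static ServerSocket open(int port, int simultaneousConnections, double x)
            throws IOException {
        return new ServerSocket(port, backlogFor(simultaneousConnections, x));
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, which keeps the demo harmless.
        try (ServerSocket socket = open(0, 100, 1.5)) {
            System.out.println("listening on port " + socket.getLocalPort()
                    + " with requested backlog " + backlogFor(100, 1.5));
        }
    }
}
```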

 > Could you supply me with guidelines on how to set
 > the w3c.jigsaw.http.socket.SocketClientFactory properties
 > minfree, maxfree, maxidle, maxclients, idleTimeout and maxThreads?
 > I mean there are complex interdependencies between those with respect
 > to the tuning of the performance of Jigsaw.

[skip to the 'example' if this is really unreadable]
Note that this is a really interesting point of Jigsaw. The following
algorithm is my own design, with the hope that it will work well in
practice, but if you have better ideas, let me know. Anyway, here is
how it goes (you probably want to have the code handy, l-xx means line
number xx of SocketClientFactory):

The SocketClientFactory has four running states, corresponding to
different "load" modes (l-121):
- AVG_LIGHT: the server can acquire more sockets and can use more
             CPU. In this mode, the server will keep connections open
             forever, and will accept all new connections.
- AVG_NORMAL: the server should not consume more sockets (i.e. it has
             already opened close to the maximum number of permitted
             sockets). In this mode, the server will try to close
             least-recently used connections before accepting a new
             connection (which it will still always accept).
- AVG_HIGH: this is really the same as the above mode, except that
             "trying" to kill least-recently used connections doesn't
             seem to suffice to cope with the load. In this mode the
             accepting thread priority is made lower than the client
             thread priority (with the hope that request handling gets
             more CPU than accepting new connections).
- AVG_DEAD: we have reached our resource limits (might be either
             sockets or CPU). In this mode, the server will start
             rejecting connections, and emit appropriate error
             messages in the errlog.

The SocketClientFactory maintains the following variables (here client
really means an instance of SocketClient, not necessarily a thread yet):

- idleCount: the number of clients whose connection is kept
             persistent, and which are currently waiting for a request.
- freeCount: the number of clients currently unused (i.e. ready to
             run new connections)
- clientCount: the total number of clients
- maxClients: the maximum number of simultaneous clients.

Switching between the four load modes (l-257) is based on the value of
the above variables and the following "water marks" (they are prefixed
with w3c.jigsaw.http.socket.SocketClientFactory as properties):

When freeCount is higher than maxFree, the server assumes LIGHT load
(the initial state). As connections are accepted, freeCount decreases. At
some point it becomes lower than maxFree. maxFree is the "high water
mark" for the freeCount counter.

Now, if freeCount is still greater than minFree (minFree is the "low
water mark" for freeCount) *and* if the number of idle connections has
not yet reached its maximum, the load mode is set to NORMAL (maxIdle
is the water mark for idleCount).

If one of the above conditions is not true, but we still have clients
ready to accept new connections, the load mode is set to HIGH; otherwise
(no more free clients), the load mode is set to DEAD.
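
The decision rule just described can be sketched in a few lines of Java.
This is only my reading of the prose, not the code at l-257: the class
LoadModeSketch and its method names are hypothetical, the comparison at
the exact boundaries (>= versus >) is an assumption, and the sketch is
stateless, so it does not model any hysteresis between modes. It takes
LIGHT to hold for as long as freeCount has not dropped below maxFree.

```java
// Hypothetical, simplified model of the load-mode decision; not Jigsaw code.
class LoadModeSketch {

    enum Load { LIGHT, NORMAL, HIGH, DEAD }

    final int minFree;  // low water mark for freeCount
    final int maxFree;  // high water mark for freeCount
    final int maxIdle;  // water mark for idleCount

    LoadModeSketch(int minFree, int maxFree, int maxIdle) {
        this.minFree = minFree;
        this.maxFree = maxFree;
        this.maxIdle = maxIdle;
    }

    Load classify(int freeCount, int idleCount) {
        // Plenty of free clients left: nothing to worry about.
        if (freeCount >= maxFree) return Load.LIGHT;
        // Below the high water mark, but both margins still hold.
        if (freeCount > minFree && idleCount < maxIdle) return Load.NORMAL;
        // One margin is gone, yet some clients can still accept connections.
        if (freeCount > 0) return Load.HIGH;
        // No free client at all: connections will be rejected.
        return Load.DEAD;
    }

    public static void main(String[] args) {
        // Sample water marks, chosen for illustration only.
        LoadModeSketch factory = new LoadModeSketch(7, 10, 22);
        int[][] samples = { {32, 0}, {9, 5}, {9, 22}, {5, 0}, {0, 0} };
        for (int[] s : samples) {
            System.out.println("free=" + s[0] + " idle=" + s[1]
                    + " -> " + factory.classify(s[0], s[1]));
        }
    }
}
```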

example:

Before I lose all readers, let's take the default config as an example.

The default config assumes a limit of 64 open file handles per process
(which is the default on Solaris). This gives us the value for maxClients:

maxClients=64/2=32 (one connection requires at least two file handles)

The second step is to decide on a value for maxFree, which controls
the point at which persistent connections are going to be killed when
accepting new connections. I think this depends on the power of the
host; I would recommend that the server stay in LIGHT mode
until 70% of clients are used:

maxFree=0.3*32=9.6, rounded to 10

The third step is to estimate the point at which you will want to
start 'killing' persistent connections:

The first parameter is maxIdle. I would recommend using a 30% margin
(to account for the time between kill and terminate, as explained
above):

maxIdle=0.7*maxClients=22.4, rounded to 22
[the default is pessimistically set to 20]

The second parameter is minFree. The server will remain in LIGHT mode
until that number of clients is free again (this is to avoid
switching back and forth between LIGHT and NORMAL mode: think of maxFree
as the LIGHT mode exit value and of minFree as the LIGHT mode enter
value):

minFree=0.7*maxFree=7
[again, default setting is pessimistic]
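
For anyone who wants to redo this arithmetic for another limit, here is a
small Java sketch of the steps above. TuningSketch is a hypothetical class,
not part of Jigsaw. Reading the starting figure of 64 as the per-process
file handle limit is my interpretation of the example, and rounding to the
nearest integer is an assumption that happens to reproduce the numbers
given (10, 22 and 7).

```java
// Hypothetical calculator for the water marks; not part of Jigsaw.
class TuningSketch {

    final int maxClients;
    final int maxFree;
    final int maxIdle;
    final int minFree;

    TuningSketch(int fileHandleLimit) {
        // One connection requires at least two file handles.
        maxClients = fileHandleLimit / 2;
        // Stay in LIGHT mode until about 70% of the clients are in use.
        maxFree = (int) Math.round(0.3 * maxClients);
        // Keep a 30% margin on idle persistent connections.
        maxIdle = (int) Math.round(0.7 * maxClients);
        // Low water mark for freeCount, set at 70% of maxFree.
        minFree = (int) Math.round(0.7 * maxFree);
    }

    public static void main(String[] args) {
        TuningSketch t = new TuningSketch(64);
        // Prints: maxClients=32 maxFree=10 maxIdle=22 minFree=7
        System.out.println("maxClients=" + t.maxClients
                + " maxFree=" + t.maxFree
                + " maxIdle=" + t.maxIdle
                + " minFree=" + t.minFree);
    }
}
```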

Now, there remains one setting concerning the way clients are mapped to
threads. Since 1.0alpha5 this uses a thread cache which has two config
parameters:

- maxThreads: max number of threads
- idleTimeout: time to keep excess threads alive

The pool of threads initially creates maxThreads/2 threads. If more
threads have to be created, they will stay alive after usage only for
idleTimeout ms.
maxThreads, as you can see, only controls the initial size of the
thread pool (it's set to 40 in the default config, so that we start with
20 threads). Threads will always be created when needed (l-620).

I think idleTimeout should be fairly large (at least a few seconds),
because peaks tend to last for "long"...
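
As a rough analogue of that behaviour, here is a sketch built on
java.util.concurrent.ThreadPoolExecutor. To be clear, this is not how the
Jigsaw thread cache is written (that package did not exist at the time);
it only mimics the policy described: maxThreads/2 threads up front, extra
threads created on demand without an upper bound, and idle extras dropped
after idleTimeout ms. The class name ThreadCacheSketch and the 5000 ms
value are assumptions for illustration.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical modern analogue of the thread cache; not Jigsaw code.
class ThreadCacheSketch {

    static ThreadPoolExecutor create(int maxThreads, long idleTimeoutMs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads / 2,        // threads kept alive permanently
                Integer.MAX_VALUE,     // extra threads are always allowed
                idleTimeoutMs, TimeUnit.MILLISECONDS, // lifetime of idle extras
                new SynchronousQueue<Runnable>());    // no queueing: hand off or spawn
        // Create the initial threads eagerly instead of on first use.
        pool.prestartAllCoreThreads();
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(40, 5000L);
        System.out.println("initial threads: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```

With a SynchronousQueue, a task submitted while every existing thread is
busy causes a new thread to be started, which matches the "threads will
always be created when needed" remark.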


 > Currently, we are running benchmarks with different values for all those
 > parameters so that we could possibly derive tuning rules.
 > An explanation of the rationale behind the algorithms coded in Jigsaw
 > would help a lot. I am referring to the changing priority of the thread
 > accepting sockets, the killing of clients and the server state changes.

I hope the above (long) explanation helps. If you find a better way to
set the parameters, or any enhancements to the way this piece is done,
let me know.

 > To finish, could you clarify the following phenomenon?
 > Under certain circumstances the stresser in httpd does not return when
 > Jigsaw gets overloaded. I mean certain threads never join. We observed
 > the same for Stresser (in java). This is strange considering that
 > Jigsaw seems to recover in the meantime from the overload.

I have observed that too; I'll check the problem again.

Anselm.
Received on Thursday, 27 February 1997 03:28:45 UTC