Re: Performance and scalability of Jigsaw from Francois Deza on 1997-02-27 (www-jigsaw@w3.org from January to February 1997)

From: Francois Deza <francois.deza@sema.fr>
Date: Thu, 27 Feb 1997 22:53:02 +0100
To: Anselm Baird-Smith <abaird@w3.org>
CC: www-jigsaw@w3.org
Message-ID: <3316023D.5419@sema.fr>
Anselm,

Thanks for replying in such interesting detail.
I have two other questions.
We have noticed that the busyCount instance variable, although defined,
is never used in Jigsaw.
Is this expected?

Do you know if JDK 1.1 implements the HTTP/1.1 keep-alive feature?
We have observed that there are more requests sent by the Stresser than
there are sockets accepted by Jigsaw. It is as if the keep-alive feature
were present.

To finish, I can confirm there is something strange happening to Jigsaw
under heavy load.
Certain requests get blocked forever.
We observed that in the Stresser utility some threads block forever
on the read. When we kill Jigsaw, the sockets get closed and exceptions
are raised by the blocking read in the Stresser. This unblocks the
Stresser, which then returns.
Same for httpd.

This is very strange, especially since Jigsaw uses 0% of the CPU
at that time. It is as if something were blocking the requests in Jigsaw.
We are tracking it down. I can at least say it is not in the write
happening on Jigsaw.
When threads enter the method in which the write takes place, they
always finish the job. The problem is somewhere before that.

Can you comment?


Francois

Anselm Baird-Smith wrote:
> 
> Francois Deza writes:
>  > Thanks for your reply. We have done some experiments and want to ask
>  > new questions.
> 
> Thanks a lot for your time...and for the good questions,
> 
>  > The logger is indeed a major bottleneck. Do you think increasing
>  > the w3c.jigsaw.logger.bufferSize property (default 8192 bytes) would
>  > alleviate this issue? The logger of Jeeves seems far less of an issue.
> 
> Yes, the logger is the bottleneck. I am currently looking at what could
> be done to improve it; I have a couple of solutions that I am going to
> test RSN (I know at least that using a StringBuffer and String
> addition to create the record is a really, really bad thing to do).
> 
> I think for your testing, you should probably disable the logger (for
> the same reasons you should not use a Java client), otherwise you
> might be measuring the logger speed...
> 
>  > When put upon heavy stress (with the right stresser), Jigsaw refuses
>  > many connections (the ServerSocket.accept call generates a "Too many
>  > file handles open" runtime Exception under Solaris). There are limits
>  > on the number of file handles in the kernel and per process. Is there
>  > any way to parametrize it in Jigsaw so that we can tune it for heavy
>  > load?
> 
> Not in Jigsaw; you can, however, use the ulimit program under UNIX. If
> you're root you should raise it "as much as you can", say at least to
> twice the number of simultaneous connections you want to handle (see
> also the SocketClientFactory tuning below).
> 
>  > I have seen that the ServerSocket instantiated in SocketClientFactory
>  > has a hardcoded backlog equal to 128. Could you comment on that? Why
>  > is it so high?
> 
> It should really be customizable; until it becomes so, I have used an
> admittedly high value. Don't forget, though, that if you want to handle
> say 100 simultaneous connections, this number is probably still
> low. Ultimately the default value should be something (I guess) like
> (x*numberOfSimConnectionsToHandle)
> where x depends on the CPU of your machine, and on the irregularities
> of the connection rate (say if you have high peaks of connections
> at given times).
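
As a minimal sketch of the sizing rule above, the backlog could be derived
from the expected number of simultaneous connections and passed to the
two-argument ServerSocket constructor. The class name BacklogSketch, the
helper backlogFor and the factor 1.5 are assumptions made for illustration;
none of them comes from Jigsaw, and the operating system may cap the value
it actually applies.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of Jigsaw.
final class BacklogSketch {

    // backlog = x * numberOfSimConnectionsToHandle, rounded up, at least 1.
    // The factor x is a guess that depends on CPU and on traffic peaks.
    static int backlogFor(double x, int simultaneousConnections) {
        return Math.max(1, (int) Math.ceil(x * simultaneousConnections));
    }

    public static void main(String[] args) throws IOException {
        int backlog = backlogFor(1.5, 100); // assumed factor, 100 connections
        // Port 0 asks the system for any free port; fine for a demonstration.
        try (ServerSocket server = new ServerSocket(0, backlog)) {
            System.out.println("port " + server.getLocalPort()
                    + ", requested backlog " + backlog);
        }
    }
}
```

With x = 1.5 and 100 connections the requested backlog is 150, which is
above the hardcoded 128 discussed here.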
> 
>  > Could you supply me with guidelines on how to set
>  > the w3c.jigsaw.http.socket.SocketClientFactory properties
>  > minfree, maxfree, maxidle, maxclients, idleTimeout and maxThreads?
>  > I mean there are complex interdependencies between those with respect
>  > to the tuning of the performance of Jigsaw.
> 
> [skip to the 'example' if this is really unreadable]
> Note that this is a really interesting point of Jigsaw, the following
> algorithm comes from my mind, with the hope that it will work well in
> practice, but if you have better ideas, let me know. Anyway, here is
> how it goes (you probably want to have the code handy, l-xx means line
> number xx of SocketClientFactory):
> 
> The SocketClientFactory has four running states, corresponding to
> different "load" modes (l-121):
> - AVG_LIGHT: the server can acquire more sockets and can use more
>              CPU. In this mode, the server will keep connections open
>              for ever, and will accept all new connections.
> - AVG_NORMAL: the server should not consume more sockets (ie it has
>              already opened near-the-max number of permitted
>              sockets). In this mode, the server will try to close
>              least-recently used connections before accepting a new
>              connection (that it will still always accept).
> - AVG_HIGH: this is really the same as the above mode, except that
>              "trying" to kill least-recently used connections doesn't
>              seem to suffice to cope with the load. In this mode the
>              accepting thread priority is made lower than the client
>              thread priority (with the hope that request handling gets
>              more CPU than accepting new connections)
> - AVG_DEAD: we have reached the limit of our resources (might be either
>              sockets or CPU); in this mode, the server will start
>              rejecting connections, and emit appropriate error
>              messages in the errlog.
> 
> The SocketClientFactory maintains the following variables (here client
> really means an instance of SocketClient, not necessarily a thread yet):
> 
> - idleCount: the number of clients whose connection is kept
>              persistent, and which are currently waiting for a request.
> - freeCount: the number of clients currently unused (ie ready to
>              run new connections)
> - clientCount: the total number of clients
> - maxClients: the maximum number of simultaneous clients.
> 
> Switching between the four load modes (l-257) is based on the value of
> the above variables and the following "water marks" (they are prefixed
> with w3c.jigsaw.http.socket.SocketClientFactory as properties):
> 
> When freeCount is greater than maxFree, the server assumes LIGHT load
> (init state). As connections are accepted, freeCount decreases. At
> some point it becomes lower than maxFree. maxFree is the "high water
> mark" for the freeCount counter.
> 
> Now, if freeCount is still greater than minFree (minFree is the "low
> water mark" for freeCount) *and* if the number of idle connections has
> not yet reached its maximum the load mode is turned to NORMAL (maxIdle
> is the water-mark for idleCount)
> 
> If one of the above conditions is not true, but we still have clients
> ready to accept new connections, load mode is set to HIGH, otherwise (no
> more free clients), load mode is set to DEAD.
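
The mode selection described above can be sketched as a single function.
This is only a reading of the explanation, not the code of
SocketClientFactory: it assumes that LIGHT means freeCount is still above
maxFree, it picks strict comparisons at the boundaries, and it ignores any
hysteresis between modes. The class LoadModeSketch and the method select
are hypothetical names.

```java
// Hypothetical sketch; the real SocketClientFactory may differ in detail.
final class LoadModeSketch {

    enum Load { LIGHT, NORMAL, HIGH, DEAD }

    static Load select(int freeCount, int idleCount,
                       int minFree, int maxFree, int maxIdle) {
        if (freeCount > maxFree) {
            return Load.LIGHT;   // plenty of unused clients left
        }
        if (freeCount > minFree && idleCount < maxIdle) {
            return Load.NORMAL;  // between the water marks, idle not maxed out
        }
        if (freeCount > 0) {
            return Load.HIGH;    // a condition failed, but clients remain
        }
        return Load.DEAD;        // no free client at all
    }

    public static void main(String[] args) {
        // Water marks taken from the worked example: minFree=7, maxFree=10, maxIdle=22.
        System.out.println(select(20, 0, 7, 10, 22));  // LIGHT
        System.out.println(select(9, 5, 7, 10, 22));   // NORMAL
        System.out.println(select(9, 22, 7, 10, 22));  // HIGH
        System.out.println(select(0, 0, 7, 10, 22));   // DEAD
    }
}
```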
> 
> example:
> 
> Before I lose all readers, let's take the default config as an example.
> 
> The default config assumes a limit of 64 file handles (which is
> the default on Solaris). This gives us the value for maxClients:
> 
> maxClients=64/2; (one connection requires at least two file handles)
> 
> The second step is to decide on a value for maxFree, which controls
> the point at which persistent connections are going to be killed when
> accepting new connections. I think this depends on the power of the
> host, I would recommend that the server should stay in LIGHT mode
> until 70% of clients are used
> 
> maxFree=0.3*32=10
> 
> The third step is to estimate the point at which you will want to
> start 'killing' persistent connections:
> 
> The first parameter is maxIdle. I would recommend using a 30% margin
> (to account for the time between kill and terminate, as explained
> above):
> 
> maxIdle=0.7*maxClients=22
> [the default is pessimistically set to 20]
> 
> The second parameter is minFree. The server will remain in LIGHT mode
> until that number of clients is free again. (This is to avoid
> switching back and forth between LIGHT and NORMAL mode: think of
> maxFree as the LIGHT mode exit value and of minFree as the LIGHT mode
> enter value):
> 
> minFree=0.7*maxFree=7
> [again, default setting is pessimistic]
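
The arithmetic of the example can be reproduced with a few lines of Java.
This is a sketch under assumptions: the class TuningSketch and its method
names are hypothetical, and rounding to the nearest integer is my choice
because it matches the figures quoted (9.6 gives 10, 22.4 gives 22).

```java
// Hypothetical helper reproducing the worked example; not Jigsaw code.
final class TuningSketch {

    // One connection needs at least two file handles.
    static int maxClients(int fileHandles) {
        return fileHandles / 2;
    }

    // Stay in LIGHT mode until 70% of the clients are used.
    static int maxFree(int maxClients) {
        return (int) Math.round(0.3 * maxClients);
    }

    // 30% margin on the number of idle persistent connections.
    static int maxIdle(int maxClients) {
        return (int) Math.round(0.7 * maxClients);
    }

    static int minFree(int maxFree) {
        return (int) Math.round(0.7 * maxFree);
    }

    public static void main(String[] args) {
        int clients = maxClients(64);  // 64 file handles assumed
        int free = maxFree(clients);
        System.out.println("maxClients=" + clients);          // 32
        System.out.println("maxFree=" + free);                // 10
        System.out.println("maxIdle=" + maxIdle(clients));    // 22
        System.out.println("minFree=" + minFree(free));       // 7
    }
}
```

Changing the argument of maxClients to the file handle limit set with
ulimit gives a starting point for the other three values.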
> 
> Now there remains one setting, concerning the way clients are mapped to
> threads. Since 1.0alpha5 this uses a thread cache which has two config
> parameters:
> 
> - maxThreads: max number of threads
> - idleTimeout: time to keep excess threads alive
> 
> The pool of threads initially creates maxThreads/2 threads. If more
> threads have to be created, they will stay alive after usage only for
> idleTimeout ms.
> maxThreads, as you can see, only controls the initial size of the
> thread pool (it is set to 40 in the default config, so that we start
> with 20 threads). Threads will always be created when needed (l-620).
> 
> I think idleTimeout should be fairly large (at least a few seconds),
> because peaks tend to last for "long"...
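
For readers who want to experiment with the same behaviour outside Jigsaw,
here is a rough equivalent built on java.util.concurrent. It is not the
Jigsaw thread cache, which predates that package; it only assumes the three
properties described above: maxThreads/2 threads created up front, extra
threads created whenever needed, and extra threads discarded after
idleTimeout ms without work. ThreadCacheSketch and create are hypothetical
names.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the thread cache, using the standard library.
final class ThreadCacheSketch {

    static ThreadPoolExecutor create(int maxThreads, long idleTimeoutMs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads / 2,              // threads kept alive permanently
                Integer.MAX_VALUE,           // always create a thread when needed
                idleTimeoutMs, TimeUnit.MILLISECONDS, // lifetime of idle extras
                new SynchronousQueue<Runnable>());    // hand off, never queue
        pool.prestartAllCoreThreads();       // create the initial threads now
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(40, 5000L); // default maxThreads, 5 s assumed
        System.out.println("initial threads: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```

With maxThreads at 40 the pool starts with 20 threads, as in the default
config described above.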
> 
>  > Currently, we are running benchmarks with different values for all those
>  > parameters so that we could possibly derive tuning rules.
>  > An explanation of the rationale behind the algorithms coded in Jigsaw
>  > would help a lot. I am referring to the changing priority of the thread
>  > accepting sockets, the killing of clients and the server state changes.
> 
> I hope the above (long) explanation helps. If you find a better way to
> set the parameters, or any enhancements to the way this piece is done,
> let me know.
> 
>  > To finish, could you clarify the following phenomenon?
>  > Under certain circumstances the stresser in httpd does not return when
>  > Jigsaw gets overloaded. I mean certain threads never join. We observed
>  > the same for Stresser (in Java). This is strange considering that Jigsaw
>  > seems to recover in the meantime from the overload.
> 
> I have observed that too, I'll check the problem again.
> 
> Anselm.

-- 

Sincerely yours,

-------------------------------------------------------
Francois Deza          mailto:francois.deza@sema.fr
Corporate R&D          http://www.semagroup.com  

Sema Group                tel:   33 - 1 - 40 92 43 16
16, rue Barbes            fax:   33 - 1 - 40 92 42 41
92126 Montrouge Cedex   
France                    

secretary: Beth Gould
tel:       33 - 1 - 40 92 42 18
email:     beth.gould@sema.fr
-------------------------------------------------------
Received on Thursday, 27 February 1997 17:02:34 UTC