RE: Crash in EventListTimerHandler during load test

Hi,

I think the problem is in HTHost_clearChannel(): there is a "timer leak".
Try the following change in HTHost.c and run the test again:

// ----------------------------------------------------------------
// Jens: changed in HTHost.c, HTHost_clearChannel()
// remark: deletes a leaking timer message for each request
// 
// ----------------------------------------------------------------

PUBLIC BOOL HTHost_clearChannel (HTHost * host, int status)
{
    if (host && host->channel) {
	HTChannel_setHost(host->channel, NULL);
	
	HTEvent_unregister(HTChannel_socket(host->channel), HTEvent_READ);
	HTEvent_unregister(HTChannel_socket(host->channel), HTEvent_WRITE);
#ifdef WWW_WIN_ASYNC
	HTEvent_unregister(HTChannel_socket(host->channel), HTEvent_CLOSE);
#endif /* WWW_WIN_ASYNC */
	host->registeredFor = 0;


I added the call
HTEvent_unregister(HTChannel_socket(host->channel), HTEvent_CLOSE);
because on Windows HTHost_register() also registers for a CLOSE event,
and that registration was never removed when the channel was cleared.
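
To illustrate why a missing unregister adds up under load, here is a minimal sketch. It is a toy model, not libwww code: ToySocket, toy_register(), toy_unregister() and the live_timers counter are all hypothetical names, and it assumes that each registration owns exactly one timer that only the matching unregister releases.

```c
#include <assert.h>

/* Toy stand-ins, NOT the libwww API: three event kinds per socket. */
enum { EV_READ, EV_WRITE, EV_CLOSE, EV_COUNT };

typedef struct {
    int has_timer[EV_COUNT];  /* 1 while a timer exists for that kind */
} ToySocket;

static int live_timers = 0;   /* timers currently outstanding */

/* Assumption: registering for a kind creates one timer. */
static void toy_register(ToySocket *s, int kind) {
    if (!s->has_timer[kind]) { s->has_timer[kind] = 1; live_timers++; }
}

/* Assumption: only the matching unregister releases that timer. */
static void toy_unregister(ToySocket *s, int kind) {
    if (s->has_timer[kind]) { s->has_timer[kind] = 0; live_timers--; }
}

/* One request: register for all three kinds, then clear the channel.
 * unregister_close selects the old (0) or the patched (1) behaviour. */
static void toy_request(int unregister_close) {
    ToySocket s = {{0}};
    toy_register(&s, EV_READ);
    toy_register(&s, EV_WRITE);
    toy_register(&s, EV_CLOSE);
    toy_unregister(&s, EV_READ);
    toy_unregister(&s, EV_WRITE);
    if (unregister_close) toy_unregister(&s, EV_CLOSE);
}

/* Returns how many timers are left behind after n requests. */
static int toy_leaked_after(int n, int unregister_close) {
    live_timers = 0;
    for (int i = 0; i < n; i++) toy_request(unregister_close);
    return live_timers;
}

/* Self-check: one leaked timer per request without the CLOSE
 * unregister, none with it. */
static void toy_leak_selfcheck(void) {
    assert(toy_leaked_after(1000, 0) == 1000);
    assert(toy_leaked_after(1000, 1) == 0);
}
```

In this model the leak grows by one per request, which matches the "for each request" remark in the patch comment; whether the real timers behave exactly this way is an assumption.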


Jens

-----Original Message-----
From: Attila Uhljar [mailto:attila.uhljar@intervoice-brite.com]
Sent: Tuesday, 6 March 2001 12:44
To: www-lib@w3.org
Subject: Crash in EventListTimerHandler during load test


Hi All,

I'm having problems with LibWWW during load tests. After about 10-20
hours (or 1-2 million transactions) of continuous load (doing simple
GETs) it crashes at the line marked with '>>>>' in the
EventListTimerHandler() function (HTEvtLst.c @ 206):

 ...
    SockEvents * sockp = (SockEvents *) param;
    HTEvent * event = NULL;

    /* Check for read timeout */
    if (sockp->timeouts[HTEvent_INDEX(HTEvent_READ)] == timer) {
        event = sockp->events[HTEvent_INDEX(HTEvent_READ)];
        HTTRACE(THD_TRACE, "Event....... READ timed out on %d.\n" _ sockp->s);
>>>>    return (*event->cbf) (sockp->s, event->param, HTEvent_TIMEOUT);
    }
 ...

What happens is that the event structure's 'cbf' field has become
invalid (zero or some random value), so an illegal address is used
for the callback. I think the event object in question has already been
freed elsewhere and its memory re-used for something else. That would
explain why the problem takes so long to appear: freed memory may keep
its original contents until something else re-uses it.
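
The suspected failure mode can be sketched in isolation. The following is a toy model, not libwww code: ToySock, ToyEvent, ToyTimer and the toy_* functions are hypothetical names. It shows one way to keep a late timer from reaching a freed event, assuming the unregister path detaches the timer slot before it releases the event. Note that a NULL check alone cannot detect a pointer that was freed but never cleared; the protection comes from clearing both references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*ToyCallback)(int sock, void *param, int type);

typedef struct { ToyCallback cbf; void *param; } ToyEvent;
typedef struct { int id; } ToyTimer;

typedef struct {
    int s;              /* socket number */
    ToyEvent *event;    /* registered READ event, or NULL */
    ToyTimer *timeout;  /* timer guarding that event, or NULL */
} ToySock;

enum { TOY_TIMEOUT = 1, TOY_IGNORED = -1 };

/* Detach the timer first, then free the event and clear the pointer,
 * so no stale reference to the freed memory remains. */
static void toy_unregister_read(ToySock *sp) {
    sp->timeout = NULL;
    free(sp->event);
    sp->event = NULL;
}

/* Timer handler: dispatch only if this timer still guards a live event. */
static int toy_timer_handler(ToySock *sp, ToyTimer *timer) {
    if (sp->timeout == timer && sp->event && sp->event->cbf)
        return sp->event->cbf(sp->s, sp->event->param, TOY_TIMEOUT);
    return TOY_IGNORED;
}

/* Example callback: reports which socket timed out. */
static int toy_on_timeout(int sock, void *param, int type) {
    (void) param;
    (void) type;
    return sock;
}

/* Self-check: the timer fires normally while registered and is
 * ignored once the event has been unregistered. */
static void toy_stale_selfcheck(void) {
    ToyTimer t = { 1 };
    ToySock sp = { 7, NULL, NULL };
    sp.event = malloc(sizeof *sp.event);
    assert(sp.event != NULL);
    sp.event->cbf = toy_on_timeout;
    sp.event->param = NULL;
    sp.timeout = &t;
    assert(toy_timer_handler(&sp, &t) == 7);
    toy_unregister_read(&sp);
    assert(toy_timer_handler(&sp, &t) == TOY_IGNORED);
}
```

If the timer slot were left set and the event pointer not cleared, the handler would pass its checks and call through freed memory, which is the kind of crash described above.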

So far I have been unable to figure out why this is happening. If
anybody knows or has any idea why, please let me know; it would be
greatly appreciated!

By the way, I'm using the 5.3.2 package on Windows NT, using async mode.
I've tried several patches suggested on the mailing list (including the
HTTimer patch from Stefan Wiesner), but to no avail.

Thanks,
Attila

Received on Tuesday, 6 March 2001 16:58:29 UTC