
Re: Segm.Fault and memory problems

From: Sven Laaks <slaaks@informatik.uni-essen.de>
Date: Wed, 13 Jun 2001 22:56:43 +0200
Message-Id: <5.0.2.1.2.20010613194507.00a68288@pop3.norton.antivirus>
To: www-lib@w3.org
Hi Michel,

I work on a program doing the same thing.
How many different web pages are you polling?
Is the firewall's name Rambo?
I was rather thinking of a bug, or of incorrect use or a bad modification.

I tried 7-10 different web pages with different amounts of inlined 
images and pages (frames).
I do not know if it is a firewall or another server-side security 
system. I check the page 3 times (interval = 10 minutes) and the robot 
terminates every time with a core dump after 12 hours. So I checked 
the page for 6 hours with a very short interval (2 minutes) and the 
robot terminated correctly. I do not think that the amount of memory 
used is what crashes the robot. What do you think?

When periodically polling and HTTP-parsing a web page,
the main memory leak I got was because of Anchors.
Even if you call HTRequest_delete(), the Anchors are not deleted.
Anchors are globally shared between all the requests,
so keeping them is the right choice: it allows
multiple requests to run at the same time.
You can clear the Anchors by calling HTAnchor_deleteAll()
at any point where no request is running.
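
To make the point above concrete, here is a minimal, self-contained sketch in plain C. It does not use libwww: `Anchor`, `Request`, `anchor_find`, `request_new`, `request_delete`, `anchor_count` and `anchor_delete_all` are hypothetical stand-ins that only model the behaviour described (anchors live in one global table shared by all requests, survive the deletion of a request, and go away only on an explicit delete-all).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the libwww types. */
typedef struct Anchor {
    char *url;
    struct Anchor *next;
} Anchor;

typedef struct Request {
    Anchor *anchor;   /* borrowed from the global table, not owned */
} Request;

static Anchor *anchor_table = NULL;   /* shared by every request */

static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}

/* Find the anchor for a URL, creating it only if it does not exist yet. */
Anchor *anchor_find(const char *url) {
    Anchor *a;
    for (a = anchor_table; a; a = a->next)
        if (strcmp(a->url, url) == 0) return a;
    a = malloc(sizeof *a);
    if (!a) return NULL;
    a->url = copy_string(url);
    if (!a->url) { free(a); return NULL; }
    a->next = anchor_table;
    anchor_table = a;
    return a;
}

Request *request_new(const char *url) {
    Request *r = malloc(sizeof *r);
    if (r) r->anchor = anchor_find(url);
    return r;
}

/* Deleting a request frees only the request; its anchor stays in the table. */
void request_delete(Request *r) { free(r); }

int anchor_count(void) {
    int n = 0;
    Anchor *a;
    for (a = anchor_table; a; a = a->next) n++;
    return n;
}

/* Only safe while no request exists, because requests borrow anchors. */
void anchor_delete_all(void) {
    while (anchor_table) {
        Anchor *next = anchor_table->next;
        free(anchor_table->url);
        free(anchor_table);
        anchor_table = next;
    }
}
```

In this model, two requests for the same URL share one anchor, the anchor count is unchanged after both requests are deleted, and it drops to zero only after `anchor_delete_all()`. Whether the real library behaves this way in every detail should be checked against its source.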

I had imagined that the Anchors would increase the memory usage. I 
tried HTAnchor_deleteAll() (see below) in the terminate_handler 
function, but this does not work...
What I do not understand is why Anchors are created after the first 
interval even if the first URL and the inlined objects are the 
same.
For my program it would be better to remove "everything" after an 
interval, but I do not know how...

...
/* Should we stop? */
if (mr->cnt <= 0) {
	HTList * abc;
	HTAnchor_deleteAll(abc);
	HTList_delete(abc);
	if (mr->hyperdoc) {
		HTList * cur = mr->hyperdoc;
		Hyperdoc * pres;
		while ((pres = (Hyperdoc *) HTList_nextObject(cur))) {
			if ((HTList_removeObject(mr->hyperdoc, pres)) == NO)
				HTTRACE(APP_TRACE, "NOT FOUND\n");
			Hyperdoc_delete(pres);
			cur = mr->hyperdoc;
		}
	}
	if (mr->htext) {
		HTList * cur = mr->htext;
		HText * pres;
		while ((pres = (HText *) HTList_nextObject(cur))) {
			if ((HTList_removeObject(mr->htext, pres)) == NO)
				HTTRACE(APP_TRACE, "NOT FOUND\n");
			WHText_delete(pres);
			cur = mr->htext;
		}
	}
	mr->cindex = 0;
...
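
One detail in the excerpt above may be worth checking: `abc` is passed to HTAnchor_deleteAll() and HTList_delete() without ever being initialised. As far as I understand the libwww declaration, HTAnchor_deleteAll() takes a list that it fills with the documents hanging off the deleted anchors, and NULL is accepted when they are not wanted; treat that as an assumption to verify against HTAnchor.h. The sketch below models that calling convention in plain C. `List`, `list_new`, `list_add`, `list_delete`, `registry_add` and `registry_delete_all` are hypothetical names, not libwww functions.

```c
#include <stdlib.h>

/* Hypothetical minimal list, standing in for a library list type. */
typedef struct Node { void *object; struct Node *next; } Node;
typedef struct List { Node *head; int count; } List;

List *list_new(void) { return calloc(1, sizeof(List)); }

int list_add(List *l, void *object) {
    Node *n;
    if (!l) return 0;
    n = malloc(sizeof *n);
    if (!n) return 0;
    n->object = object;
    n->next = l->head;
    l->head = n;
    l->count++;
    return 1;
}

/* Frees the list nodes only; the stored objects belong to the caller. */
void list_delete(List *l) {
    if (!l) return;
    while (l->head) {
        Node *next = l->head->next;
        free(l->head);
        l->head = next;
    }
    free(l);
}

/* Hypothetical registry of documents attached to anchors. */
static void *registry[8];
static int registry_len = 0;

int registry_add(void *doc) {
    if (registry_len >= 8) return 0;
    registry[registry_len++] = doc;
    return 1;
}

/* Empties the registry. If `collected` is non-NULL, every attached
   document is handed back through it; NULL means "not interested".
   Returns the number of entries released. */
int registry_delete_all(List *collected) {
    int i, released = registry_len;
    for (i = 0; i < registry_len; i++)
        if (collected && registry[i]) list_add(collected, registry[i]);
    registry_len = 0;
    return released;
}
```

The caller therefore has two valid choices: create the list first (in libwww that would presumably be HTList_new()) and delete it afterwards, or pass NULL. An uninitialised pointer is neither, and reading it is undefined behaviour in C.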

"Using the same HyperDoc"...
How are you doing this?
What functions do you call?
I believe your HyperDoc tree must get bigger at each interval
because links between anchors will be duplicated.

Hyperdoc objects are created in the RHText_foundAnchor and the 
RHText_foundImage functions. There is also the following (modified) 
code:

// RHText_foundAnchor
...
if (!hd && dest_parent) {
-->	hd = Hyperdoc_new(mr, dest_parent, depth);
	mr->cdepth[depth]++;
}

/* Test whether we already have a hyperdoc for this document */
if (mr->flags & MR_LINK && dest_parent) {
	Finger * newfinger = Finger_new(mr, dest_parent, METHOD_GET);
	HTRequest * newreq = newfinger->request;
...

// RHText_foundImage
...
if (dest) {
	Finger * newfinger = Finger_new(mr, dest_parent,
		mr->flags & MR_SAVE ?
		METHOD_GET : METHOD_HEAD);
	HTRequest * newreq = newfinger->request;
	if (!hd && dest_parent) {
-->		Hyperdoc_new(mr, dest_parent, 1);
	}
	HTRequest_setParent(newreq, referer);
	newfinger->from = finger->dest;
	if (HTLoadAnchor((HTAnchor *) dest, newreq) != YES) {
		Finger_delete(newfinger);
	}
...

I have tested it and it works. What do you think?
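
The guard `if (!hd && dest_parent)` in the two excerpts above is a create-only-if-missing check. A minimal sketch of that pattern in plain C follows. It assumes that `hd` is looked up from the destination anchor before the guard, which the excerpts do not show; `Doc`, `Anchor`, `doc_for_anchor` and `anchor_clear_doc` are hypothetical stand-ins, not the robot's own types.

```c
#include <stdlib.h>

typedef struct Doc { int depth; } Doc;            /* stand-in for Hyperdoc */
typedef struct Anchor { Doc *document; } Anchor;  /* stand-in for a parent anchor */

static int docs_created = 0;

/* Return the Doc attached to the anchor, creating one only if the
   anchor has none yet. Repeated calls never create a second Doc. */
Doc *doc_for_anchor(Anchor *a, int depth) {
    if (!a) return NULL;
    if (!a->document) {
        Doc *d = malloc(sizeof *d);
        if (!d) return NULL;
        d->depth = depth;
        a->document = d;
        docs_created++;
    }
    return a->document;
}

/* Detach and free the Doc so the next interval starts clean. */
void anchor_clear_doc(Anchor *a) {
    if (a) {
        free(a->document);
        a->document = NULL;
    }
}
```

With this shape, polling the same page again finds the existing document instead of allocating a new one, so the number of documents stays constant across intervals.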

Best regards, Sven
Received on Wednesday, 13 June 2001 17:00:46 GMT
