- From: Sven Laaks <slaaks@informatik.uni-essen.de>
- Date: Wed, 13 Jun 2001 22:56:43 +0200
- To: www-lib@w3.org
Hi Michel,

> I work on a program doing the same thing. How many different web pages
> are you polling? Is the firewall's name Rambo?

I rather thought of a bug, or of a bad use/bad modification on my side. I
tried 7-10 different web pages with different numbers of inlined images
and pages (frames). I do not know whether it is a firewall or some other
server-side security system. When I check the page every 10 minutes, the
robot terminates with a core dump every time, after about 12 hours. But
when I check the page for 6 hours with a very short interval (2 minutes),
the robot terminates correctly. So I do not think that the amount of used
memory is what crashes the robot. What do you think?

> When periodically polling and HTTP-parsing a web page, the main memory
> leak I found was caused by Anchors. Even if you call HTRequest_delete(),
> the Anchors are not deleted: Anchors are globally shared between all
> requests. Keeping Anchors is the right choice, because it allows
> multiple requests to run at the same time. You can clear the Anchors by
> calling HTAnchor_deleteAll() at any point where no request is running.

I had imagined that the Anchors would increase the memory usage. I tried
HTAnchor_deleteAll() (see below) in the terminate_handler function, but
this does not work... What I do not understand is why new Anchors are
created after the first interval, even though the URL, and the inlined
objects, are the same. For my program it would be better to remove
"everything" after each interval, but I do not know how...

    ...
    /* Should we stop? */
    if (mr->cnt <= 0) {
        HTList * abc = HTList_new();    /* must be initialized before use */
        HTAnchor_deleteAll(abc);
        HTList_delete(abc);
        if (mr->hyperdoc) {
            HTList * cur = mr->hyperdoc;
            Hyperdoc * pres;
            while ((pres = (Hyperdoc *) HTList_nextObject(cur))) {
                if ((HTList_removeObject(mr->hyperdoc, pres)) == NO)
                    HTTRACE(APP_TRACE, "NOT FOUND\n");
                Hyperdoc_delete(pres);
                cur = mr->hyperdoc;
            }
        }
        if (mr->htext) {
            HTList * cur = mr->htext;
            HText * pres;
            while ((pres = (HText *) HTList_nextObject(cur))) {
                if ((HTList_removeObject(mr->htext, pres)) == NO)
                    HTTRACE(APP_TRACE, "NOT FOUND\n");
                WHText_delete(pres);
                cur = mr->htext;
            }
        }
        mr->cindex = 0;
    ...

> "Using the same HyperDoc"... How are you doing this? What functions do
> you call? I believe your HyperDoc tree must get bigger at each interval,
> because the links between anchors will be duplicated.

Hyperdoc objects are created in the RHText_foundAnchor and
RHText_foundImage functions. There is also the following (modified) code;
the lines marked "-->" are my additions:

    /* RHText_foundAnchor */
    ...
    /* Test whether we already have a hyperdoc for this document */
    if (!hd && dest_parent) {
-->     hd = Hyperdoc_new(mr, dest_parent, depth);
        mr->cdepth[depth]++;
    }
    if (mr->flags & MR_LINK && dest_parent) {
        Finger * newfinger = Finger_new(mr, dest_parent, METHOD_GET);
        HTRequest * newreq = newfinger->request;
    ...

    /* RHText_foundImage */
    ...
    if (dest) {
        Finger * newfinger = Finger_new(mr, dest_parent,
                mr->flags & MR_SAVE ? METHOD_GET : METHOD_HEAD);
        HTRequest * newreq = newfinger->request;
        if (!hd && dest_parent) {
-->         Hyperdoc_new(mr, dest_parent, 1);
        }
        HTRequest_setParent(newreq, referer);
        newfinger->from = finger->dest;
        if (HTLoadAnchor((HTAnchor *) dest, newreq) != YES) {
            Finger_delete(newfinger);
        }
    ...

I have tested it and it works. What do you think?

Best regards,
Sven
Received on Wednesday, 13 June 2001 17:00:46 UTC