From: Sven Laaks <slaaks@informatik.uni-essen.de>
Date: Wed, 13 Jun 2001 22:56:43 +0200
To: www-lib@w3.org
Hi Michel,
> I work on a program doing the same thing.
> How many different web pages are you polling?
> Is the firewall's name Rambo?
> I rather thought of a bug or a bad use/bad modification.
I tried 7-10 different web pages with different numbers of inlined
images and pages (frames).
I do not know if it is a firewall or some other server-side security
system. I checked the page three times (interval = 10 minutes), and the
robot terminated with a core dump after 12 hours every time. Then I
checked the page for 6 hours with a very short interval (interval =
2 minutes), and the robot terminated correctly. Since the short-interval
run performs far more requests overall, I do not think that the amount
of used memory is what crashes the robot. What do you think?
> When periodically polling and HTTP-parsing a web page, the main
> memory leak I got was because of Anchors. Even if you call
> HTRequest_delete(), the Anchors are not deleted. Anchors are
> globally shared between all the requests, so keeping them is the
> right choice: it is what allows multiple requests to run at the
> same time. You can clear the Anchors by calling HTAnchor_deleteAll()
> at any point where no request is running.
I had imagined that the Anchors would increase the memory usage. I
tried HTAnchor_deleteAll() (see below) in the terminate_handler
function. But this does not work...
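Looking at HTAnchor.c again while writing this mail, I suspect the
problem with my first attempt is that it passed an uninitialized HTList
to HTAnchor_deleteAll(). As far as I can see, the function expects a
list created by the caller and fills it with the HyperDoc pointers of
the deleted anchors, so that the application can free them afterwards.
If that reading is right, the minimal pattern would be something like
this (I have corrected my code below accordingly):

    /* Sketch only -- my reading of HTAnchor_deleteAll(), not checked
       against every libwww version. The caller owns the list; the call
       destroys the global anchor table and hands back the HyperDocs. */
    HTList * docs = HTList_new();
    if (HTAnchor_deleteAll(docs) == YES) {
        Hyperdoc * doc;
        while ((doc = (Hyperdoc *) HTList_removeLastObject(docs)))
            Hyperdoc_delete(doc);        /* anchors do not free these */
    }
    HTList_delete(docs);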
What I do not understand is why Anchors are created again after the
first interval, even though the first URL and the inlined objects are
the same. For my program it would be better to remove "everything"
after an interval, but I do not know how...
...
/* Should we stop? */
if (mr->cnt <= 0) {
    /* The list must be created before the call: HTAnchor_deleteAll()
       fills it with the HyperDoc pointers the deleted anchors were
       carrying (my first version passed the pointer uninitialized).
       Presumably these are the same objects freed from mr->hyperdoc
       below, so here the list itself is simply thrown away. */
    HTList * abc = HTList_new();
    HTAnchor_deleteAll(abc);
    HTList_delete(abc);

    /* Delete all Hyperdoc objects remembered by the robot */
    if (mr->hyperdoc) {
        HTList * cur = mr->hyperdoc;
        Hyperdoc * pres;
        while ((pres = (Hyperdoc *) HTList_nextObject(cur))) {
            if ((HTList_removeObject(mr->hyperdoc, pres)) == NO)
                HTTRACE(APP_TRACE, "NOT FOUND\n");
            Hyperdoc_delete(pres);
            cur = mr->hyperdoc;
        }
    }

    /* Delete all HText objects */
    if (mr->htext) {
        HTList * cur = mr->htext;
        HText * pres;
        while ((pres = (HText *) HTList_nextObject(cur))) {
            if ((HTList_removeObject(mr->htext, pres)) == NO)
                HTTRACE(APP_TRACE, "NOT FOUND\n");
            WHText_delete(pres);
            cur = mr->htext;
        }
    }
    mr->cindex = 0;
...
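By the way, if nothing else holds a reference into those two lists, the
delete loops can be written more simply by draining each list from one
end instead of rescanning it from the head after every removal; a
sketch for the Hyperdoc list (the HText loop would look the same):

    if (mr->hyperdoc) {
        Hyperdoc * pres;
        /* HTList_removeLastObject() returns NULL once the list is empty */
        while ((pres = (Hyperdoc *) HTList_removeLastObject(mr->hyperdoc)))
            Hyperdoc_delete(pres);
    }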
"Using the same HyperDoc"...
How are you doing this?
What functions do you call?
I believe your HyperDoc tree must be bigger at each interval
because links between anchors will be duplicated.
Hyperdoc objects are created in the RHText_foundAnchor and the
RHText_foundImage functions. Here is the relevant (modified) code; the
lines marked with --> are my changes:
// RHText_foundAnchor
...
    if (!hd && dest_parent) {
-->     hd = Hyperdoc_new(mr, dest_parent, depth);
        mr->cdepth[depth]++;
    }
    /* Test whether we already have a hyperdoc for this document */
    if (mr->flags & MR_LINK && dest_parent) {
        Finger * newfinger = Finger_new(mr, dest_parent, METHOD_GET);
        HTRequest * newreq = newfinger->request;
...
// RHText_foundImage
...
    if (dest) {
        Finger * newfinger = Finger_new(mr, dest_parent,
                                        mr->flags & MR_SAVE ?
                                        METHOD_GET : METHOD_HEAD);
        HTRequest * newreq = newfinger->request;
        if (!hd && dest_parent) {
-->         Hyperdoc_new(mr, dest_parent, 1);
        }
        HTRequest_setParent(newreq, referer);
        newfinger->from = finger->dest;
        if (HTLoadAnchor((HTAnchor *) dest, newreq) != YES) {
            Finger_delete(newfinger);
        }
...
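For context: in my copy, hd is looked up a few lines above the part I
quoted, so the "if (!hd && dest_parent)" guard means a Hyperdoc is only
allocated the first time a given parent anchor is seen. Quoting from
memory, so the exact lines may differ in your version:

    /* Earlier in both functions (unmodified webbot code, from memory):
       ask the anchor whether it already carries a document object */
    HTParentAnchor * dest_parent = HTAnchor_parent(dest);
    Hyperdoc * hd = dest_parent ?
        (Hyperdoc *) HTAnchor_document(dest_parent) : NULL;

Hyperdoc_new() presumably registers the new object on the anchor (via
HTAnchor_setDocument() or something similar), which is what makes the
guard hold across intervals as long as the anchors themselves survive.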
I have tested it and it works. What do you think?
Best regards, Sven