- From: Michel Philip <mphilip@infovista.com>
- Date: Thu, 14 Jun 2001 10:25:11 -0400 (EDT)
- To: "'www-lib@w3.org'" <www-lib@w3.org>
Hi Sven,

> I tried 7-10 different web pages with different amount of
> inlined images and pages(frames).

That means 7-10 main requests and, for each main request, a good number
of what I call 'sub-requests'. In my own program I have to remember that:

1) different requests may target the same host
   (e.g. HTLoadAbsolute("http://www.w3.org/") and
   HTLoadAbsolute("http://www.w3.org/Library"));
2) sub-requests may target different hosts
   (e.g. parsing "free.fr" ==> sub-requests to load images from
   "/ad.fr.doubleclick.net/.../xxx.gif" and from
   "img2.free.fr/thebourse22.gif").

> I do not know if it is a firewall or another serverside
> security system.

I do not believe a security system is the cause.

> I check the page 3 times (interval=10 minutes)
> and the robot terminates every time with a core dump after 12 hours.
> So I check the page for 6 hours (interval=2 minutes) with very
> short interval and the robot terminates correctly.
> I do not think that the amount of used memory will crash the robot.
> What do you think?

I agree. 12 hours at 6 polls per hour is 12 * 6 = 72 polls, which is
fewer than 6 hours at 30 polls per hour, 6 * 30 = 180 polls. The run that
crashed did less work than the run that finished, so it is probably not
"out of memory". There are timing issues in the w3 lib.

> I had imagined that the Anchors will increase the memory usage.
> I tried HTAnchor_deleteAll() (see below) in the terminate_handler
> function. But this do not work...

Maybe you should call HTAnchor_deleteAll() at the end of the code rather
than at the beginning. How do you know, in terminate_handler, that all
requests are done? Does (mr->cnt <= 0) mean this? I guess it rather means
that only the sub-requests of one given main request are done.

> What I do not understand is why Anchors are created after
> the first interval even if the first URL, and the inlined
> objects, are the same?

The Anchors are not created again when you load the same web page again
(unless the page has changed and contains new links). All the anchors are
stored in the global adult table (see HTAnchor.c). What grows is the set
of links between Anchors: while a page is parsed, one can get several
different links to the same anchor.

> For my program it would be better to remove "everything" after an
> interval but I do not know how...

See the mail(s) of Thorsten Rinkenberger, particularly the last one:
http://lists.w3.org/Archives/Public/www-lib/2001AprJun/0119.html
He uses a global counter of all the requests. The place where he calls
HTProfile_delete(); HTLibTerminate();
is the place where one can call
HTAnchor_deleteAll(NULL);
if one wants to start a new polling interval.

> [...]
> I have tested it and it works. What do you think?

In the program I work on, the user can start and stop the polling
whenever he wants, on any web page, and he can change the polling period
for each web page. The polls can therefore overlap, and there is never a
point in this program where all requests are known to be done. So I wrote
a specific function that removes the links of an Anchor, to avoid
constant memory growth.

MP.
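A minimal sketch of the "global counter of all the requests" idea
mentioned above, written in plain C without any libwww calls. The names
PollState, poll_request_started and poll_request_finished are
hypothetical and only illustrate the bookkeeping; they are not taken from
Thorsten's mail or from the library. The assumption is that every main
request and every sub-request is counted when it starts and when it
terminates, and that sub-requests are registered before their main
request finishes.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping structure, not part of libwww. */
typedef struct {
    int outstanding; /* main requests and sub-requests still in flight */
    int cleanups;    /* number of times the safe cleanup point was reached */
} PollState;

/* Call once for every request that is issued (main or sub-request). */
void poll_request_started(PollState *s)
{
    s->outstanding++;
}

/*
 * Call once from the terminate handler of every request.
 * Returns true only when no request at all is left in flight.
 */
bool poll_request_finished(PollState *s)
{
    if (s->outstanding > 0)
        s->outstanding--;

    if (s->outstanding == 0) {
        /* Safe point: in a real libwww program this is where one could
         * call HTAnchor_deleteAll(NULL) before the next polling
         * interval starts. Here we only record that it was reached. */
        s->cleanups++;
        return true;
    }
    return false;
}
```

Unlike a per-main-request counter, this one only reaches zero when every
request of every page is done. With overlapping polls it may never reach
zero, which is the case described in the last paragraph of the mail.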
Received on Thursday, 14 June 2001 11:20:03 UTC