- From: Frank Wood <fwood@tofish.net>
- Date: Fri, 3 Sep 1999 18:39:06 -0400
- To: "'www-lib@w3c.org'" <www-lib@w3c.org>
Hello All,

I'm evaluating libwww for use in a crawler we're developing, so I thought I'd check out what's already been written (webbot). I'm not too happy with what I've found thus far and am seeking some help.

I'm trying to run the latest CVS version of webbot.exe, compiled under MSVC 5.0 on NT 4.0 sp4, but I have been unsuccessful in getting it to run for any significant amount of time at all (<5 minutes). My machine never comes close to hitting out-of-memory conditions, and it happens on every site I've tried.

command line:

    webbot http://"bigasssite"/ -depth 10 -norobots -prefix http -bfs

I understand that this is resource intensive and could piss off the owner of said "bigasssite" if they closely examined their logs; however, all I want to see is that the crawler will run until my machine runs out of resources (memory in particular). To my chagrin, I have not been able to run it without it crashing at:

    HTTChunk.c:55

The %s=line doesn't print out particularly well on the DOS shell, but it is, in one case, "gn="middle">"

So, what's the scoop? My bet is that an edge byte or two is misinterpreted during de-chunking and the decoder gets fooled into thinking it's looking at a header instead of document body. Either that, or the server barfed up something badly formatted and the de-chunking process failed ungracefully. Fixing this little buggy would make me very, very happy. Let me know if I can help.

Also, in the process of trying to get around this, I discovered that revision 2.5 of HTCookie.c doesn't work with the current Win32 Makefiles. I didn't fix it; I just rolled that file back to rev. 2.4, and everything, with the exception of the aforementioned problems, is fine (compiles).

Thanks,

Frank Wood (mailto:fwood@tofish.net)
ToFish! Incorporated (http://www.tofish.net/)
2121 K St. NW, Suite 800
Washington, DC 20037
PH (202) 261-3591
FAX (202) 261-3592
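
For illustration only, here is a minimal sketch in C of the kind of defensive chunk-size parsing that would reject a line like the one quoted above instead of crashing. It assumes the standard HTTP/1.1 chunked format (a hexadecimal size, an optional ";" extension, then CRLF). The function name parse_chunk_size is hypothetical; this is not the code in HTTChunk.c and not a libwww API.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libwww): parse the size field of one
 * chunk-size line, e.g. "1a\r\n" or "ff;ext=1\r\n".
 * Returns 0 and stores the size on success, -1 if the line is malformed.
 */
int parse_chunk_size(const char *line, size_t len, unsigned long *size_out)
{
    unsigned long size = 0;
    size_t i = 0;
    int digits = 0;

    /* Accumulate leading hexadecimal digits, guarding against overflow. */
    while (i < len && isxdigit((unsigned char)line[i])) {
        unsigned char c = (unsigned char)line[i];
        unsigned long digit = isdigit(c)
            ? (unsigned long)(c - '0')
            : (unsigned long)(tolower(c) - 'a' + 10);
        if (size > (ULONG_MAX - digit) / 16)
            return -1;
        size = size * 16 + digit;
        i++;
        digits++;
    }

    /* A chunk-size line must start with at least one hex digit. */
    if (digits == 0)
        return -1;

    /* Tolerate trailing blanks after the size. */
    while (i < len && (line[i] == ' ' || line[i] == '\t'))
        i++;

    /* Only a chunk extension or the line terminator may follow. */
    if (i < len && line[i] != ';' && line[i] != '\r' && line[i] != '\n')
        return -1;

    *size_out = size;
    return 0;
}
```

With a check like this, a stray piece of document body such as gn="middle"> fails at the first character (no hex digit), so the caller can report a malformed chunk and abandon the response rather than treat body bytes as a size line.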
Received on Friday, 3 September 1999 18:30:26 UTC