- From: Henrik Fagrell <fagrell@adb.gu.se>
- Date: Fri, 14 Jun 1996 11:15:22 +0200
- To: www-lib@w3.org
Hello,

Is there an easy way to build a robot using libwww-4.1b1? I would like the robot to:

- Take all the HTML pages from a site in raw HTML and put them in a file on my local file system.
- Only request HTTP links.
- Never request a page twice.
- Respect the robots.txt rules.
- Delay at least 1 second between requests.
- Use at most 5 MB of primary memory (this is the reason why I don't use Perl).

I have tested the example robot that is in the w3c-libwww.tar.gz distribution, but I do not entirely understand the program and do not know where to start if I should modify it.

Any clues much appreciated. Thanks for your help.

/Henrik

--
Henrik Fagrell              Telephone: +46-31-773 27 41
Göteborg University         Fax:       +46-31-773 47 54
Department of Informatics   Email:     fagrell@adb.gu.se
Box 3147, 400 10 Göteborg   WWW:       http://www.adb.gu.se/~fagrell
Sweden
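The requirements above boil down to a crawl loop with a visited-URL table, a robots.txt check, a per-request delay, and output appended to a local file. Below is a minimal, libwww-independent sketch of that loop in C; the helpers `fetch_and_save` and `robots_allows` are placeholders (not libwww calls) that a real robot would wire up to the library's request machinery and to a robots.txt parser.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_URLS 4096

/* Simple visited-URL table so no page is requested twice. */
static char *visited[MAX_URLS];
static int n_visited = 0;

static int already_visited(const char *url)
{
    int i;
    for (i = 0; i < n_visited; i++)
        if (strcmp(visited[i], url) == 0)
            return 1;
    return 0;
}

static void mark_visited(const char *url)
{
    if (n_visited < MAX_URLS)
        visited[n_visited++] = strdup(url);
}

/* Placeholder: a real robot would fetch and parse the site's /robots.txt
 * and check the requested path against its rules. */
static int robots_allows(const char *url)
{
    (void) url;
    return 1;
}

/* Placeholder: a real robot would issue the HTTP request (e.g. through
 * libwww) and append the raw HTML to the output file. */
static int fetch_and_save(const char *url, FILE *out)
{
    fprintf(out, "<!-- would fetch %s here -->\n", url);
    return 0;
}

int main(int argc, char *argv[])
{
    FILE *out;
    int i;

    if (argc < 2) {
        fprintf(stderr, "usage: %s url [url ...]\n", argv[0]);
        return 1;
    }
    out = fopen("pages.html", "a");
    if (!out) {
        perror("pages.html");
        return 1;
    }
    for (i = 1; i < argc; i++) {
        const char *url = argv[i];
        if (strncmp(url, "http://", 7) != 0)    /* only follow HTTP links */
            continue;
        if (already_visited(url) || !robots_allows(url))
            continue;
        mark_visited(url);
        fetch_and_save(url, out);
        sleep(1);                               /* at least 1 s between requests */
    }
    fclose(out);
    return 0;
}
```

The fixed-size table and linear search keep the memory footprint small and predictable, which matches the 5 MB constraint; swapping in a hash table would be the obvious refinement once the link queue grows.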
Received on Friday, 14 June 1996 05:14:48 UTC