Question about robots
Is there an easy way to build a robot using libwww-4.1b1?
I would like the robot to:
- Fetch all the HTML pages from a site as raw HTML, and store them
in a file on my local file system.
- Only request HTTP links.
- Never request a page twice.
- Respect the robots.txt rules.
- Wait at least 1 second between requests.
- Use 5Mb primary memory at most (This is the reason why I don't
I have tested the example robot included in the w3c-libwww.tar.gz
distribution, but I do not entirely understand the program and do not
know where to start if I were to modify it.
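To make the requirements above concrete, here is a rough sketch of the
intended behaviour. It is written in Python with the standard library only,
not against libwww, so it says nothing about how the libwww example robot
works. The names AGENT, LinkParser, http_fetch and crawl are made up for
illustration, staying on the start URL's host is an assumption about what
"a site" means, and the 5Mb memory limit is not enforced: only one page is
held at a time, but the set of seen URLs grows with the size of the site.

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlsplit
from urllib.robotparser import RobotFileParser

AGENT = "example-robot"  # hypothetical user-agent name


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href" and v]


def http_fetch(url):
    """Default fetcher: return the body of url as text."""
    req = urllib.request.Request(url, headers={"User-Agent": AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("latin-1")


def crawl(start, out, fetch=http_fetch, delay=1.0, sleep=time.sleep):
    """Write the raw HTML of every allowed page on start's host to out."""
    host = urlsplit(start).netloc
    robots = RobotFileParser()
    try:
        robots.parse(fetch(urljoin(start, "/robots.txt")).splitlines())
    except OSError:
        robots.parse([])  # robots.txt unreachable: assume no restrictions
    seen = {start}        # URLs already queued, so none is requested twice
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if not robots.can_fetch(AGENT, url):
            continue      # excluded by robots.txt
        sleep(delay)      # wait at least `delay` seconds before each page
        try:
            html = fetch(url)
        except OSError:
            continue      # skip pages that cannot be retrieved
        out.write(f"<!-- {url} -->\n{html}\n")
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # drop "#fragment"
            parts = urlsplit(link)
            if parts.scheme == "http" and parts.netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```

A call could look like crawl("http://www.example.org/", open("site.html", "w")),
where both the URL and the file name are placeholders. The fetch and sleep
parameters exist only so the function can be tried out without network
access or real waiting.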
Any clues would be much appreciated.
Thanks for your help
Henrik Fagrell                 Telephone: +46-31-773 27 41
Göteborg University            Fax: +46-31-773 47 54
Department of Informatics      Email: firstname.lastname@example.org
Box 3147, 400 10 Göteborg      WWW: http://www.adb.gu.se/~fagrell