
Question about robots



Hello, 
 
Is there an easy way to build a robot using libwww-4.1b1?

I would like the robot to:
  -     Fetch all the HTML pages from a site as raw HTML and store them
        on my local file system.
  -     Only request HTTP links.
  -     Never request a page twice.
  -     Respect the robots.txt rules.
  -     Wait at least 1 second between requests.
  -     Use at most 5 MB of main memory (this is why I am not
        using Perl).
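For illustration only, three of these requirements (HTTP links only, no repeated requests, robots.txt) come down to small pieces of bookkeeping that can be sketched in plain C, independently of libwww. This is a sketch under stated assumptions, not libwww code: every name in it (is_http_url, VisitedSet, visited_add, robots_allows and so on) is hypothetical, it does not fetch or save anything, and its robots.txt handling is simplified to prefix-matching Disallow lines in "User-agent: *" records. The fixed-capacity hash set is one way to keep the duplicate check inside a memory budget: the table costs capacity x sizeof(char *) bytes (65536 slots of 8-byte pointers is 512 KiB) plus the stored URLs.

```c
/* Hypothetical crawler helpers; a sketch, not part of the libwww API. */
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* "Only request HTTP links": case-insensitive "http://" prefix test. */
static int is_http_url(const char *url)
{
    for (const char *s = "http://"; *s != '\0'; s++, url++)
        if (tolower((unsigned char)*url) != *s)
            return 0;
    return 1;
}

/* "Never request a page twice": hash set whose capacity is fixed up front. */
typedef struct {
    char **slots;
    size_t capacity;                /* must be a power of two */
    size_t count;
} VisitedSet;

static int visited_init(VisitedSet *set, size_t capacity)
{
    assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    set->slots = calloc(capacity, sizeof *set->slots);
    set->capacity = capacity;
    set->count = 0;
    return set->slots != NULL;
}

/* 1 = new URL (now recorded), 0 = already seen, -1 = set full or no memory. */
static int visited_add(VisitedSet *set, const char *url)
{
    size_t mask = set->capacity - 1, i;
    unsigned long h = 5381;         /* djb2 string hash */
    for (const char *p = url; *p != '\0'; p++)
        h = h * 33 + (unsigned char)*p;
    for (i = h & mask; set->slots[i] != NULL; i = (i + 1) & mask)  /* linear probing */
        if (strcmp(set->slots[i], url) == 0)
            return 0;
    if (set->count + 1 >= set->capacity)    /* always leave one slot free */
        return -1;
    size_t len = strlen(url) + 1;
    if ((set->slots[i] = malloc(len)) == NULL)
        return -1;
    memcpy(set->slots[i], url, len);
    set->count++;
    return 1;
}

static void visited_free(VisitedSet *set)
{
    for (size_t i = 0; i < set->capacity; i++)
        free(set->slots[i]);
    free(set->slots);
}

/* If line starts with field (lower case, e.g. "disallow:"), ignoring case,
   return its value with leading blanks skipped; otherwise NULL. */
static const char *field_value(const char *line, const char *field)
{
    for (; *field != '\0'; field++, line++)
        if (tolower((unsigned char)*line) != *field)
            return NULL;
    while (*line == ' ' || *line == '\t')
        line++;
    return line;
}

/* Simplified robots.txt check: refuse path if it starts with a Disallow value
   in a "User-agent: *" record (no Allow, wildcards or '#' comments). */
static int robots_allows(const char *robots, const char *path)
{
    int applies = 0, prev_was_agent = 0;
    for (const char *line = robots;;) {
        const char *end = strchr(line, '\n');
        const char *eol = end ? end : line + strlen(line);
        const char *v = field_value(line, "user-agent:");
        if (v != NULL) {
            /* consecutive User-agent lines share one record */
            applies = (prev_was_agent && applies) || *v == '*';
            prev_was_agent = 1;
        } else {
            prev_was_agent = 0;
            v = applies ? field_value(line, "disallow:") : NULL;
            if (v != NULL) {
                size_t n = (size_t)(eol - v);
                while (n > 0 && isspace((unsigned char)v[n - 1]))
                    n--;            /* drop trailing blanks and '\r' */
                if (n > 0 && strncmp(path, v, n) == 0)
                    return 0;
            }
        }
        if (end == NULL)
            return 1;
        line = end + 1;
    }
}
```

In a fetch loop, a link would be requested only if is_http_url(url) is true, visited_add(&set, url) returns 1, and robots_allows(robots_txt, path) is true for the site's /robots.txt, with a pause between requests (for example POSIX sleep(1), which can return early if a signal arrives). Wiring these checks into the libwww example robot is not shown.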
 
I have tried the example robot included in the w3c-libwww.tar.gz
distribution, but I do not fully understand the program and do not
know where to start modifying it.

Any clues would be much appreciated.

Thanks for your help
/Henrik

-- 
Henrik Fagrell                    Telephone: +46-31-773 27 41
Göteborg University               Fax:       +46-31-773 47 54
Department of Informatics         Email: fagrell@adb.gu.se
Box 3147, 400 10 Göteborg         WWW:   http://www.adb.gu.se/~fagrell
Sweden