- From: Martin Sjolin <marsj@ida.liu.se>
- Date: Sat, 13 May 95 20:59:06 +0200
- To: www-talk@www10.w3.org
I wrote:

> 1. take a base URL
> 2. retrieve all URLs in the base document, but
> 3. do not go outside the server (e.g. restrict the set of
>    allowed URLs),
> 4. minimum time between HEADs/GETs,
> 5. runs under unix (preferably SunOS 4.1 - i have ported software
>    to hp-ux/solaris 2.x/dec osf/4.3bsd/aix/ultrix/sgi/linux)

I had better clarify (4): I would like to retrieve all URLs from a site
but, as (4) says, keep a minimum time between two GETs so as to avoid
overloading the server.

Answers to the query:

A). http://www.inria.fr/koala/abaird/oscheme/oscheme.html
    with the "www-list" scripts (from Anselm.Baird_Smith@inria.fr)

B). http://www.ics.uci.edu/WebSoft/MOMspider/ (MOMspider)
    (from joshuap@sdsc.edu (Joshua Polterock))

C). http://iamwww.unige.ch/~scg/Src/Scripts/
    with the explore script (diana@seldon.terminus.com (Cookie Monster))

D). Simon Spero <ses@tipper.oit.unc.edu> has a set of programs for
    benchmarking.

E). rst@ai.mit.edu (Robert S. Thau) has written a logfile replay
    program that runs on SunOS, reports the mean latency for every
    100 transactions, and handles multiple outstanding requests.
    Found at ftp://ftp.ai.mit.edu/pub/users/rst/monkey.c

F). www2dot from einpost@win.tue.nl (Reinier Post); it might not
    fulfil requirement (4). Contact Reinier Post. Based on libwww2.

BTW, I will probably try to use (C).

For those interested, I'm running a CGI-based gateway (written in C)
which generates HTML pages on the fly. I'm interested in the above in
order to profile the gateway.

thanks to all who answered,

msj

--
Martin Sjölin | http://www.ida.liu.se/labs/iislab/people/marsj
Department of Computer Science, LiTH, S-581 83 Linköping, SWEDEN
phone : +46 13 28 24 10 | fax : +46 13 28 26 66 | e-mail: marsj@ida.liu.se
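For readers who want to see what requirements 1-4 in the quoted list amount to, here is a minimal sketch in Python. It is not taken from any of the tools listed in the answers; the names `crawl`, `http_get`, `LinkParser`, `min_delay` and `max_pages` are all assumptions made for illustration. It starts from a base URL, follows only `<a href>` links that stay on the same host, and keeps at least `min_delay` seconds between the starts of two consecutive GETs.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_get(url):
    """Default fetcher: GET the URL and return the body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("latin-1", errors="replace")


def crawl(base_url, min_delay=1.0, fetch=http_get, sleep=time.sleep,
          max_pages=1000):
    """Breadth-first walk of one host, pausing between requests.

    Returns the list of URLs fetched, in order. `fetch` and `sleep`
    are parameters so the loop can be exercised without a network.
    """
    host = urlparse(base_url).netloc
    seen = {base_url}
    queue = deque([base_url])
    visited = []
    last_request = None

    while queue and len(visited) < max_pages:
        url = queue.popleft()

        # Requirement 4: enforce a minimum time between request starts.
        if last_request is not None:
            wait = min_delay - (time.monotonic() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = time.monotonic()

        try:
            body = fetch(url)
        except OSError:  # urllib's URLError/HTTPError derive from OSError
            continue
        visited.append(url)

        # Requirement 2: extract every link in the document.
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            parts = urlparse(link)
            # Requirement 3: stay on the starting server.
            if (parts.scheme in ("http", "https")
                    and parts.netloc == host
                    and link not in seen):
                seen.add(link)
                queue.append(link)

    return visited
```

Calling `crawl("http://www.example.org/", min_delay=2.0)` would then walk that host with at least two seconds between requests. The sketch issues GETs only and does not consult robots.txt, which a crawler pointed at someone else's server should do.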
Received on Saturday, 13 May 1995 15:19:21 UTC