- From: Martin Sjolin <marsj@ida.liu.se>
- Date: Sat, 13 May 95 20:59:06 +0200
- To: www-talk@www10.w3.org
I wrote:
 > 1. takes a base URL,
 > 2. retrieves all URLs in the base document, but
 > 3. does not go outside the server (e.g. restricts the set of
 >    allowed URLs),
 > 4. keeps a minimum time between HEADs/GETs,
 > 5. runs under Unix (preferably SunOS 4.1 - I have ported software
 >    to hp-ux/solaris 2.x/dec osf/4.3bsd/aix/ultrix/sgi/linux)
I had better clarify (4): I would like to retrieve all URLs from
a site but, per (4), keep a minimum time between two consecutive
GETs so as to avoid overloading the server.
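
As an illustration of requirements (1)-(4), here is a minimal sketch in
Python. It is not any of the tools mentioned in this thread; the names
crawl, LinkParser, http_get and min_delay are hypothetical, and it
assumes pages are HTML, links are plain <a href> attributes, and "the
same server" means the same host:port as the base URL.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_get(url):
    """Default fetcher: return the page body as text (assumes HTML)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(base_url, min_delay=1.0, fetch=http_get, sleep=time.sleep,
          clock=time.monotonic):
    """Visit every URL reachable from base_url on the same host,
    waiting at least min_delay seconds between two requests."""
    host = urlparse(base_url).netloc       # requirement (3): stay here
    queue, seen, last = [base_url], {base_url}, None
    visited = []
    while queue:
        url = queue.pop(0)                 # breadth-first order
        if last is not None:
            wait = min_delay - (clock() - last)
            if wait > 0:
                sleep(wait)                # requirement (4): be polite
        last = clock()
        try:
            body = fetch(url)
        except OSError:                    # urllib errors derive from OSError
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The fetch, sleep and clock parameters are there only so the loop can be
exercised without touching a real server; calling
crawl("http://some.host/", min_delay=5.0) would use the defaults.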
Answers to the query:
A). http://www.inria.fr/koala/abaird/oscheme/oscheme.html with
    the "www-list" scripts (from Anselm.Baird_Smith@inria.fr)
B). http://www.ics.uci.edu/WebSoft/MOMspider/ (MOMspider)
    (from joshuap@sdsc.edu (Joshua Polterock))
C). http://iamwww.unige.ch/~scg/Src/Scripts/ with the
    explore script (diana@seldon.terminus.com (Cookie Monster))
D). Simon Spero <ses@tipper.oit.unc.edu> has a set of programs
    for benchmarking.
E). rst@ai.mit.edu (Robert S. Thau) has written a logfile replay program
    that runs on SunOS, reports the mean latency for every 100
    transactions, and handles multiple outstanding requests. Found at
    ftp://ftp.ai.mit.edu/pub/users/rst/monkey.c
F). www2dot from einpost@win.tue.nl (Reinier Post); it might not
    fulfil requirement (4). Contact Reinier Post. Based on libwww2.
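
Item (E) mentions a latency figure reported for every 100 transactions.
Assuming that figure is an arithmetic mean over consecutive batches, the
bookkeeping could look roughly like the sketch below. This is not taken
from monkey.c; the function name and the handling of a trailing partial
batch are my own assumptions.

```python
def mean_latency_per_batch(latencies, batch_size=100):
    """Return the mean latency of each consecutive batch.

    latencies is a list of per-transaction times (e.g. in seconds).
    A trailing batch shorter than batch_size is reported as well.
    """
    means = []
    for start in range(0, len(latencies), batch_size):
        batch = latencies[start:start + batch_size]
        means.append(sum(batch) / len(batch))
    return means
```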
BTW, I will probably try to use (C). For those interested, I'm running
a CGI-based gateway that generates HTML pages on the fly.
I'm interested in the above tools in order to profile the gateway
(written in C).
thanks to all who answered,
msj
--
Martin Sjölin | http://www.ida.liu.se/labs/iislab/people/marsj
Department of Computer Science, LiTH, S-581 83 Linköping, SWEDEN
phone : +46 13 28 24 10 | fax : +46 13 28 26 66 | e-mail: marsj@ida.liu.se 
Received on Saturday, 13 May 1995 15:19:21 UTC