- From: Q. Alex Zhao <aZhao@cc.gatech.edu>
- Date: Thu, 22 Feb 2001 16:09:32 -0500
- To: "John Punin" <puninj@cs.rpi.edu>
- Cc: <www-lib-bugs@w3.org>, <www-lib@w3.org>
Using "-single -bfs", the program exited immediately. Using just "-single", it hangs after running a day or so -- doesn't exit, but doesn't generate any output, either.

The reason I didn't use -prefix or -include options was that I had a configuration file that limited the crawling to the "cc.gatech.edu" domain, and specified more complicated pruning of the web trees.

Should I try compiling webbot on a different platform to get better reliability?

Thanks.

= alex

] -----Original Message-----
] Hello
]
] When this happens to me, I don't use -nopipe and I use -single -bfs
]
] Hope this helps
]
] John Punin
]
] PS. you should also use as a prefix http://www.cc.gatech.edu/ or -include
] "cc.gatech.edu"
]
] On Wed, 21 Feb 2001, Q. Alex Zhao wrote:
]
] > Got the code from CVS yesterday (Feb 20) and compiled it on Solaris 2.5.1
] > with configure options "--disable-shared --with-regex". Webbot still
] > crashes from "Broken Pipe" signals. What config option should I use to make
] > it ignore that signal?
] >
] > Stack dump:
] >
] > (gdb) info program
] > Using the running image of child LWP 1 via /proc.
] > Program stopped at 0xef63905c.
] > It stopped with signal SIGPIPE, Broken pipe.
] > (gdb) where
] > #0  0xef63905c in _libc_sigprocmask ()
] > #1  0xef6d9d28 in _connect2 ()
] > #2  0xef6d9c20 in __connect ()
] > #3  0xef6d9a48 in _connect ()
] > #4  0x87e48 in HTDoConnect (net=0xaca9b0) at HTTCP.c:320
] > #5  0x77778 in HTHost_connect (host=0x2e4b50, net=0xaca9b0,
] >     url=0xaca980 "http://triton.cc.gatech.edu/ubicomp/756") at HTHost.c:1316
] > #6  0x3ebe8 in HTTPEvent (soc=13, pVoid=0xac8bd8, type=HTEvent_WRITE)
] >     at HTTP.c:1066
] > #7  0x744a0 in HostEvent (soc=13, pVoid=0x2e4b50, type=HTEvent_WRITE)
] >     at HTHost.c:240
] > #8  0x20908 in EventOrder_executeAndDelete () at HTEvtLst.c:326
] > #9  0x21cc4 in HTEventList_loop (theRequest=0xf0360) at HTEvtLst.c:791
] > #10 0x18ff4 in main (argc=25, argv=0xeffff4ec) at RobotMain.c:594
] >
] > The command line arguments are:
] >
] > -q -n -ss -nopipe -cache -cache_size 48 -cacheroot /usr/tmp/w3c-cache
] > -r $HOME/raw/webbot.conf -prefix http -depth 256
] > -exclude '\.gz$|\.tar$|\.tgz$|\.bz2$|\.Z$|\.zip$|\.ZIP$|\.exe$|\.EXE$|\.ps$|\.PS$|\.doc$|\.DOC$|\.pdf$|\.PDF$|\.xplot$|\.tiff$|\.tif$|\.TIF$|\.java$|\.JAVA$|\.c$|\.h$|\.txt$|\.ppt$|\.PPT$|\.qt$|\.mov$|\.bin$|\.sh$|\.avi$|\.AVI$|\.mpg$|\.MPG$|\.mpeg$|\.MPEG$|\.au$|\.wav$|\.WAV$'
] > -img
] > -check '\.gif$|\.GIF$|\.png$|\.PNG$|\.jpeg$|\.JPEG$|\.jpg$|\.JPG$'
] > -redir -referer /usr/tmp/ImageMapping.raw http://www.cc.gatech.edu/
] >
] > I would really like to make this work, but I don't know anything about the
] > internals of libwww.
Received on Thursday, 22 February 2001 16:10:05 UTC