
RE: Webbot is still crashing at "Broken Pipe"

From: Q. Alex Zhao <aZhao@cc.gatech.edu>
Date: Thu, 22 Feb 2001 16:09:32 -0500
To: "John Punin" <puninj@cs.rpi.edu>
Cc: <www-lib-bugs@w3.org>, <www-lib@w3.org>
Message-ID: <000001c09d13$c0aee240$97814dc7@mobile.cc.gt.atl.ga.us>

Using "-single -bfs", the program exited immediately. Using just "-single",
it hung after running for a day or so: it didn't exit, but it didn't
generate any output either.

The reason I didn't use the -prefix or -include options was that I had a
configuration file that limited the crawling to the "cc.gatech.edu" domain
and specified more complicated pruning of the web trees.

Should I try compiling webbot on a different platform to get better
reliability?

Thanks.
= alex

] -----Original Message-----
] Hello
] When this happens to me, I don't use -nopipe and I use -single -bfs
] Hope this helps
] John Punin
] PS. You should also use -prefix http://www.cc.gatech.edu/ or -include
] "cc.gatech.edu".
]
]
] On Wed, 21 Feb 2001, Q. Alex Zhao wrote:
]
] > Got the code from CVS yesterday (Feb 20) and compiled it on Solaris 2.5.1
] > with configure options "--disable-shared --with-regex". Webbot still
] > crashes from "Broken Pipe" signals. What config option should I use to make
] > it ignore that signal?
] >
] > Stack dump:
] >
] >     (gdb) info program
] >     Using the running image of child LWP    1         via /proc.
] >     Program stopped at 0xef63905c.
] >     It stopped with signal SIGPIPE, Broken pipe.
] >     (gdb) where
] >     #0  0xef63905c in _libc_sigprocmask ()
] >     #1  0xef6d9d28 in _connect2 ()
] >     #2  0xef6d9c20 in __connect ()
] >     #3  0xef6d9a48 in _connect ()
] >     #4  0x87e48 in HTDoConnect (net=0xaca9b0) at HTTCP.c:320
] >     #5  0x77778 in HTHost_connect (host=0x2e4b50, net=0xaca9b0,
] >         url=0xaca980 "http://triton.cc.gatech.edu/ubicomp/756") at HTHost.c:1316
] >     #6  0x3ebe8 in HTTPEvent (soc=13, pVoid=0xac8bd8, type=HTEvent_WRITE) at HTTP.c:1066
] >     #7  0x744a0 in HostEvent (soc=13, pVoid=0x2e4b50, type=HTEvent_WRITE) at HTHost.c:240
] >     #8  0x20908 in EventOrder_executeAndDelete () at HTEvtLst.c:326
] >     #9  0x21cc4 in HTEventList_loop (theRequest=0xf0360) at HTEvtLst.c:791
] >     #10 0x18ff4 in main (argc=25, argv=0xeffff4ec) at RobotMain.c:594
] >
] > The command line arguments are:
] >
] >     -q -n -ss -nopipe -cache -cache_size 48 -cacheroot /usr/tmp/w3c-cache
] >     -r $HOME/raw/webbot.conf -prefix http -depth 256
] >     -exclude '\.gz$|\.tar$|\.tgz$|\.bz2$|\.Z$|\.zip$|\.ZIP$|\.exe$|\.EXE$|\.ps$|\.PS$|\.doc$|\.DOC$|\.pdf$|\.PDF$|\.xplot$|\.tiff$|\.tif$|\.TIF$|\.java$|\.JAVA$|\.c$|\.h$|\.txt$|\.ppt$|\.PPT$|\.qt$|\.mov$|\.bin$|\.sh$|\.avi$|\.AVI$|\.mpg$|\.MPG$|\.mpeg$|\.MPEG$|\.au$|\.wav$|\.WAV$'
] >     -img -check '\.gif$|\.GIF$|\.png$|\.PNG$|\.jpeg$|\.JPEG$|\.jpg$|\.JPG$'
] >     -redir -referer /usr/tmp/ImageMapping.raw http://www.cc.gatech.edu/
] >
] > I would really like to make this work, but I don't know anything about the
] > internals of libwww.
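On the question of making the program ignore the signal: at the operating-system level this is plain POSIX, not a configure option, and the thread does not establish whether webbot or libwww exposes a switch for it. A minimal sketch, assuming you are willing to patch the program yourself (for example early in main() in RobotMain.c, before the event loop starts; that placement is a guess): install SIG_IGN for SIGPIPE with sigaction() (the one-line `signal(SIGPIPE, SIG_IGN)` is the older equivalent). Afterwards, an operation that would have raised SIGPIPE fails with EPIPE instead of killing the process, and the caller has to handle that error. `ignore_sigpipe` and `write_to_closed_pipe` are hypothetical helper names; the second one only demonstrates the effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: tell the kernel to discard SIGPIPE for this process.
 * Returns 0 on success, -1 on error (errno set by sigaction). */
int ignore_sigpipe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;      /* ignore instead of the default "terminate" */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Hypothetical demonstration: write to a pipe whose read end is closed.
 * Returns the errno of the failed write (EPIPE once SIGPIPE is ignored),
 * 0 if the write unexpectedly succeeded, or -1 if pipe() failed.
 * Without ignore_sigpipe() first, this call would kill the process. */
int write_to_closed_pipe(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                /* no reader left on the pipe */

    ssize_t n = write(fds[1], "x", 1);
    int err = (n < 0) ? errno : 0;
    close(fds[1]);
    return err;
}
```

Call `ignore_sigpipe()` once at startup; `write_to_closed_pipe()` should then return `EPIPE` rather than terminating the process.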
Received on Thursday, 22 February 2001 16:10:05 GMT
