- From: Guy Ferran <guy.ferran@ardentsoftware.fr>
- Date: Fri, 30 Jul 1999 09:37:14 +0200
- To: John Punin <puninj@cs.rpi.edu>
- CC: www-lib@w3.org
John,

Your proposal does not really solve my problem, since my goal is to visit more than the single prefixed site you suggest. I also doubt that the crash is really due to the number of sites visited, since it occurs rather quickly.

By the way, does this mean that webbot was designed to support only a small set of visited sites? Is it just a matter of memory consumption, which could then be solved by some memory-mapping mechanism that swaps memory out to a file, or is the problem more fundamental?

Also, I do not understand your suggestion about "robots.txt". I thought a "robots.txt" file can only be defined on the server side, so webbot, which acts as a client, can only rely on it being present. That is why I put these paths explicitly in the -exclude clause.

Thanks,
Guy

PS: I tried Purify to check memory at runtime, but the version I have from Rational (Solaris 2.7) does not seem to support libraries generated by gcc.

> John Punin wrote:
>
> > Hi Guy
> >
> > You can run out of memory if webbot is "visiting" other web sites besides
> > xmltree. I recommend the following:
> > 1) -prefix http://www.xmltree.com/
> > 2) The initial URL http://www.xmltree.com/ (use a slash at the end)
> > 3) use the flag -redir
> > 4) write a robots.txt to exclude directories: /ArchiveBrowser/|/History/|/member/|/team/|
> >
> > Best wishes
> > John Punin

---
Here is my initial problem:

HTTChunk.c:55 Chunk decoder received illigal chunk size: `'

Program received signal SIGABRT, Aborted.
0xff196870 in _libc_kill () from /usr/lib/libc.so.1
(gdb) where
#0  0xff196870 in _libc_kill () from /usr/lib/libc.so.1
#1  0xff1392e0 in abort () from /usr/lib/libc.so.1
#2  0x6c1c4 in HTDebugBreak (file=0xffbeeea8 "", line=55, fmt=0x782f0 "Chunk decoder received illigal chunk size: `%s'\n") at HTTrace.c:108
#3  0x3c958 in HTChunkDecode_header (me=0x87d5e8) at HTTChunk.c:55
#4  0x3ca08 in HTChunkDecode_block (me=0x87d5e8, b=0x167331 " </TD>\n <TD WIDTH=\"50%\" BGCOLOR=\"#3300CC\" VALIGN=\"TOP\">\n <FONT COLOR=\"#FFFFFF\"><B> </B></FONT>\n </TD>\n </TR>\n <TR>\n <TD WIDTH=\"50%\" BGCOLOR=\"#3300CC\" VALIGN="..., l=1828) at HTTChunk.c:78
#5  0x43130 in HTMIME_put_block (me=0x3a1ec0, b=0x1666e0 "HTTP/1.1 200 OK\r\nDate: Thu, 29 Jul 1999 16:33:49 GMT\r\nServer: Apache/1.3.6 (Unix)\r\nTransfer-Encoding: chunked\r\nContent-Type: text/html\r\n\r\nfe7\r\n\n<HTML>\n<HEAD>\n\t<TITLE>UCI Electronic Phonebook - </TITLE"..., l=4981) at HTMIME.c:443
#6  0x3e80c in HTTPStatus_put_block (me=0x669428, b=0x1666e0 "HTTP/1.1 200 OK\r\nDate: Thu, 29 Jul 1999 16:33:49 GMT\r\nServer: Apache/1.3.6 (Unix)\r\nTransfer-Encoding: chunked\r\nContent-Type: text/html\r\n\r\nfe7\r\n\n<HTML>\n<HEAD>\n\t<TITLE>UCI Electronic Phonebook - </TITLE"..., l=4981) at HTTP.c:853
#7  0x559b0 in HTReader_read (me=0x1666c8) at HTReader.c:201
#8  0x5db64 in HTHost_read (host=0x236fd8, net=0x6693c8) at HTHost.c:1632
---Type <return> to continue, or q <return> to quit---
#9  0x3f2dc in HTTPEvent (soc=6722216, pVoid=0x668678, type=HTEvent_READ) at HTTP.c:1230
#10 0x5b5e4 in HostEvent (soc=29, pVoid=0x94800, type=HTEvent_READ) at HTHost.c:195
#11 0x2e160 in EventOrder_executeAndDelete () at HTEvtLst.c:321
#12 0x2eb78 in HTEventList_loop (theRequest=0x0) at HTEvtLst.c:759
#13 0x29b7c in main (argc=638968, argv=0xffbef644) at RobotMain.c:779

---
Here is my webbot command:

run -q -ss -n -depth 99 \
  -exclude '/ArchiveBrowser/|/History/|/member/|/team/|\.gz$|\.tar$|\.tgz$|\.Z$|\.zip$|\.ZIP$|\.exe$|\.EXE$|\.ps$|\.doc$|\.pdf$|\.xplot$|\.java$|\.c$|\.h$|\.ppt$|\.gif$|\.GIF$|\.tiff$|\.png$|\.PNG$|\.jpeg$|\.jpg$|\.JPE$' \
  -prefix http:// \
  -l robot2-log-clf.txt \
  -alt robot2-log-alt.txt \
  -hit robot2-log-hit.txt \
  -rellog robot2-log-link-relations.txt -relation stylesheet \
  -lm robot2-log-lastmodified.txt \
  -title robot2-log-title.txt \
  -referer robot2-log-referer.txt \
  -negotiated robot2-log-negotiated.txt \
  -404 robot2-log-notfound.txt \
  -reject robot2-log-reject.txt \
  -format robot2-log-format.txt \
  -charset robot2-log-charset.txt \
  -cache \
  -timeout 60 \
  http://www.xmltree.com

--
Guy Ferran
e-mail: guy.ferran@ardentsoftware.fr
tel: (33) 01 30 84 77 77
fax: (33) 01 30 84 77 90
Received on Friday, 30 July 1999 03:36:56 UTC