
Re: Robot crash

From: Guy Ferran <guy.ferran@ardentsoftware.fr>
Date: Fri, 30 Jul 1999 09:37:14 +0200
Message-ID: <37A1562A.703AA2F0@ardentsoftware.fr>
To: John Punin <puninj@cs.rpi.edu>
CC: www-lib@w3.org

John,

Your proposal does not really solve my problem, since my goal is to
visit more than the prefixed sites you suggest.

I wonder whether it is really due to the number of sites I visited, since
the crash occurs rather quickly.

By the way, does this mean that webbot was designed to visit only a
small set of sites?

Is it just a matter of memory consumption, which could then be solved by
a kind of memory-map mechanism that swaps memory to a file, or is the
problem more fundamental?

Besides, I do not understand your suggestion about "robots.txt". I
thought "robots.txt" could only be defined on the server side, so webbot,
which acts as a client, can only rely on its presence. That's why I
put these directories explicitly in the -exclude clause.

Thanks,

Guy.

PS: I tried "purify" to check memory at runtime, but it seems the
version I have from Rational (Solaris 2.7) does not support the
libraries generated by gcc.



> 
> John Punin wrote:
> > >
> > Hi Guy
> > You can run out of memory if webbot is "visiting" other web sites besides
> > xmltree. I recommend the following:
> > 1)  -prefix http://www.xmltree.com/
> > 2) The initial URL http://www.xmltree.com/ (use slash at the end)
> > 3) use the flag -redir
> > 4) write a robots.txt to exclude directories: /ArchiveBrowser/|/History/|/member/|/team/|
> >
> > Best wishes
> > John Punin


---
Here is my initial problem:

HTTChunk.c:55 Chunk decoder received illigal chunk size: `'

Program received signal SIGABRT, Aborted.
0xff196870 in _libc_kill () from /usr/lib/libc.so.1
(gdb) where
#0  0xff196870 in _libc_kill () from /usr/lib/libc.so.1
#1  0xff1392e0 in abort () from /usr/lib/libc.so.1
#2  0x6c1c4 in HTDebugBreak (file=0xffbeeea8 "", line=55, 
    fmt=0x782f0 "Chunk decoder received illigal chunk size: `%s'\n")
    at HTTrace.c:108
#3  0x3c958 in HTChunkDecode_header (me=0x87d5e8) at HTTChunk.c:55
#4  0x3ca08 in HTChunkDecode_block (me=0x87d5e8, 
    b=0x167331 "      </TD>\n      <TD WIDTH=\"50%\" BGCOLOR=\"#3300CC\"
VALIGN=\"TOP\">\n         <FONT
COLOR=\"#FFFFFF\"><B>&nbsp;&nbsp;</B></FONT>\n      </TD>\n   </TR>\n  
<TR>\n      <TD WIDTH=\"50%\" BGCOLOR=\"#3300CC\" VALIGN="..., l=1828)
at HTTChunk.c:78
#5  0x43130 in HTMIME_put_block (me=0x3a1ec0, 
    b=0x1666e0 "HTTP/1.1 200 OK\r\nDate: Thu, 29 Jul 1999 16:33:49
GMT\r\nServer: Apache/1.3.6 (Unix)\r\nTransfer-Encoding:
chunked\r\nContent-Type:
text/html\r\n\r\nfe7\r\n\n<HTML>\n<HEAD>\n\t<TITLE>UCI Electronic
Phonebook - </TITLE"..., l=4981) at HTMIME.c:443
#6  0x3e80c in HTTPStatus_put_block (me=0x669428, 
    b=0x1666e0 "HTTP/1.1 200 OK\r\nDate: Thu, 29 Jul 1999 16:33:49
GMT\r\nServer: Apache/1.3.6 (Unix)\r\nTransfer-Encoding:
chunked\r\nContent-Type:
text/html\r\n\r\nfe7\r\n\n<HTML>\n<HEAD>\n\t<TITLE>UCI Electronic
Phonebook - </TITLE"..., l=4981) at HTTP.c:853
#7  0x559b0 in HTReader_read (me=0x1666c8) at HTReader.c:201
#8  0x5db64 in HTHost_read (host=0x236fd8, net=0x6693c8) at HTHost.c:1632
#9  0x3f2dc in HTTPEvent (soc=6722216, pVoid=0x668678, type=HTEvent_READ)
    at HTTP.c:1230
#10 0x5b5e4 in HostEvent (soc=29, pVoid=0x94800, type=HTEvent_READ)
    at HTHost.c:195
#11 0x2e160 in EventOrder_executeAndDelete () at HTEvtLst.c:321
#12 0x2eb78 in HTEventList_loop (theRequest=0x0) at HTEvtLst.c:759
#13 0x29b7c in main (argc=638968, argv=0xffbef644) at RobotMain.c:779
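
For context on the error above: in HTTP/1.1 chunked transfer coding, each
chunk starts with a hexadecimal size line (the "fe7" visible in frame #5,
i.e. 4071 bytes), followed by CRLF, the chunk data, and another CRLF. The
message in the trace reports an empty size field. The sketch below only
illustrates that framing and where an empty size would be rejected. It is
not libwww's HTTChunk.c, and the function names are hypothetical.

```python
import string


def parse_chunk_size(line: str) -> int:
    """Return the size from a chunk-size line, or raise ValueError."""
    # Drop the line terminator and any chunk extension after ';'.
    field = line.rstrip("\r\n").split(";", 1)[0].strip()
    # An empty field is the case reported in the trace above.
    if not field or any(c not in string.hexdigits for c in field):
        raise ValueError(f"illegal chunk size: {field!r}")
    return int(field, 16)


def decode_chunked(body: bytes) -> bytes:
    """Decode a complete chunked body held in memory (trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)          # end of the size line
        size = parse_chunk_size(body[pos:eol].decode("ascii"))
        pos = eol + 2
        if size == 0:                           # last-chunk marker
            break
        out += body[pos:pos + size]
        pos += size
        # Each chunk's data must be followed by CRLF before the next size line.
        if body[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2
    return bytes(out)
```

A decoder that works on partial network reads, as in the trace, also has to
cope with a size line or a CRLF being split across two reads, which this
in-memory sketch does not attempt.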


---
Here is my webbot command:


run -q -ss -n -depth 99 \
-exclude '/ArchiveBrowser/|/History/|/member/|/team/|\.gz$|\.tar$|\.tgz$|\.Z$|\.zip$|\.ZIP$|\.exe$|\.EXE$|\.ps$|\.doc$|\.pdf$|\.xplot$|\.java$|\.c$|\.h$|\.ppt$|\.gif$|\.GIF$|\.tiff$|\.png$|\.PNG$|\.jpeg$|\.jpg$|\.JPE$' \
-prefix http:// \
-l robot2-log-clf.txt \
-alt robot2-log-alt.txt \
-hit robot2-log-hit.txt \
-rellog robot2-log-link-relations.txt -relation stylesheet \
-lm robot2-log-lastmodified.txt \
-title robot2-log-title.txt \
-referer robot2-log-referer.txt \
-negotiated robot2-log-negotiated.txt \
-404 robot2-log-notfound.txt \
-reject robot2-log-reject.txt \
-format robot2-log-format.txt \
-charset robot2-log-charset.txt  \
-cache \
-timeout 60 \
http://www.xmltree.com
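
To see which URLs the -exclude pattern above is meant to reject, it can be
tried outside webbot. The sketch below uses Python's re module, which is an
assumption: webbot's own regular-expression library may treat some
constructs differently, and the helper name is_excluded and the sample URLs
are hypothetical.

```python
import re

# The -exclude expression from the command above, copied verbatim.
EXCLUDE = re.compile(
    r"/ArchiveBrowser/|/History/|/member/|/team/|"
    r"\.gz$|\.tar$|\.tgz$|\.Z$|\.zip$|\.ZIP$|\.exe$|\.EXE$|"
    r"\.ps$|\.doc$|\.pdf$|\.xplot$|\.java$|\.c$|\.h$|\.ppt$|"
    r"\.gif$|\.GIF$|\.tiff$|\.png$|\.PNG$|\.jpeg$|\.jpg$|\.JPE$"
)


def is_excluded(url: str) -> bool:
    """Return True if any alternative of the pattern matches within the URL."""
    return EXCLUDE.search(url) is not None


if __name__ == "__main__":
    for url in (
        "http://www.xmltree.com/History/index.html",
        "http://www.xmltree.com/docs/guide.pdf",
        "http://www.xmltree.com/index.html",
    ):
        print(url, "excluded" if is_excluded(url) else "kept")
```

Because the suffix alternatives are anchored with $, a URL such as
guide.pdf?page=2 would not be excluded under this interpretation.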
-- 
Guy Ferran
e-mail: guy.ferran@ardentsoftware.fr
tel: (33) 01 30 84 77 77
fax: (33) 01 30 84 77 90
Received on Friday, 30 July 1999 03:36:56 UTC
