- From: Guy Ferran <guy.ferran@ardentsoftware.fr>
- Date: Fri, 30 Jul 1999 09:37:14 +0200
- To: John Punin <puninj@cs.rpi.edu>
- CC: www-lib@w3.org
John,
Your proposal does not really solve my problem, since my goal is to
visit more sites than the single prefix you suggest.
I wonder whether it is really due to the number of sites I visited,
since the crash occurs rather quickly.
By the way, does this mean that webbot was designed to crawl only a
small set of sites?
Is it just a matter of memory consumption, which could then be solved by
a kind of memory-map mechanism to swap the memory to a file, or is the
problem more fundamental?
Besides, I do not understand your suggestion about "robots.txt". I
thought "robots.txt" can only be defined on the server side, so webbot,
which acts as a client, can only rely on its presence there. That's why
I've put these paths explicitly in the -exclude clause.
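
To make the client/server split concrete, here is a small sketch in
Python (not libwww code) of how a crawler can apply rules that a
server publishes in its robots.txt. The rule text below is an
assumption built from the four paths in my -exclude clause; I have not
checked what www.xmltree.com actually serves, nor how webbot itself
treats this file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a server administrator might write it.
# A client cannot create this file; it can only download and obey it.
RULES = """\
User-agent: *
Disallow: /ArchiveBrowser/
Disallow: /History/
Disallow: /member/
Disallow: /team/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(url, agent="webbot"):
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed("http://www.xmltree.com/index.html"))
    print(allowed("http://www.xmltree.com/member/profile.html"))
```

When the server provides no such file, the client has to carry the
same paths itself, which is what the -exclude clause does.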
Thanks,
Guy.
PS: I tried "purify" to check memory at runtime, but it seems the
version I have from Rational (Solaris 2.7) does not support the
libraries generated by gcc.
>
> John Punin wrote:
> > >
> > Hi Guy
> > You can run out of memory if webbot is "visiting" other web sites besides
> > xmltree. I recommend the following:
> > 1) -prefix http://www.xmltree.com/
> > 2) The initial URL http://www.xmltree.com/ (use slash at the end)
> > 3) use the flag -redir
> > 4) write a robots.txt to exclude directories: /ArchiveBrowser/|/History/|/member/|/team/|
> >
> > Best wishes
> > John Punin
---
Here is my initial problem:
HTTChunk.c:55 Chunk decoder received illigal chunk size: `'
Program received signal SIGABRT, Aborted.
0xff196870 in _libc_kill () from /usr/lib/libc.so.1
(gdb) where
#0 0xff196870 in _libc_kill () from /usr/lib/libc.so.1
#1 0xff1392e0 in abort () from /usr/lib/libc.so.1
#2 0x6c1c4 in HTDebugBreak (file=0xffbeeea8 "", line=55,
fmt=0x782f0 "Chunk decoder received illigal chunk size: `%s'\n")
at HTTrace.c:108
#3 0x3c958 in HTChunkDecode_header (me=0x87d5e8) at HTTChunk.c:55
#4 0x3ca08 in HTChunkDecode_block (me=0x87d5e8,
b=0x167331 " </TD>\n <TD WIDTH=\"50%\" BGCOLOR=\"#3300CC\"
VALIGN=\"TOP\">\n <FONT
COLOR=\"#FFFFFF\"><B> </B></FONT>\n </TD>\n </TR>\n
<TR>\n <TD WIDTH=\"50%\" BGCOLOR=\"#3300CC\" VALIGN="..., l=1828)
at HTTChunk.c:78
#5 0x43130 in HTMIME_put_block (me=0x3a1ec0,
b=0x1666e0 "HTTP/1.1 200 OK\r\nDate: Thu, 29 Jul 1999 16:33:49
GMT\r\nServer: Apache/1.3.6 (Unix)\r\nTransfer-Encoding:
chunked\r\nContent-Type:
text/html\r\n\r\nfe7\r\n\n<HTML>\n<HEAD>\n\t<TITLE>UCI Electronic
Phonebook - </TITLE"..., l=4981) at HTMIME.c:443
#6 0x3e80c in HTTPStatus_put_block (me=0x669428,
b=0x1666e0 "HTTP/1.1 200 OK\r\nDate: Thu, 29 Jul 1999 16:33:49
GMT\r\nServer: Apache/1.3.6 (Unix)\r\nTransfer-Encoding:
chunked\r\nContent-Type:
text/html\r\n\r\nfe7\r\n\n<HTML>\n<HEAD>\n\t<TITLE>UCI Electronic
Phonebook - </TITLE"..., l=4981) at HTTP.c:853
#7 0x559b0 in HTReader_read (me=0x1666c8) at HTReader.c:201
#8 0x5db64 in HTHost_read (host=0x236fd8, net=0x6693c8)
at HTHost.c:1632
---Type <return> to continue, or q <return> to quit---
#9 0x3f2dc in HTTPEvent (soc=6722216, pVoid=0x668678, type=HTEvent_READ)
at HTTP.c:1230
#10 0x5b5e4 in HostEvent (soc=29, pVoid=0x94800, type=HTEvent_READ)
at HTHost.c:195
#11 0x2e160 in EventOrder_executeAndDelete () at HTEvtLst.c:321
#12 0x2eb78 in HTEventList_loop (theRequest=0x0) at HTEvtLst.c:759
#13 0x29b7c in main (argc=638968, argv=0xffbef644) at RobotMain.c:779
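
For reference, the response in frame #5 uses HTTP/1.1 chunked transfer
encoding: the body starts with the hexadecimal size line "fe7" (4071
bytes), and every chunk's data is followed by its own CRLF before the
next size line. The Python sketch below is not libwww's decoder and
does not claim to identify the bug; it only shows, for a body held
entirely in memory, how a decoder that starts reading a size line at
the wrong offset ends up with an empty size string like the one in the
error message.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a complete HTTP/1.1 chunked body held in memory."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.find(b"\r\n", pos)
        if eol < 0:
            raise ValueError("truncated chunk-size line")
        # Ignore any chunk extension after ';' and parse the hex size.
        size_field = data[pos:eol].split(b";", 1)[0].strip()
        if not size_field:
            raise ValueError("illegal chunk size: ''")
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            return bytes(body)  # last chunk; trailers are ignored here
        if pos + size > len(data):
            raise ValueError("truncated chunk data")
        body += data[pos:pos + size]
        pos += size
        # The CRLF that terminates the chunk data must be consumed
        # before the next size line is read.
        if data[pos:pos + 2] != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
        pos += 2


if __name__ == "__main__":
    print(decode_chunked(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"))
```

A real decoder also has to cope with a size line or a terminating CRLF
that is split across two network reads, which this in-memory sketch
does not attempt.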
---
Here is my webbot command:
run -q -ss -n -depth 99 \
-exclude '/ArchiveBrowser/|/History/|/member/|/team/|\.gz$|\.tar$|\.tgz$|\.Z$|\.zip$|\.ZIP$|\.exe$|\.EXE$|\.ps$|\.doc$|\.pdf$|\.xplot$|\.java$|\.c$|\.h$|\.ppt$|\.gif$|\.GIF$|\.tiff$|\.png$|\.PNG$|\.jpeg$|\.jpg$|\.JPE$' \
-prefix http:// \
-l robot2-log-clf.txt \
-alt robot2-log-alt.txt \
-hit robot2-log-hit.txt \
-rellog robot2-log-link-relations.txt -relation stylesheet \
-lm robot2-log-lastmodified.txt \
-title robot2-log-title.txt \
-referer robot2-log-referer.txt \
-negotiated robot2-log-negotiated.txt \
-404 robot2-log-notfound.txt \
-reject robot2-log-reject.txt \
-format robot2-log-format.txt \
-charset robot2-log-charset.txt \
-cache \
-timeout 60 \
http://www.xmltree.com
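
To show what this combination of -prefix and -exclude lets through,
here is a rough Python sketch of the filter. It assumes that webbot
treats -prefix as a plain string prefix and -exclude as a regular
expression searched anywhere in the URL, and that Python's `re` module
behaves like webbot's regex library for this pattern; I have not
verified either point. The host www.example.org is only a placeholder.

```python
import re

# The -exclude pattern from the command above, copied verbatim.
EXCLUDE = re.compile(
    r"/ArchiveBrowser/|/History/|/member/|/team/|\.gz$|\.tar$|\.tgz$|\.Z$"
    r"|\.zip$|\.ZIP$|\.exe$|\.EXE$|\.ps$|\.doc$|\.pdf$|\.xplot$|\.java$"
    r"|\.c$|\.h$|\.ppt$|\.gif$|\.GIF$|\.tiff$|\.png$|\.PNG$|\.jpeg$|\.jpg$"
    r"|\.JPE$"
)


def should_visit(url, prefix):
    """Assumed filter: URL must start with prefix and not match EXCLUDE."""
    return url.startswith(prefix) and EXCLUDE.search(url) is None


if __name__ == "__main__":
    for prefix in ("http://", "http://www.xmltree.com/"):
        for url in (
            "http://www.xmltree.com/index.html",
            "http://www.xmltree.com/member/a.html",
            "http://www.example.org/page.html",  # placeholder off-site URL
        ):
            print(prefix, url, should_visit(url, prefix))
```

Under these assumptions, "-prefix http://" accepts every off-site HTTP
link that is not excluded, while the narrower prefix keeps the robot
on www.xmltree.com.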
--
Guy Ferran
e-mail: guy.ferran@ardentsoftware.fr
tel: (33) 01 30 84 77 77
fax: (33) 01 30 84 77 90
Received on Friday, 30 July 1999 03:36:56 UTC