- From: Filippo Menczer <fil@cs.ucsd.edu>
- Date: Sat, 30 May 1998 20:47:34 -0400 (EDT)
- To: www-lib@w3.org
- Cc: fil@cs.ucsd.edu
Could someone help me figure out why this simple example crashes?
Basically I am calling the code from the "chunk" example multiple times
by wrapping a loop around it. The first call goes fine; the second
causes a segmentation fault.
Configuration:
Linux (on a Pentium II box)
w3c-libwww-5.1m (statically linked)
Here is the code:
#include "WWWLib.h"
#include "WWWHTTP.h"
#include "WWWInit.h"

void mygetchunk(char *url)
{
    HTRequest *request = HTRequest_new();
    HTChunk *chunk = NULL;

    HTProfile_newPreemptiveClient("TestApp", "1.0");
    WWWTRACE = SHOW_CORE_TRACE + SHOW_STREAM_TRACE + SHOW_PROTOCOL_TRACE;
    HTRequest_setOutputFormat(request, WWW_SOURCE);
    if (url) {
        char *cwd = HTGetCurrentDirectoryURL();
        char *absolute_url = HTParse(url, cwd, PARSE_ALL);
        chunk = HTLoadToChunk(absolute_url, request);
        HT_FREE(absolute_url);
        HT_FREE(cwd);
        printf("%s\n", chunk ? "OK-FIRST-TIME" : "NO DATA");
    }
    HTRequest_delete(request);
    HTProfile_delete();
}

main()
{
    mygetchunk("http://www.cs.ucsd.edu/~fil/agents");
    mygetchunk("http://www.cs.ucsd.edu/~fil/agents"); /* any URL here */
}
Here are output and tail of the trace:
% mychunk 2> mychunk.trace
OK-FIRST-TIME
30900 Segmentation fault (core dumped)
%
% tail mychunk.trace
Net Object.. 0x80e0af8 created with hash 2
Net Object.. starting request 0x80cf050 (retry=1) with net object 0x80e0af8
HTTP........ Looking for `http://www.cs.ucsd.edu/~fil/agents'
HTDoConnect. Looking up `www.cs.ucsd.edu'
Host info... REUSING CHANNEL 0x80cf330
Host info... Add Net 0x80e0af8 (request 0x80cf050) to pipe, 2 requests made, 1 requests in pipe, 0 pending
HTHost...... No ActivateRequest callback handler registered
Channel..... Semaphore increased to 1 for channel 0x80cf330
HTTP........ Force flush on preemptive load
StreamStack. Constructing stream stack for text/x-http to */*
%
And here is the execution stack from gdb:
Core was generated by `mychunk'.
Program terminated with signal 11, Segmentation fault.
540     while ((pres = (HTPresentation *) HTList_nextObject(cur))) {
(gdb) where
#0  0x80511bd in HTStreamStack (rep_in=0x80df588, rep_out=0x80c96f0, output_stream=0x80dfca8, request=0x80cf050, guess=1) at HTFormat.c:540
#1  0x806cb0a in HTTPEvent (soc=-1, pVoid=0x80e0b68, type=HTEvent_BEGIN) at HTTP.c:1026
#2  0x806c7c0 in HTLoadHTTP (soc=-1, request=0x80cf050) at HTTP.c:916
#3  0x8056360 in HTNet_newClient (request=0x80cf050) at HTNet.c:732
#4  0x804cc14 in HTLoad (me=0x80cf050, recursive=0 '\000') at HTReqMan.c:1575
#5  0x8048ee7 in launch_request (request=0x80cf050, recursive=0 '\000') at HTAccess.c:75
#6  0x804910b in HTLoadToChunk (url=0x80dfc80 "http://www.cs.ucsd.edu/~fil/agents", request=0x80cf050) at HTAccess.c:183
#7  0x80481c3 in mygetchunk (url=0x80ae197 "http://www.cs.ucsd.edu/~fil/agents") at mychunk.c:19
#8  0x804824a in main () at mychunk.c:32
#9  0x80480ee in _start ()
(gdb) print *0x80cf050
$1 = 0
Is this the wrong way to go about it? All I really need from
libwww is a simple way to GET the contents of a long sequence of URLs,
each into memory, sequentially, and without any parsing; my own code
will process the contents between calls to the library. The code
in the command-line tool and the documentation are beyond both my
comprehension and the limited scope of my needs.
(Ideally I would want to get the contents only if the pages are
text/html or text/plain, but that and other things are the next steps.)
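To make the shape of what I am after concrete, here is a sketch of the loop I have in mind (just a sketch, under the guess that the profile is meant to be created and deleted once per process rather than once per fetch as in mychunk.c above; the URL list is a placeholder):

```c
#include "WWWLib.h"
#include "WWWInit.h"

int main(void)
{
    const char *urls[] = {                       /* placeholder URLs */
        "http://www.cs.ucsd.edu/~fil/agents",
        "http://www.cs.ucsd.edu/~fil/agents",
    };
    int i;

    /* Initialize the library once for the whole process... */
    HTProfile_newPreemptiveClient("TestApp", "1.0");

    for (i = 0; i < 2; i++) {
        HTRequest *request = HTRequest_new();
        HTChunk *chunk;

        HTRequest_setOutputFormat(request, WWW_SOURCE);
        chunk = HTLoadToChunk((char *) urls[i], request);
        if (chunk) {
            /* ...my own processing of HTChunk_data(chunk) would go here... */
            printf("got %d bytes\n", HTChunk_size(chunk));
            HTChunk_delete(chunk);
        } else {
            printf("NO DATA\n");
        }
        HTRequest_delete(request);
    }

    /* ...and terminate it once at the end. */
    HTProfile_delete();
    return 0;
}
```

If that per-process init/terminate pattern is the intended usage, a pointer to the relevant documentation would be very welcome.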
Any assistance would be greatly appreciated!
Please cc: my email as I am not a subscriber of the list.
Thanks,
-Fil
====================================================
Filippo Menczer http://www.cs.ucsd.edu/~fil/
fil@cs.ucsd.edu CSE Dept., 0114
Lab: (619) 453-4364 U. C. San Diego
Fax: (619) 534-7029 La Jolla, CA 92093-0114, USA
====================================================
Received on Saturday, 30 May 1998 23:58:57 UTC