
chunk example crashing if iterated

From: Filippo Menczer <fil@cs.ucsd.edu>
Date: Sat, 30 May 1998 20:47:34 -0400 (EDT)
Message-Id: <v02140b03b196407eea64@[137.110.58.121]>
To: www-lib@w3.org
Cc: fil@cs.ucsd.edu

Could someone help me figure out why this simple example crashes?
Basically I am wrapping a loop around the code from the "chunk"
example and calling it multiple times. The first call completes
fine; the second causes a segmentation fault.

Configuration:
        Linux (on a Pentium II box)
        w3c-libwww-5.1m (statically linked)

Here is the code:

        #include "WWWLib.h"
        #include "WWWHTTP.h"
        #include "WWWInit.h"

        void mygetchunk(char *url)
        {
            HTRequest *request = HTRequest_new();
            HTChunk *chunk = NULL;

            HTProfile_newPreemptiveClient("TestApp", "1.0");

            WWWTRACE = SHOW_CORE_TRACE + SHOW_STREAM_TRACE + SHOW_PROTOCOL_TRACE;

            HTRequest_setOutputFormat(request, WWW_SOURCE);
            if (url) {
                char *cwd = HTGetCurrentDirectoryURL();
                char *absolute_url = HTParse(url, cwd, PARSE_ALL);
                chunk = HTLoadToChunk(absolute_url, request);
                HT_FREE(absolute_url);
                HT_FREE(cwd);
                printf("%s\n", chunk ? "OK-FIRST-TIME" : "NO DATA");
            }

            HTRequest_delete(request);
            HTProfile_delete();
        }

        main()
        {
              mygetchunk("http://www.cs.ucsd.edu/~fil/agents");
              mygetchunk("http://www.cs.ucsd.edu/~fil/agents"); /* any URL here */
        }

Here are output and tail of the trace:

        % mychunk 2> mychunk.trace
        OK-FIRST-TIME
        30900 Segmentation fault (core dumped)
        %
        % tail mychunk.trace
        Net Object.. 0x80e0af8 created with hash 2
        Net Object.. starting request 0x80cf050 (retry=1) with net object 0x80e0af8
        HTTP........ Looking for `http://www.cs.ucsd.edu/~fil/agents'
        HTDoConnect. Looking up `www.cs.ucsd.edu'
        Host info... REUSING CHANNEL 0x80cf330
        Host info... Add Net 0x80e0af8 (request 0x80cf050) to pipe, 2 requests made, 1 requests in pipe, 0 pending
        HTHost...... No ActivateRequest callback handler registered
        Channel..... Semaphore increased to 1 for channel 0x80cf330
        HTTP........ Force flush on preemptive load
        StreamStack. Constructing stream stack for text/x-http to */*
        %

And here is the execution stack from gdb:

        Core was generated by `mychunk'.
        Program terminated with signal 11, Segmentation fault.
        540         while ((pres = (HTPresentation*)HTList_nextObject(cur))) {
        (gdb) where
        #0  0x80511bd in HTStreamStack (rep_in=0x80df588, rep_out=0x80c96f0,
            output_stream=0x80dfca8, request=0x80cf050, guess=1) at HTFormat.c:540
        #1  0x806cb0a in HTTPEvent (soc=-1, pVoid=0x80e0b68, type=HTEvent_BEGIN)
            at HTTP.c:1026
        #2  0x806c7c0 in HTLoadHTTP (soc=-1, request=0x80cf050) at HTTP.c:916
        #3  0x8056360 in HTNet_newClient (request=0x80cf050) at HTNet.c:732
        #4  0x804cc14 in HTLoad (me=0x80cf050, recursive=0 '\000') at HTReqMan.c:1575
        #5  0x8048ee7 in launch_request (request=0x80cf050, recursive=0 '\000')
            at HTAccess.c:75
        #6  0x804910b in HTLoadToChunk (url=0x80dfc80 "http://www.cs.ucsd.edu/~fil/agents",
            request=0x80cf050) at HTAccess.c:183
        #7  0x80481c3 in mygetchunk (url=0x80ae197 "http://www.cs.ucsd.edu/~fil/agents")
            at mychunk.c:19
        #8  0x804824a in main () at mychunk.c:32
        #9  0x80480ee in _start ()
        (gdb) print *0x80cf050
        $1 = 0
        (gdb) print *0x80cf050
        $1 = 0

Is this the wrong way to go about it? All I really need from
libwww is a simple way to GET the contents of a long sequence of URLs,
each into memory, sequentially, and without any parsing. My own code
will process the contents between calls to the library. The code
in the command-line tool and the documentation go beyond both my
comprehension and the limited scope of my needs.
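
In case it helps, here is the kind of restructuring I have been
wondering about: initializing and deleting the profile once in main(),
instead of once per call to mygetchunk(). This is only a guess on my
part, not something I have verified against the documentation, and the
HTChunk_delete call is likewise my assumption about how each chunk
should be freed:

        #include "WWWLib.h"
        #include "WWWHTTP.h"
        #include "WWWInit.h"

        /* Fetch one URL into memory; assumes the profile is already set up. */
        void mygetchunk(char *url)
        {
            HTRequest *request = HTRequest_new();
            HTChunk *chunk = NULL;

            HTRequest_setOutputFormat(request, WWW_SOURCE);
            if (url) {
                char *cwd = HTGetCurrentDirectoryURL();
                char *absolute_url = HTParse(url, cwd, PARSE_ALL);
                chunk = HTLoadToChunk(absolute_url, request);
                HT_FREE(absolute_url);
                HT_FREE(cwd);
                printf("%s\n", chunk ? "OK" : "NO DATA");
                if (chunk) HTChunk_delete(chunk);   /* free the downloaded data */
            }

            HTRequest_delete(request);
        }

        int main()
        {
            /* Initialize the library once, not once per request. */
            HTProfile_newPreemptiveClient("TestApp", "1.0");

            mygetchunk("http://www.cs.ucsd.edu/~fil/agents");
            mygetchunk("http://www.cs.ucsd.edu/~fil/agents");

            /* Terminate once, at the end. */
            HTProfile_delete();
            return 0;
        }

Is moving the profile setup out of the per-URL function the intended
usage, or is the problem elsewhere?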

(Ideally I would want to get the contents only if the pages are
text/html or text/plain, but that and other things are the next steps.)

Any assistance would be greatly appreciated!
Please cc: my email as I am not a subscriber of the list.
Thanks,
-Fil


====================================================
Filippo Menczer         http://www.cs.ucsd.edu/~fil/
fil@cs.ucsd.edu         CSE Dept., 0114
Lab:  (619) 453-4364    U. C. San Diego
Fax:  (619) 534-7029    La Jolla, CA 92093-0114, USA
====================================================
Received on Saturday, 30 May 1998 23:58:57 GMT
