Just want to get a html and do some parsing

From: qiufeng <coolqiufeng@hotmail.com>
Date: Wed, 30 Oct 2002 10:52:05 +0800
To: <www-lib@w3.org>
Message-ID: <OE110wuEArJ6X1f5Rkl00009fe6@hotmail.com>
Hello everyone,
  My task is to fetch an HTML page from a given URL and then do some parsing on it. In short, I want to write a function "void fetch(const char * url, char * result)" that I can call whenever I need the content of a page. I modified showtext.c (is there another sample that fits my task better?) and changed its main into an ordinary function. But I ran into trouble when the function returns: it just exits the whole application. I have read some of the documents on www.w3c.org, but I could not find anything helpful (I really don't know what they are talking about; I need something basic).

#include "WWWLib.h"
#include "WWWInit.h"
#include "WWWHTML.h"

#include "fetch.h"

PRIVATE int printer (const char * fmt, va_list pArgs)
{
    return (vfprintf(stdout, fmt, pArgs));
}

PRIVATE int tracer (const char * fmt, va_list pArgs)
{
    return (vfprintf(stderr, fmt, pArgs));
}

PRIVATE int terminate_handler (HTRequest * request, HTResponse * response,
          void * param, int status) 
{
    /* We are done with this request */
    HTRequest_delete(request);

    /* Terminate libwww */
    HTProfile_delete();

    return 0; /* returning here also hangs my application :( */
//    exit(0);
}

PRIVATE void addText (HText * text, const char * buf, int len)
{
    if (buf) fwrite(buf, 1, len, stdout);
}

void
fetch (const char * url)
{
    char * uri = NULL;

    /* Create a new preemptive client */
    HTProfile_newHTMLNoCacheClient ("Fetch", "1.0");

    /* Need our own trace and print functions */
    HTPrint_setCallback(printer);
    HTTrace_setCallback(tracer);

    /* Set trace messages and alert messages */
#if 0
    HTSetTraceMessageMask("sop");
#endif

    /* Add our own termination filter */
    HTNet_addAfter(terminate_handler, NULL, NULL, HT_ALL, HT_FILTER_LAST);

    /*
    ** Register our text handler. We don't actually create a HText
    ** object as this is not needed. We only register the text
    ** callback.
    */
    HText_registerTextCallback(addText);

    /* Set a timeout on the request of 15 secs (the value is in milliseconds) */
    HTHost_setEventTimeout(15000);

    uri = HTParse(url, NULL, PARSE_ALL);

    if (uri) {
        HTRequest * request = NULL;
        HTAnchor * anchor = NULL;
        BOOL status = NO;

        /* Create a request */
        request = HTRequest_new();

        /* Get an anchor object for the URI */
        anchor = HTAnchor_findAddress(uri);

        /* Issue the GET; the parsed text is delivered to addText */
        status = HTLoadAnchor(anchor, request);

        /* Go into the event loop... */
        if (status == YES) HTEventList_loop(request);
    }
    return;
}
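
The control flow being asked for, a fetch that hands text back to its caller and then returns normally, can be sketched without libwww at all. The following is a minimal simulation under stated assumptions, not libwww code: TextBuf, on_text, on_done, run_loop and fetch_sim are all hypothetical names, and run_loop is only a stand-in for a real event loop. It illustrates two ideas: the text callback appends to a buffer instead of writing to stdout, and the completion callback only requests that the loop stop, leaving cleanup to the code that runs after the loop has returned.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer; not a libwww type. */
typedef struct {
    char  *data;  /* NUL-terminated text collected so far, or NULL */
    size_t len;   /* bytes used, not counting the terminator */
    size_t cap;   /* bytes allocated */
} TextBuf;

static TextBuf g_result;         /* filled by the text callback */
static int     g_stop_requested; /* set by the completion callback */

/* Text callback: append to the buffer instead of writing to stdout.
 * Plays the role of addText. Returns 0 on success, -1 if out of memory. */
static int on_text(const char *buf, int len)
{
    size_t need;

    if (!buf || len <= 0) return 0;
    need = g_result.len + (size_t)len + 1;
    if (need > g_result.cap) {
        size_t cap = g_result.cap ? g_result.cap : 256;
        char *p;
        while (cap < need) cap *= 2;
        p = realloc(g_result.data, cap);
        if (!p) return -1;
        g_result.data = p;
        g_result.cap = cap;
    }
    memcpy(g_result.data + g_result.len, buf, (size_t)len);
    g_result.len += (size_t)len;
    g_result.data[g_result.len] = '\0';
    return 0;
}

/* Completion callback: only asks the loop to stop. It neither calls
 * exit() nor tears anything down while the loop is still running. */
static int on_done(int status)
{
    (void)status;
    g_stop_requested = 1;
    return 0;
}

/* Stand-in for the event loop: delivers each chunk, then reports
 * completion, and returns once a stop has been requested. */
static void run_loop(const char **chunks, size_t n)
{
    size_t i = 0;

    g_stop_requested = 0;
    while (!g_stop_requested) {
        if (i < n) {
            on_text(chunks[i], (int)strlen(chunks[i]));
            i++;
        } else {
            on_done(0);
        }
    }
}

/* Simulated fetch: returns the collected text (caller frees it),
 * or NULL if nothing was received. */
static char *fetch_sim(const char **chunks, size_t n)
{
    char *out;

    g_result.data = NULL;
    g_result.len = g_result.cap = 0;

    run_loop(chunks, n);

    /* Library cleanup would go here, after the loop has returned. */
    out = g_result.data;
    g_result.data = NULL;
    g_result.len = g_result.cap = 0;
    return out;
}
```

Mapped back onto the listing above, the likely equivalent (untested here) is for terminate_handler to call HTEventList_stopLoop() in place of exit(0), and for HTProfile_delete() to move into fetch, after HTEventList_loop(request) has returned. Returning a heap-allocated string rather than filling a caller-supplied "char * result" avoids having to guess the page size in advance.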

Received on Tuesday, 29 October 2002 21:53:20 GMT