RE: wwwlib parsing with own server/client implementation

From: Tim Serong <tim.serong@conceiva.com>
Date: Tue, 10 Feb 2004 09:49:14 +1100
Message-ID: <8501919721C3DE4C81BCA22846B087210A0DD3@lazarus.conceiva.com>
To: "Ceri Coburn" <ceri@first4internet.co.uk>, <www-lib@w3.org>

Hi,

A very similar request came up several months ago, to use libwww for
parsing local files.  Below is what I suggested then; the second example
can be adapted to parse a char *.  I can't help with parsing the HTTP
headers manually, though.  That will probably take some digging.

The simplest thing to do is supply a file URL (something like
file:///foo or file:///c:/foo.txt on Windows) for the Request, rather
than an HTTP URL.  libwww should then read the file from disk.

Alternatively, you can hack up something like this (please excuse the
C++ style):

    // Declare HTStream, so you can write to it directly
  typedef struct _HTStream
  {
    HTStreamClass * isa;
  } HTStream;

  ...

  HText_registerLinkCallback(myFoundLink);
    // register any other required callbacks here
  HTRequest * r = HTRequest_new();
    // this base URL will be used for resolving links
    // in the file being parsed
  HTRequest_setAnchor(r, HTAnchor_findAddress("http://baseurl/"));
  HTStream * parseStream = HTMLPresent(r, 0, WWW_HTML, WWW_PRESENT, 0);
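  FILE * fp = fopen("the file", "rb");
  if (fp)
  {
    char buf[4096];
    size_t bytes;
      // loop on fread's return value; testing feof() first would
      // push a final zero-length block after the last read
    while ((bytes = fread(buf, 1, sizeof buf, fp)) > 0)
    {
      (*parseStream->isa->put_block)(parseStream, buf, (int) bytes);
    }
    fclose(fp);
  }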
  (*parseStream->isa->_free)(parseStream);
  HTRequest_delete(r);

Using the above method, you should not even need to initialize the
library itself.  If you skip initialization, though, you'll have to free
some things manually at the end, at the very least:

  HTAnchor_deleteAll(0);
  HTAtom_deleteAll();

If you want to parse another file with a different base URL, but without
creating a new request object each time, free the stream, change the
anchor, create the stream again, then write data to it via put_block:

  (*parseStream->isa->_free)(parseStream);
  HTRequest_setAnchor(r, HTAnchor_findAddress("http://somethingelse/"));
  parseStream = HTMLPresent(r, 0, WWW_HTML, WWW_PRESENT, 0);
  (*parseStream->isa->put_block)(...);

Using the above method on Windows, I only had to link against wwwcore,
wwwdll, wwwhtml and wwwutils, rather than all the libraries.  There may
of course be more elegant solutions...

Regards,

Tim Serong
-- 
tim.serong@conceiva.com
http://www.conceiva.com

> -----Original Message-----
> From: Ceri Coburn [mailto:ceri@first4internet.co.uk]
> Sent: Tuesday, 10 February 2004 02:32
> To: www-lib@w3.org
> Subject: wwwlib parsing with own server/client implementation
> 
> 
> 
> Hi,
> 
> I would like to use the wwwlib in my application only for parsing.  I
> have written my own server implementation for transport.  Is there a
> way I can use the wwwlib to parse the HTTP header and HTML for a char*
> within my application?
> 
> Thanks
> Ceri
Received on Monday, 9 February 2004 17:40:25 UTC
