RE: wwwlib parsing with own server/client implementation from Ceri Coburn on 2004-02-10 (www-lib@w3.org from January to March 2004)

From: Ceri Coburn <ceri@first4internet.co.uk>
Date: Tue, 10 Feb 2004 10:51:03 -0000
To: <www-lib@w3.org>
Message-ID: <AEDD1773F0718C4783A4A3E526C4A0561599AE@mail.first4internet.co.uk>
Hi,

Many thanks.  That's great.  I will try this.  I just need to find a way
to do the same with HTTP headers and then I will be really happy.  I
imagine writing my own parser for this would be a very tedious task.

Thanks again
Ceri

-----Original Message-----
From: Tim Serong [mailto:tim.serong@conceiva.com] 
Sent: 09 February 2004 22:49
To: Ceri Coburn; www-lib@w3.org
Subject: RE: wwwlib parsing with own server/client implementation

Hi,

A very similar request came up several months ago, to use libwww for
parsing local files.  Below is what I suggested then, the second example
of which can be used to parse a char *.  I can't help with parsing
headers manually...  This will probably take some digging.

The simplest thing to do is supply a file URL (something like
file:///foo or file:///c:/foo.txt on Windows) for the Request, rather
than an HTTP URL.  libwww should then read the file from disk.
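
If the file name only exists as a local path, it has to be turned into a
file URL first.  The following is a minimal sketch in plain C, not a libwww
call: make_file_url is a hypothetical helper name, it assumes an absolute
path, and it does no percent-escaping of spaces or other special characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libwww): build a file URL from an
 * absolute local path.  Backslashes become forward slashes, and a slash
 * is added in front of paths that do not already start with one (for
 * example a Windows drive letter).  No percent-escaping is done.
 * Returns 1 on success, 0 if `out` is too small.
 */
int make_file_url(const char *path, char *out, size_t outlen)
{
    assert(path != NULL && out != NULL);

    const char *prefix =
        (path[0] == '/' || path[0] == '\\') ? "file://" : "file:///";
    size_t base = strlen(prefix);
    size_t plen = strlen(path);

    if (base + plen + 1 > outlen)
        return 0;

    memcpy(out, prefix, base);
    for (size_t i = 0; i < plen; i++)
        out[base + i] = (path[i] == '\\') ? '/' : path[i];
    out[base + plen] = '\0';
    return 1;
}
```

The resulting string is what would be handed to the Request in place of an
HTTP URL.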

Alternatively, you can hack up something like this (please excuse the C++
style):

    // Declare HTStream, so you can write to it directly
  typedef struct _HTStream
  {
    HTStreamClass * isa;
  } HTStream;

  ...

  HText_registerLinkCallback(myFoundLink);
    // register any other required callbacks here
  HTRequest * r = HTRequest_new();
    // this base URL will be used for resolving links
    // in the file being parsed
  HTRequest_setAnchor(r, HTAnchor_findAddress("http://baseurl/"));
  HTStream * parseStream = HTMLPresent(r, 0, WWW_HTML, WWW_PRESENT, 0);
  FILE * fp = fopen("the file", "rb");
  if (fp)
  {
    char buf[4096];
    size_t bytes;
      // loop on fread's result; testing feof() first would pass a
      // zero-length block at end of file and spin on a read error
    while ((bytes = fread(buf, 1, sizeof buf, fp)) > 0)
    {
      (*parseStream->isa->put_block)(parseStream, buf, (int) bytes);
    }
    fclose(fp);
  }
  (*parseStream->isa->_free)(parseStream);
  HTRequest_delete(r);

Using the above method, you should not even need to initialize the
library itself, but you'll have to free some things at the end manually
if you don't, at the very least:

  HTAnchor_deleteAll(0);
  HTAtom_deleteAll();
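
Since the original question was about a char * rather than a file, the read
loop can be swapped for a loop over an in-memory buffer.  The sketch below
shows only the chunking logic and is deliberately independent of libwww so
that it compiles on its own: put_block_fn, feed_buffer and count_bytes are
hypothetical names, and treating a nonzero return value as "stop" is an
assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical callback type: same shape as the put_block call in the
   example above (stream, data pointer, byte count), with the stream
   left opaque. */
typedef int (*put_block_fn)(void *stream, const char *data, int len);

/*
 * Hypothetical helper: push `len` bytes from `data` into `put` in pieces
 * of at most `chunk` bytes.  Returns 0 when everything was delivered, or
 * the first nonzero value returned by the callback.
 */
int feed_buffer(void *stream, put_block_fn put,
                const char *data, size_t len, size_t chunk)
{
    assert(put != NULL && data != NULL);
    assert(chunk > 0 && chunk <= INT_MAX);

    size_t off = 0;
    while (off < len) {
        size_t n = len - off;
        if (n > chunk)
            n = chunk;
        int rc = put(stream, data + off, (int)n);
        if (rc != 0)
            return rc;
        off += n;
    }
    return 0;
}

/* Demonstration sink only: adds up the number of bytes it receives. */
int count_bytes(void *stream, const char *data, int len)
{
    (void)data;
    *(size_t *)stream += (size_t)len;
    return 0;
}
```

With libwww, the callback would be a thin wrapper around
(*parseStream->isa->put_block)(parseStream, data, len), with parseStream
passed in as the opaque stream argument.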

If you want to parse another file with a different base URL, but without
creating a new request object each time, free the stream, change the
anchor, create the stream again, then write data to it via put_block:

  (*parseStream->isa->_free)(parseStream);
  HTRequest_setAnchor(r, HTAnchor_findAddress("http://somethingelse/"));
    // reuse the existing variable rather than declaring it again
  parseStream = HTMLPresent(r, 0, WWW_HTML, WWW_PRESENT, 0);
  (*parseStream->isa->put_block)(...);

Using the above method on Windows, I only had to link against wwwcore,
wwwdll, wwwhtml and wwwutils, rather than all the libraries.  There may
of course be more elegant solutions...
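
On the header half of the question, which is left open above, one fallback
that does not rely on libwww at all is to pick header values out of the raw
message by hand.  The following is a minimal sketch in plain C, not libwww's
API: find_header and equal_nocase are hypothetical names, the buffer is
assumed to hold a complete header block with CRLF line endings, and folded
(continuation) lines are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first n bytes of a and b. */
static int equal_nocase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/*
 * Hypothetical helper: look up header `name` in the raw HTTP message `msg`
 * and copy its value into `out` (at most outlen - 1 bytes, NUL-terminated).
 * Returns 1 if found, 0 otherwise.  The search stops at the blank line
 * that ends the header block, so the body is never scanned.
 */
int find_header(const char *msg, const char *name, char *out, size_t outlen)
{
    assert(msg != NULL && name != NULL && out != NULL && outlen > 0);
    size_t namelen = strlen(name);

    /* Skip the request or status line. */
    const char *line = strstr(msg, "\r\n");
    if (line == NULL)
        return 0;
    line += 2;

    while (*line != '\0') {
        const char *end = strstr(line, "\r\n");
        if (end == NULL)
            end = line + strlen(line);
        if (end == line)            /* blank line: end of headers */
            break;

        size_t linelen = (size_t)(end - line);
        if (linelen > namelen && line[namelen] == ':' &&
            equal_nocase(line, name, namelen)) {
            const char *val = line + namelen + 1;
            while (val < end && (*val == ' ' || *val == '\t'))
                val++;              /* skip whitespace after the colon */
            size_t vallen = (size_t)(end - val);
            if (vallen >= outlen)
                vallen = outlen - 1; /* truncate to fit */
            memcpy(out, val, vallen);
            out[vallen] = '\0';
            return 1;
        }
        if (*end == '\0')
            break;
        line = end + 2;
    }
    return 0;
}
```

Calling find_header(msg, "content-length", value, sizeof value) on a
response buffer copies the value of that header, if present, into value;
header names are matched without regard to case.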

Regards,

Tim Serong
-- 
tim.serong@conceiva.com
http://www.conceiva.com

> -----Original Message-----
> From: Ceri Coburn [mailto:ceri@first4internet.co.uk]
> Sent: Tuesday, 10 February 2004 02:32
> To: www-lib@w3.org
> Subject: wwwlib parsing with own server/client implementation
> 
> 
> 
> Hi,
> 
> I would like to use the wwwlib in my application only for parsing.  I
> have written my own server implementation for transport.  Is there a
> way I can use the wwwlib to parse the HTTP header and HTML for a char*
> within my application?
> 
> Thanks
> Ceri
> 
> 
> ______________________________________________________________
> __________
> This email has been scanned for all viruses by the MessageLabs Email
> Security System. For more information on a proactive email security
> service working around the clock, around the globe, visit
> http://www.messagelabs.com
> ______________________________________________________________
> __________
> 
> 

Received on Tuesday, 10 February 2004 05:53:49 UTC