- From: Tim Serong <tim.serong@conceiva.com>
- Date: Tue, 10 Feb 2004 09:49:14 +1100
- To: "Ceri Coburn" <ceri@first4internet.co.uk>, <www-lib@w3.org>
Hi,

A very similar request came up several months ago, to use libwww for parsing local files. Below is what I suggested then; the second example can be adapted to parse a char *. I can't help with parsing headers manually... That will probably take some digging.

The simplest thing to do is supply a file URL (something like file:///foo, or file:///c:/foo.txt on Windows) for the Request, rather than an HTTP URL. libwww should then read the file from disk.

Alternatively, you can hack up something like this (please excuse the C++ style):

    // Declare HTStream, so you can write to it directly
    typedef struct _HTStream {
        HTStreamClass * isa;
    } HTStream;

    ...

    HText_registerLinkCallback(myFoundLink);
    // register any other required callbacks here

    HTRequest * r = HTRequest_new();

    // this base URL will be used for resolving links
    // in the file being parsed
    HTRequest_setAnchor(r, HTAnchor_findAddress("http://baseurl/"));

    HTStream * parseStream = HTMLPresent(r, 0, WWW_HTML, WWW_PRESENT, 0);

    FILE * fp = fopen("the file", "rb");
    if (fp) {
        char buf[4096];
        size_t bytes;
        // loop on the fread result rather than feof(), so a read
        // error cannot spin forever and no empty block is written
        while ((bytes = fread(buf, 1, sizeof(buf), fp)) > 0) {
            (*parseStream->isa->put_block)(parseStream, buf, (int) bytes);
        }
        fclose(fp);
    }

    (*parseStream->isa->_free)(parseStream);

    HTRequest_delete(r);

Using the above method, you should not even need to initialize the library itself, but if you don't, you'll have to free some things manually at the end. At the very least:

    HTAnchor_deleteAll(0);
    HTAtom_deleteAll();

If you want to parse another file with a different base URL, but without creating a new request object each time, free the stream, change the anchor, create the stream again, then write data to it via put_block:

    (*parseStream->isa->_free)(parseStream);
    HTRequest_setAnchor(r, HTAnchor_findAddress("http://somethingelse/"));
    // reuse the existing variable rather than declaring it again
    parseStream = HTMLPresent(r, 0, WWW_HTML, WWW_PRESENT, 0);
    (*parseStream->isa->put_block)(...);

Using the above method on Windows, I only had to link against wwwcore, wwwdll, wwwhtml and wwwutils, rather than all the libraries.

There may of course be more elegant solutions...

Regards,

Tim Serong
--
tim.serong@conceiva.com
http://www.conceiva.com

> -----Original Message-----
> From: Ceri Coburn [mailto:ceri@first4internet.co.uk]
> Sent: Tuesday, 10 February 2004 02:32
> To: www-lib@w3.org
> Subject: wwwlib parsing with own server/client implementation
>
> Hi,
>
> I would like to use the wwwlib in my application only for parsing. I
> have written my own server implementation for transport. Is there a way
> I can use the wwwlib to parse the HTTP header and HTML for a char*
> within my application?
>
> Thanks
> Ceri
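
A note on the header half of the question: nothing above parses HTTP headers. One rough workaround, assuming the char * holds a complete response with CRLF line endings, is to split off the header block by hand and pass only the body to put_block. The sketch below is plain C and does not use libwww at all; split_http_response is a hypothetical helper name, not a libwww function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libwww.
 * Given a raw HTTP response held in memory, return a pointer to the first
 * byte of the body (just past the blank line that ends the header block),
 * or NULL if no blank line is found.
 * On success, *header_len (if non-NULL) receives the header length in
 * bytes, including the terminating CRLF CRLF.
 * Assumption: CRLF line endings and a complete header block in the buffer. */
static const char *split_http_response(const char *data, size_t len,
                                       size_t *header_len)
{
    static const char sep[] = "\r\n\r\n";
    const size_t sep_len = sizeof sep - 1;
    size_t i;

    if (data == NULL || len < sep_len)
        return NULL;

    /* Linear scan for the first CRLF CRLF sequence. */
    for (i = 0; i + sep_len <= len; i++) {
        if (memcmp(data + i, sep, sep_len) == 0) {
            if (header_len != NULL)
                *header_len = i + sep_len;
            return data + i + sep_len;
        }
    }
    return NULL;
}
```

The returned pointer, together with the remaining length (len minus *header_len), could then be handed to put_block in place of the fread buffer. The header bytes before it would still need to be interpreted separately.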
Received on Monday, 9 February 2004 17:40:25 UTC