- From: Tim Serong <tim.serong@conceiva.com>
- Date: Tue, 5 Aug 2003 09:47:59 +1000
- To: "Subramanyam Mallela" <mallela@parc.com>, <www-lib@w3.org>
Hi,

The simplest thing to do is supply a file URL (something like file:///foo, or file:///c:/foo.txt on Windows) for the Request, rather than an HTTP URL. libwww should then read the file from disk.

Alternatively, you can hack up something like this (please excuse the C++ style):

    // Declare HTStream, so you can write to it directly
    typedef struct _HTStream {
        HTStreamClass * isa;
    } HTStream;

    ...

    HText_registerLinkCallback(myFoundLink);
    // register any other required callbacks here

    HTRequest * r = HTRequest_new();
    // this base URL will be used for resolving links
    // in the file being parsed
    HTRequest_setAnchor(r, HTAnchor_findAddress("http://baseurl/"));
    HTStream * parseStream = HTMLPresent(r, 0, WWW_HTML, WWW_PRESENT, 0);

    FILE * fp = fopen("the file", "rb");
    if (fp) {
        char buf[4096];
        size_t bytes;
        // loop on fread()'s return value: testing feof() first can
        // push an empty block and never terminates on a read error
        while ((bytes = fread(buf, 1, sizeof buf, fp)) > 0) {
            (*parseStream->isa->put_block)(parseStream, buf, (int) bytes);
        }
        fclose(fp);
    }
    (*parseStream->isa->_free)(parseStream);
    HTRequest_delete(r);

Using the above method, you should not even need to initialize the library itself. If you don't, though, you'll have to free some things manually at the end, at the very least:

    HTAnchor_deleteAll(0);
    HTAtom_deleteAll();

If you want to parse another file with a different base URL without creating a new request object each time, free the stream, change the anchor, create the stream again, then write data to it via put_block:

    (*parseStream->isa->_free)(parseStream);
    HTRequest_setAnchor(r, HTAnchor_findAddress("http://somethingelse/"));
    // reuse the existing variable rather than declaring it again
    parseStream = HTMLPresent(r, 0, WWW_HTML, WWW_PRESENT, 0);
    (*parseStream->isa->put_block)(...);

Using the above method on Windows, I only had to link against wwwcore, wwwdll, wwwhtml and wwwutils, rather than all the libraries.

There may of course be more elegant solutions...
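For the file URL route mentioned at the top, here is a minimal sketch of turning an absolute local path into a URL of that shape. path_to_file_url is a hypothetical helper, not part of libwww, and it assumes the path is absolute and needs no percent-escaping (no spaces or other reserved characters):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a libwww function): build a file URL from an
 * absolute local path. Returns 0 on success, -1 if 'out' is too small.
 * Assumption: the path needs no percent-escaping. */
int path_to_file_url(const char *path, char *out, size_t outlen)
{
    assert(path != NULL && out != NULL);

    /* "/foo" already supplies the third slash; "c:\foo.txt" does not. */
    const char *prefix = (path[0] == '/') ? "file://" : "file:///";
    int n = snprintf(out, outlen, "%s%s", prefix, path);
    if (n < 0 || (size_t) n >= outlen)
        return -1;

    /* Windows paths use backslashes; URLs use forward slashes. */
    for (char *p = out; *p != '\0'; ++p) {
        if (*p == '\\')
            *p = '/';
    }
    return 0;
}
```

Called with "c:\\foo.txt" this fills the buffer with file:///c:/foo.txt, which is the kind of string you would then hand to the Request in place of an HTTP URL.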
Regards,

Tim Serong
--
tim.serong@conceiva.com
http://www.conceiva.com

> -----Original Message-----
> From: Subramanyam Mallela [mailto:mallela@parc.com]
> Sent: Tuesday, 5 August 2003 07:30
> To: www-lib@w3.org
> Subject: Parsing local html files
>
> Hi
> how can I use the libwww HTML parser for
> parsing local files on the disk.
> I don't need to download and use rest of the
> code ?
> Is there any example code for this.
>
> Thanks for any help
> Manyam
Received on Monday, 4 August 2003 19:44:23 UTC