- From: Henrik Frystyk Nielsen <frystyk@w3.org>
- Date: Fri, 20 Nov 1998 15:01:12 -0500
- To: Howard Rubin <hrubin@nyx.net>, hrubin@nyx10.nyx.net, www-lib@w3.org
At 17:37 11/19/98 -0500, Howard Rubin wrote:

>I've been poring over the libwww docs for a few days now and aside
>from the occasional glimmer of faint understanding, I'm not making
>much progress. My goal is to write a C subroutine that will take a
>URL parameter, and return a buffer containing plain ascii text for
>the HTML that's at the URL. I'm thinking I should use HTMLToPlain
>that's in html.c, but beyond that I'm lost as to how to initialize
>the library and various data types, and how to receive the plain
>ascii output.

Converting from HTML to ASCII requires an HTML parser. The libwww HTML parser is described in the quick guide:

    http://www.w3.org/Library/User/Start.html

As the libwww parser doesn't know how to present things, it uses the HText callout API so that the client can present the parsed document in whatever way it likes (in the case of the libwww robot, we don't present it at all).

The line mode browser does present the parsed output as text, so this is what you want to use. Try, for example, calling the line mode browser (on Unix) like this:

    www -n -na http://www.w3.org -o dump

It will produce an ASCII rendering of the W3C homepage and save it in a file called "dump". If you want the output to go somewhere else, this is easily changed in the www code.

Henrik

--
Henrik Frystyk Nielsen,
World Wide Web Consortium
http://www.w3.org/People/Frystyk
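For the "URL in, buffer out" subroutine asked about above, one possible approach is to run the line mode browser as an external command and then read its dump file back into memory. The following is only a minimal sketch under that assumption, not libwww's own API: the names build_www_command, read_text_file and fetch_as_text are hypothetical, the only part taken from this message is the command line "www -n -na <url> -o <file>", and it assumes www is on the PATH and exits with status 0 on success.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format the command line from the message,
 *   www -n -na <url> -o <outfile>
 * Returns 0 on success, -1 if an argument cannot be quoted safely
 * or the command does not fit in cmd. */
int build_www_command(const char *url, const char *outfile,
                      char *cmd, size_t cmd_size)
{
    int n;

    /* Both values are wrapped in single quotes for the shell, so
     * refuse any value that itself contains a single quote. */
    if (strchr(url, '\'') != NULL || strchr(outfile, '\'') != NULL)
        return -1;

    n = snprintf(cmd, cmd_size, "www -n -na '%s' -o '%s'", url, outfile);
    if (n < 0 || (size_t)n >= cmd_size)
        return -1;
    return 0;
}

/* Hypothetical helper: read a whole file into a malloc'ed,
 * NUL-terminated buffer. Returns NULL on error; caller frees. */
char *read_text_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;
    size_t len = 0, cap = 0, got;

    if (fp == NULL)
        return NULL;

    for (;;) {
        /* Keep room for one more chunk plus the terminating NUL. */
        if (len + 4096 + 1 > cap) {
            size_t new_cap = cap ? cap * 2 : 8192;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        got = fread(buf + len, 1, 4096, fp);
        len += got;
        if (got < 4096)
            break;              /* end of file or read error */
    }

    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    buf[len] = '\0';
    return buf;
}

/* Hypothetical wrapper: run the line mode browser on url, writing to
 * outfile, then return the file contents. Assumes "www" is on the
 * PATH and returns exit status 0 on success. */
char *fetch_as_text(const char *url, const char *outfile)
{
    char cmd[2048];

    if (build_www_command(url, outfile, cmd, sizeof cmd) != 0)
        return NULL;
    if (system(cmd) != 0)
        return NULL;
    return read_text_file(outfile);
}
```

A caller would use fetch_as_text("http://www.w3.org", "dump") and free() the result when done. Going through a temporary file keeps the sketch independent of libwww internals; avoiding the file would mean changing where the www code sends its output, as mentioned above.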
Received on Friday, 20 November 1998 15:01:32 UTC