
Re: Getting started

From: Henrik Frystyk Nielsen <frystyk@w3.org>
Date: Fri, 20 Nov 1998 15:01:12 -0500
Message-Id: <3.0.5.32.19981120150112.02ed8760@localhost>
To: Howard Rubin <hrubin@nyx.net>, hrubin@nyx10.nyx.net, www-lib@w3.org
At 17:37 11/19/98 -0500, Howard Rubin wrote:
>I've been poring over the libwww docs for a few days now and aside
>from the occasional glimmer of faint understanding, I'm not making
>much progress. My goal is to write a C subroutine that will take a
>URL parameter, and return a buffer containing plain ascii text for
>the HTML that's at the URL.  I'm thinking I should use HTMLToPlain
>that's in html.c, but beyond that I'm lost as to how to initialize
>the library and  various data types,  and how to receive the plain
>ascii output.

Converting from HTML to ASCII requires an HTML parser. The libwww HTML
parser is described in the quick guide:

	http://www.w3.org/Library/User/Start.html

As the libwww parser doesn't know how to present things, it uses the
HText callout API so that the client can present the parsed document in
whatever way it likes (the libwww robot, for example, doesn't present it at
all).

The line mode browser does in fact present the parsed output as text, so
this is what you want to use. Try, for example, calling the line mode
browser (on Unix) like this:

	www -n -na http://www.w3.org -o dump

It will give you ASCII output of the W3C homepage and save it in a file
called "dump". If you want the output to go somewhere else, this is easily
changed in the www code.
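
If changing the www code is more than you need, one possible shortcut is to
run the line mode browser as a child process and collect what it prints.
The following is only a rough sketch, not part of libwww: the function name
capture_command_output is hypothetical, it relies on POSIX popen(), and it
assumes that www writes the formatted page to standard output when -o is
left off.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes popen()/pclose() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a libwww function): run `command` through the
 * shell and return everything it writes to standard output as a
 * NUL-terminated, malloc'ed buffer. The caller must free() the result.
 * Returns NULL if the command cannot be started or memory runs out.
 * If `length` is non-NULL, it receives the number of bytes read.
 */
char *capture_command_output(const char *command, size_t *length)
{
    FILE *pipe = popen(command, "r");
    if (pipe == NULL)
        return NULL;

    size_t capacity = 4096;
    size_t used = 0;
    char *buffer = malloc(capacity);
    if (buffer == NULL) {
        pclose(pipe);
        return NULL;
    }

    for (;;) {
        /* Grow the buffer when only the byte reserved for NUL is left. */
        if (used + 1 >= capacity) {
            char *bigger = realloc(buffer, capacity * 2);
            if (bigger == NULL) {
                free(buffer);
                pclose(pipe);
                return NULL;
            }
            buffer = bigger;
            capacity *= 2;
        }

        size_t got = fread(buffer + used, 1, capacity - used - 1, pipe);
        if (got == 0)
            break;              /* end of output (or a read error) */
        used += got;
    }

    pclose(pipe);
    buffer[used] = '\0';
    if (length != NULL)
        *length = used;
    return buffer;
}
```

Under those assumptions, a call such as
capture_command_output("www -n -na http://www.w3.org", NULL) would hand back
the plain text of the page. Because the command string goes through the
shell, a URL from an untrusted source should be checked or quoted before it
is pasted into the command.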

Henrik
--
Henrik Frystyk Nielsen,
World Wide Web Consortium
http://www.w3.org/People/Frystyk
Received on Friday, 20 November 1998 15:01:32 EST
