Re: Is amaya suitable for use as a html parsing library? from Corne Beerse on 1998-11-16 (www-amaya@w3.org from October to December 1998)

From: Corne Beerse <beerse@ats.nld.alcatel.nl>
Date: Mon, 16 Nov 1998 09:24:16 +0100
To: Howard Rubin <hrubin@nyx.net>
CC: hrubin@disc.com, www-amaya@w3.org
Message-ID: <364FE130.328D@ats.nld.alcatel.nl>

Howard Rubin wrote:
> I need to extract text from HTML documents and do this from
> a platform portable C program. I've been all over the web --
> dejanews, yahoo etc., and the closest thing I've found is libwww.
> However, I notice in libwww (http://www.w3.org/Library/User/Start.html)
> that libwww isn't recommended for use as an HTML parser. It
> recommends Amaya as a full HTML parser.
You might try to write a Perl (or sed/awk) script for the purpose.
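
As a rough illustration of that tag-stripping idea, here is a minimal sketch
in C, since the question asked for a portable C program. The function name
strip_tags and its signature are hypothetical, not part of Amaya or libwww.
It only drops everything between '<' and '>' and collapses whitespace; it is
not a real HTML parser.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from Amaya or libwww): copy `html` into `out`,
 * dropping everything between '<' and '>' and collapsing whitespace runs
 * into single spaces. Output is truncated to fit `outsz` bytes and is
 * always NUL-terminated when outsz > 0.
 * Returns the number of characters written, excluding the NUL.
 * Known limitations: entities such as &amp; are not decoded, and comments,
 * <script> bodies, and '>' inside attribute values are not handled. */
size_t strip_tags(const char *html, char *out, size_t outsz)
{
    size_t n = 0;
    int in_tag = 0;        /* nonzero while between '<' and '>' */
    int pending_space = 0; /* whitespace seen since the last emitted char */

    if (outsz == 0)
        return 0;

    for (; *html != '\0'; html++) {
        unsigned char c = (unsigned char)*html;

        if (in_tag) {
            if (c == '>')
                in_tag = 0;
            continue;
        }
        if (c == '<') {
            in_tag = 1;
            continue;
        }
        if (isspace(c)) {
            pending_space = 1;
            continue;
        }
        /* Emit one separating space before a word, never a leading one. */
        if (pending_space && n > 0 && n + 1 < outsz)
            out[n++] = ' ';
        pending_space = 0;
        if (n + 1 < outsz)
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

Calling strip_tags(page, buf, sizeof buf) on a buffer holding the page leaves
the plain text in buf. Because tags are dropped without inserting a space,
adjacent elements with no whitespace between them will run together.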

> 
> Is there some part of the Amaya source code that would be suitable
> for extracting the text from HTML documents from a C program?
> Any tips, hints, etc would be greatly appreciated.
I would have a look at the print code. There is a special executable for
printing in the bin directory. Strip all the PostScript code it generates
and you are left with your text.

CB


-- 
Is reading in the bathroom considered Multi-Tasking?
Corne' Beerse					| Alcatel Telecom Nederland
mailto:beerse@ats.nld.alcatel.nl		| Postbus 3292
talkto:+31(70)3079108 faxto:+31(70)3079191	| NL-2280 GG  Rijswijk

Received on Monday, 16 November 1998 03:25:01 UTC