HTML Parser

Hello,

I'm trying to replace the HTML parser that's built into the swish-e search
engine.  I've already replaced swish's built-in XML parser with James
Clark's Expat library -- it was perfect for our needs.

So, now I'm looking for something similar to Expat for simple HTML parsing.
For swish, we need to extract text in a title, in the body, and in meta
tags -- and also know which text is inside <b> or <em>.  Ideally it would
be under the GPL, quite portable, buildable without much work, and easy to
embed in an application (as Expat was).
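
As a rough illustration of the requirements above, here is a minimal sketch
of callback-driven extraction in the Expat style.  It is not swish-e code
and it is not the www-lib API: it uses Python's standard html.parser module
purely as a stand-in, and the class name IndexExtractor and its attributes
are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class IndexExtractor(HTMLParser):
    """Hypothetical extractor: title, body text, meta tags, emphasized text."""

    EMPHASIS_TAGS = {"b", "em"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = []       # text chunks found inside <title>
        self.body = []        # text chunks found inside <body>
        self.emphasized = []  # body text chunks nested in <b> or <em>
        self.meta = {}        # <meta name=... content=...> pairs
        self._open = []       # stack of currently open tag names

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content
            return  # meta has no end tag, so nothing is pushed
        self._open.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray end tags.
        if tag in self._open:
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if "title" in self._open:
            self.title.append(text)
        elif "body" in self._open:
            self.body.append(text)
            if self.EMPHASIS_TAGS.intersection(self._open):
                self.emphasized.append(text)


# The document is parsed from an in-memory string, not from a file.
doc = """<html><head><title>Swish Test</title>
<meta name="keywords" content="search, index"></head>
<body><p>Plain text with <b>bold words</b> and <em>emphasis</em>.</p>
</body></html>"""

extractor = IndexExtractor()
extractor.feed(doc)
extractor.close()
print(extractor.title, extractor.meta, extractor.emphasized)
```

The open-tag stack is a deliberately crude way to tolerate unclosed tags in
real-world HTML; a library meant for embedding in C would presumably expose
similar start-tag, end-tag, and text callbacks.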

Will the HTML parser in www-lib work for me?  If so, can anyone point me to
any examples that use the code?  I'll be parsing in-memory documents for
the most part.

Thanks very much,


Bill Moseley
mailto:moseley@hank.org

Received on Tuesday, 31 July 2001 00:48:33 UTC