HTML Parser from Bill Moseley on 2001-07-31 (www-lib@w3.org from July to September 2001)

From: Bill Moseley <moseley@hank.org>
Date: Mon, 30 Jul 2001 21:48:27 -0700
To: www-lib@w3.org
Message-Id: <3.0.3.32.20010730214827.024a0590@pop3.hank.org>

Hello,

I'm trying to replace the HTML parser that's coded into the swish-e search
engine.  I've replaced swish's built-in XML parser with James Clark's Expat
library -- it was perfect for our needs.

So now I'm looking for something similar to Expat for simple HTML parsing.
For swish, we need to extract the text in the title, in the body, and in
meta tags, and also to know which text is marked <b> or <em>.  I'd like
something that is under the GPL, is quite portable, builds without much
work, and is easy to embed in an application (as Expat was).

Will the HTML parser in www-lib work for me?  If so, can anyone point me to
any examples that use the code?  I'll be parsing in-memory documents for the
most part.

Thanks very much,


Bill Moseley
mailto:moseley@hank.org

Received on Tuesday, 31 July 2001 00:48:33 UTC