HTML Parser

From: Bill Moseley <moseley@hank.org>
Date: Mon, 30 Jul 2001 21:48:27 -0700
To: www-lib@w3.org

I'm trying to replace the HTML parser that's coded into the swish-e search
engine.  I've replaced swish's built-in XML parser with James Clark's Expat
library -- it was perfect for our needs.

So now I'm looking for something similar to Expat for simple HTML parsing.
For swish, we need to extract the text in the title, in the body, and in meta
tags, and also know which text is inside <b> or <em>.  I'd like something that
is under the GPL, is quite portable, builds without much work, and is easy to
embed in an application (as Expat was).
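
To make the requirement concrete, here is a rough sketch of the kind of
Expat-style, callback-driven extraction described above. It is only an
illustration: it uses Python's standard html.parser module rather than
www-lib or swish-e code, and the names TextExtractor and extract and the
fields title, body, emphasized, and meta are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect title text, body text, meta content, and <b>/<em> text."""

    EMPHASIS_TAGS = {"b", "em"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = []       # text chunks found inside <title>
        self.body = []        # text chunks found inside <body>
        self.emphasized = []  # body text that is inside <b> or <em>
        self.meta = {}        # lowercased meta name -> content
        self._open = []       # stack of tracked tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            name, content = attr.get("name"), attr.get("content")
            if name and content:
                self.meta[name.lower()] = content
        elif tag in ("title", "body") or tag in self.EMPHASIS_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Drop the most recent matching open tag; unmatched end tags are
        # ignored so that sloppy HTML does not raise an error.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i] == tag:
                del self._open[i]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if "title" in self._open:
            self.title.append(text)
        elif "body" in self._open:
            self.body.append(text)
            if any(t in self.EMPHASIS_TAGS for t in self._open):
                self.emphasized.append(text)


def extract(html):
    """Parse an in-memory HTML string and return the filled extractor."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser
```

Calling extract() on a document held in a string returns an object whose
title, body, emphasized, and meta fields hold the extracted pieces. An
embeddable C parser with start-tag, end-tag, and character-data callbacks
would presumably be used in much the same way.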

Will the HTML parser in www-lib work for me?  If so, can anyone point to
any examples using the code?  I'll be parsing in-memory documents for the
most part.

Thanks very much,

Bill Moseley
Received on Tuesday, 31 July 2001 00:48:33 UTC
