
HTML Parser

From: Bill Moseley <moseley@hank.org>
Date: Mon, 30 Jul 2001 21:48:27 -0700
Message-Id: <3.0.3.32.20010730214827.024a0590@pop3.hank.org>
To: www-lib@w3.org
Hello,

I'm trying to replace the HTML parser that's coded into the swish-e search
engine.  I've replaced swish's built-in XML parser with James Clark's Expat
library -- it was perfect for our needs.

So, now I'm looking for something similar to Expat for simple HTML parsing.
For swish, we need to extract the text in the title, in the body, and in
meta tags, and also know which text is inside <b> or <em>.  Ideally it
would be under the GPL, quite portable, buildable without much work, and
easy to embed in an application (as Expat was).
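
As an illustration of the callback-style interface described above, here is
a minimal sketch. It uses Python's standard html.parser module purely to
show the shape of the problem; it is not libwww or swish-e code. The class
name SwishStyleExtractor and the extract() helper are hypothetical, and the
sketch assumes documents carry explicit <title> and <body> tags.

```python
from html.parser import HTMLParser


class SwishStyleExtractor(HTMLParser):
    """Hypothetical handler: collects title, body, meta, and emphasized text."""

    EMPHASIS_TAGS = {"b", "em"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = []       # text chunks seen inside <title>
        self.body = []        # text chunks seen inside <body>
        self.emphasized = []  # body text chunks inside <b> or <em>
        self.meta = {}        # <meta name=... content=...> pairs
        self._open = []       # stack of currently open tags we track

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "meta":
            attr = dict(attrs)
            if "name" in attr and "content" in attr:
                self.meta[attr["name"]] = attr["content"]
        elif tag in ("title", "body") or tag in self.EMPHASIS_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching open tag; ignore stray end tags.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i] == tag:
                del self._open[i]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if "title" in self._open:
            self.title.append(text)
        if "body" in self._open:
            self.body.append(text)
            if any(t in self.EMPHASIS_TAGS for t in self._open):
                self.emphasized.append(text)


def extract(html_text):
    """Parse an in-memory document and return the populated handler."""
    parser = SwishStyleExtractor()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return parser


if __name__ == "__main__":
    doc = (
        '<html><head><title>Hi</title>'
        '<meta name="keywords" content="a,b"></head>'
        '<body>plain <b>bold</b> and <em>em</em></body></html>'
    )
    result = extract(doc)
    print(result.title, result.meta, result.body, result.emphasized)
```

The sketch works on an in-memory string and relies on start-tag, end-tag,
and character-data callbacks, which is the same style of interface Expat
offers for XML. It does not filter out <script> or <style> content.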

Will the HTML parser in libwww work for me?  If so, can anyone point to
any examples that use the code?  I'll be parsing in-memory documents for
the most part.

Thanks very much,


Bill Moseley
mailto:moseley@hank.org
Received on Tuesday, 31 July 2001 00:48:33 GMT
