Re: HELP!!

From: Simon St.Laurent <simonstl@simonstl.com>
Date: Mon, 24 Jul 2000 17:02:46 -0400
Message-Id: <200007242059.QAA22091@hesketh.net>
To: Christian Stone <chris.stone@virgin.net>, www-html@w3.org

At 09:53 PM 7/24/00 +0100, Christian Stone wrote:
>Does anybody out there in the ether have any suggestions about where I
>can get some information on how to use the HTML parser in Java?
>
>I am trying to parse an HTML page and then be able to iterate over the
>parse tree to extract all the <a> tags to create a table of links.

I don't know how much documentation is included, but David Brownell has a
tool that lets you use the Java Swing HTML parser to generate
XML-parser-like SAX events, which would at least get you into a
well-documented parsing environment.

See:
http://home.pacbell.net/david-b/xml/

It's in the SAX2 Utilities package.

Information on the SAX2 API is at:
http://www.megginson.com/SAX/

You could collect all the <a> elements and their attributes in the
startElement method of your ContentHandler.
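
As a rough sketch of that idea (not code taken from David Brownell's
package), a handler along these lines would gather the href values. The
class name LinkCollector and the collect() helper are made up for
illustration, and the example feeds well-formed markup to the JDK's own
JAXP parser, on the assumption that any SAX2 event source, including an
HTML-to-SAX bridge, could drive the same handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: DefaultHandler implements ContentHandler with
// no-op methods, so only startElement needs overriding.
public class LinkCollector extends DefaultHandler {
    private final List<String> hrefs = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // Match on qName, which is what a non-namespace-aware parser
        // reports; ignore case because HTML tag names are case-insensitive.
        if ("a".equalsIgnoreCase(qName)) {
            String href = atts.getValue("href");
            if (href != null) {   // skip anchors that carry only a name
                hrefs.add(href);
            }
        }
    }

    public List<String> getHrefs() {
        return hrefs;
    }

    // Assumption: the input is well-formed XML (e.g. XHTML). Real-world
    // HTML would need an HTML-aware SAX event source instead.
    public static List<String> collect(String markup) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LinkCollector handler = new LinkCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.getHrefs();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
            + "<a href=\"http://www.w3.org/\">W3C</a> "
            + "<a href=\"http://www.megginson.com/SAX/\">SAX</a>"
            + "</body></html>";
        for (String href : collect(page)) {
            System.out.println(href);
        }
    }
}
```

The list returned by getHrefs() keeps document order, so it can be
written out directly as the rows of a table of links.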

I hope that helps...
Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
http://www.simonstl.com - XML essays and books
Received on Monday, 24 July 2000 17:00:02 GMT