- From: Simon St.Laurent <simonstl@simonstl.com>
- Date: Mon, 24 Jul 2000 17:02:46 -0400
- To: Christian Stone <chris.stone@virgin.net>, www-html@w3.org
At 09:53 PM 7/24/00 +0100, Christian Stone wrote:
>Does anybody out there in the ether have any suggestions about where I
>can get some information on how to use the HTML parser in JAVA.
>
>I am trying to parse an HTML page and then be able to iterate over the
>parse tree to extract all the <a tags to create a table of links.

I don't know how much documentation is included, but David Brownell has a tool that lets you use the Java Swing HTML parser to generate XML-parser-like SAX events, which would at least get you into a well-documented parsing environment. See:
http://home.pacbell.net/david-b/xml/

It's in the SAX2 Utilities package. Information on the SAX2 API is at:
http://www.megginson.com/SAX/

You could collect all the a elements and their attributes in the startElement method of your ContentHandler.

I hope that helps...

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
http://www.simonstl.com - XML essays and books
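
As a rough illustration of the ContentHandler suggestion above, here is a minimal sketch of a SAX2 handler that records the href of every a element it sees. The class name LinkCollector and the helper method collect are hypothetical, made up for this example. To keep the sketch self-contained it feeds the handler from the JDK's built-in XML parser, which only accepts well-formed markup; that is an assumption standing in for the Swing-based SAX driver mentioned above, which is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: DefaultHandler implements ContentHandler with
// empty methods, so only startElement needs to be overridden.
class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // The default JDK parser is not namespace-aware, so the element
        // name arrives in qName. Compare case-insensitively because HTML
        // tag names are often written in upper case.
        if ("a".equalsIgnoreCase(qName)) {
            String href = atts.getValue("href");
            if (href != null) {   // skip named anchors that have no href
                links.add(href);
            }
        }
    }

    List<String> getLinks() {
        return links;
    }

    // Assumption for this sketch: the input is well-formed XML/XHTML.
    // A different SAX event source would reuse the same handler.
    static List<String> collect(String markup) throws Exception {
        LinkCollector handler = new LinkCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(markup)), handler);
        return handler.getLinks();
    }
}
```

Calling LinkCollector.collect(page) on a well-formed page returns the href values in document order, which can then be written out as a table of links. Only the event source would need to change to handle real-world HTML; the handler itself stays the same.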
Received on Monday, 24 July 2000 17:00:02 UTC