- From: Dan Connolly <connolly@w3.org>
- Date: Mon, 24 Jul 2000 16:48:34 -0500
- To: "Simon St.Laurent" <simonstl@simonstl.com>
- CC: Christian Stone <chris.stone@virgin.net>, www-html@w3.org
"Simon St.Laurent" wrote:
>
> At 09:53 PM 7/24/00 +0100, Christian Stone wrote:
> >Does anybody out there in the ether have any suggestions about where I
> >can get some information on how to use the HTML parser in JAVA.
> >
> >I am trying to parse an HTML page and then be able to iterate over the
> >parse tree to extract all the <a tags to create a table of links.
>
> I don't know how much documentation is included, but David Brownell has a
> tool that lets you use the Java Swing HTML parser to generate
> XML-parser-like SAX events, which would at least get you into a
> well-documented parsing environment.

A similar approach is to use the Tidy Java Bean. It seems to
have reasonable documentation and it seems to be actively maintained:

Java HTML Tidy
Updated 22 Jul 2000
http://www3.sympatico.ca/ac.quick/jtidy.html
<- Andy Quick http://www3.sympatico.ca/ac.quick/
<- HTML Tidy http://www.w3.org/People/Raggett/tidy/
<- HTML Home page http://www.w3.org/MarkUp/

>
> See:
> http://home.pacbell.net/david-b/xml/
>
> It's in the SAX2 Utilities package.
>
> Information on the SAX2 API is at:
> http://www.megginson.com/SAX/
>
> You could collect all the a elements and their attributes in the
> StartElement method of your ContentHandler.
>
> I hope that helps...
> Simon St.Laurent
> XML Elements of Style / XML: A Primer, 2nd Ed.
> http://www.simonstl.com - XML essays and books

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
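For the original question (pulling every <a> out of an HTML page with the parser that ships with Java), here is a minimal sketch that uses the Swing parser directly through its callback interface rather than through David Brownell's SAX bridge, whose API is not shown here. It relies only on JDK classes (`ParserDelegator`, `HTMLEditorKit.ParserCallback`, `HTML.Tag.A`, `HTML.Attribute.HREF`). The class and method names `LinkExtractor`, `LinkCollector` and `extractLinks` are hypothetical, and the code uses generics, so it needs a JDK newer than the ones current in 2000.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {

    /** Hypothetical callback: records the href of every <a> start tag the Swing parser reports. */
    public static class LinkCollector extends HTMLEditorKit.ParserCallback {
        public final List<String> hrefs = new ArrayList<>();

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag == HTML.Tag.A) {
                // <a name="..."> anchors have no href and are skipped
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href != null) {
                    hrefs.add(href.toString());
                }
            }
        }
    }

    /** Parses the HTML from the reader and returns the hrefs in document order. */
    public static List<String> extractLinks(Reader html) throws IOException {
        LinkCollector collector = new LinkCollector();
        // ParserDelegator drives the DTD-based Swing parser; 'true' ignores any charset declared in the page
        new ParserDelegator().parse(html, collector, true);
        return collector.hrefs;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><p>See <a href=\"http://www.w3.org/MarkUp/\">HTML</a>"
                + " and <a href=\"http://www.megginson.com/SAX/\">SAX</a>.</p></body></html>";
        // Print one link per line; a real "table of links" would format these rows
        for (String href : extractLinks(new StringReader(page))) {
            System.out.println(href);
        }
    }
}
```

The callback style avoids building a tree at all, which is enough when the goal is only a flat list of links; to read a live page, pass an `InputStreamReader` over the URL's stream instead of a `StringReader`.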
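Simon's suggestion, collecting the a elements in a SAX2 ContentHandler, can be sketched with the standard JAXP SAX parser. This assumes the input is already well-formed markup (for example, XHTML written out by Tidy); a plain XML parser will reject typical tag-soup HTML. `SaxLinkTable`, `AnchorHandler` and `extractLinks` are hypothetical names. Note that the SAX2 callback is spelled `startElement`, and this sketch also uses `characters` and `endElement` to capture each link's text for the table.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxLinkTable {

    /** Hypothetical handler: records {href, link text} for each a element that has an href. */
    public static class AnchorHandler extends DefaultHandler {
        public final List<String[]> links = new ArrayList<>();
        private String currentHref;        // non-null while inside an <a href=...>
        private StringBuilder currentText; // text collected inside that <a>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Without namespace processing, localName may be empty, so fall back to qName
            String name = localName.isEmpty() ? qName : localName;
            if (name.equalsIgnoreCase("a") && atts.getValue("href") != null) {
                currentHref = atts.getValue("href");
                currentText = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (currentHref != null) {
                currentText.append(ch, start, length); // includes text of nested elements like <b>
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String name = localName.isEmpty() ? qName : localName;
            if (name.equalsIgnoreCase("a") && currentHref != null) {
                links.add(new String[] {currentHref, currentText.toString().trim()});
                currentHref = null;
            }
        }
    }

    /** Parses well-formed (X)HTML and returns one {href, text} row per link. */
    public static List<String[]> extractLinks(String xhtml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser(); // not namespace-aware by default
        AnchorHandler handler = new AnchorHandler();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.links;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html><body><a href=\"http://www.w3.org/MarkUp/\">HTML Home page</a></body></html>";
        for (String[] row : extractLinks(doc)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

If the document carries an XHTML DOCTYPE, the parser may try to fetch the DTD over the network, so offline use may need an `EntityResolver`.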
Received on Monday, 24 July 2000 17:49:35 UTC