Greetings, [cc'd to the list]

On Tue, Nov 28, 2000 at 11:49:13AM +0800, #VIKRAM BALKRISHNAN NATARAJAN# wrote:
> Thanks a lot for your prompt reply.
> I wanted to ask you a few fundamental questions before I can start using
> JTidy to know that I am on the right track.
>
> 1: Can JTidy be easily used with my java program to parse and structure HTML
> pages.

Yes, although it depends on your definition of "easily": you have to know a bit of DOM to do this. See my article at <http://lempinen.net:8180/Forum/975361475/> for an introduction. The project documentation at <http://sourceforge.net/docman/?group_id=13153> contains more code fragments.

> 2: Other than parsing can JTidy be used to retrieve only the text from say a
> web page of www.cnn.com i.e. retrieve the text of a news site article.

I would do this *after* parsing: first open a stream from a URL, pass the stream to JTidy and get the DOM tree back. Then use the DOM to extract the textual content, i.e. walk the tree and collect its text nodes.

Yours,
-Sami
--
lempinen@iki.fi  http://www.iki.fi/lempinen/  ICQ:19002710
************* apt-get a life

Received on Tuesday, 28 November 2000 01:09:24 GMT
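The text-extraction step described in the answer to question 2 can be sketched roughly as follows. This is a minimal sketch, assuming the parsed page is available through the standard `org.w3c.dom` interfaces. To keep it runnable without the JTidy jar, the sample tree is built with the JDK's own XML parser from already well-formed XHTML. The JTidy call that would produce the tree from a real web page appears only as a comment. The class and method names (`DomText`, `extractText`, `parseXhtml`) are hypothetical and are not part of JTidy.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class: walks any org.w3c.dom tree and collects its text.
public class DomText {

    /** Returns the visible text under root, with whitespace runs collapsed to single spaces. */
    public static String extractText(Node root) {
        StringBuilder sb = new StringBuilder();
        collect(root, sb);
        // Collapse the newlines and indentation left over from the markup.
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    // Depth-first walk that appends every text node, skipping script and style content.
    private static void collect(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName().toLowerCase(Locale.ROOT);
            if (name.equals("script") || name.equals("style")) {
                return; // not part of the readable page text
            }
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue()).append(' ');
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, sb);
        }
    }

    /** Stand-in for JTidy in this sketch: parses input that is already well-formed XHTML. */
    public static Document parseXhtml(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        // With JTidy on the classpath, the tree for a real page would come from
        // something along these lines (assumed usage, not tested here):
        //   Tidy tidy = new Tidy();
        //   tidy.setQuiet(true);
        //   tidy.setShowWarnings(false);
        //   Node page = tidy.parseDOM(new URL("http://www.cnn.com/").openStream(), null);
        Document doc = parseXhtml("<html><head><title>News</title><style>p{}</style></head>"
                + "<body><h1>Headline</h1><p>First   paragraph.</p>"
                + "<script>var x = 1;</script></body></html>");
        System.out.println(extractText(doc)); // prints "News Headline First paragraph."
    }
}
```

Skipping `script` and `style` is a choice made for this sketch. The `<title>` text is still included, so if you only want the article body you could start the walk from the `body` element instead, e.g. `doc.getElementsByTagName("body").item(0)`.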