- From: Jurgis <jurgis@lursoft.lv>
- Date: Thu, 20 Jun 2002 17:39:15 +0300
- To: <www-lib@w3.org>
Hi!

I'm trying to modify the existing webbot. I changed some things in the link extraction, and I want the robot to act only as a downloader and parser: it downloads a URL, parses it, and produces two files, one with the URLs and another with the plain text, without HTML tags, comments, etc.

The first file works fine. But when I did the plain text extraction as in the example, the result was not correct. In Robot_registerHTMLParser() I added my HText_registerTextCallback(RHText_addText); and in the function RHText_addText I just write the buffer out to the plain text file. What is the right way to produce this plain text file without HTML tags?

Also, I noticed that webbot doesn't seem to handle robots.txt files correctly: it parses them but doesn't exclude the links, etc. Is that true?

Thanks!
Jurgis
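For illustration, here is a minimal sketch of the kind of text callback described above. It is not libwww code: TextSink and sink_add_text are hypothetical stand-ins so the example compiles on its own, and it assumes the callback is handed a buffer together with a length. Under that assumption the buffer may not be NUL-terminated, so the sketch writes exactly len bytes with fwrite() instead of treating the buffer as a C string.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the parser's text object; it only carries
 * the output stream that the plain text should be written to. */
typedef struct TextSink {
    FILE *out;
} TextSink;

/* Hypothetical text callback, modeled on a (text, buffer, length) shape.
 * Writes exactly len bytes of buf and returns the number of bytes written.
 * The buffer is never assumed to be NUL-terminated. */
size_t sink_add_text(TextSink *sink, const char *buf, int len)
{
    assert(len >= 0);
    if (sink == NULL || sink->out == NULL || buf == NULL || len == 0)
        return 0;
    return fwrite(buf, 1, (size_t)len, sink->out);
}
```

In a real robot the callback would be registered with the parser and the stream opened once per document. If the output contains stray characters, one thing worth checking is whether the buffer is being printed with a string function that ignores the length.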
Received on Thursday, 20 June 2002 10:39:49 UTC