robot from Jurgis on 2002-06-20 (www-lib@w3.org from April to June 2002)

From: Jurgis <jurgis@lursoft.lv>
Date: Thu, 20 Jun 2002 17:39:15 +0300
To: <www-lib@w3.org>
Message-ID: <012101c21868$40434110$9f9df4c3@lursoft.lv>

Hi!
I'm trying to modify the existing webbot. I changed something in the link
extraction because I want this robot to act only as a downloader and parser:
it just downloads a URL, parses it, and makes two files, one with the URLs
and another with plain text without HTML tags, comments, etc. The first one
I made, and it's OK. But when I did the plain text extraction as in the
example, the result was not correct. In Robot_registerHTMLParser() I added my
HText_registerTextCallback(RHText_addText); and in the function
RHText_addText I just write the buffer out to the plain text file. How can I
make this plain text file without HTML tags the right way?
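
As a point of reference, the expected plain text output can be sketched without libwww at all. The following is only an illustration under assumptions: strip_tags is a hypothetical helper, not a libwww function, and it only drops tags and comments (it does not decode entities or skip script and style content).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libwww: copies src into dst while
   dropping everything between '<' and '>' and everything inside
   <!-- ... -->. Output is truncated to fit dstsize and always
   NUL-terminated. Returns the number of characters written. */
size_t strip_tags(const char *src, char *dst, size_t dstsize)
{
    enum { TEXT, TAG, COMMENT } state = TEXT;
    size_t out = 0;

    if (dstsize == 0)
        return 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        switch (state) {
        case TEXT:
            if (strncmp(src + i, "<!--", 4) == 0) {
                state = COMMENT;
                i += 3;               /* skip the rest of "<!--" */
            } else if (src[i] == '<') {
                state = TAG;
            } else if (out + 1 < dstsize) {
                dst[out++] = src[i];  /* ordinary text: keep it */
            }
            break;
        case TAG:
            if (src[i] == '>')
                state = TEXT;
            break;
        case COMMENT:
            if (strncmp(src + i, "-->", 3) == 0) {
                state = TEXT;
                i += 2;               /* skip the rest of "-->" */
            }
            break;
        }
    }
    dst[out] = '\0';
    return out;
}
```

Given the input <p>Hello <b>world</b><!-- hidden --></p>, this produces the text Hello world, which is the kind of result wanted in the plain text file.
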
Also, I saw that webbot doesn't handle robots.txt files correctly: it parses
them but doesn't exclude links, etc. Is that true?
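
To make the expected exclusion behaviour concrete, here is a minimal sketch, assuming the Disallow values from robots.txt have already been parsed into an array of path prefixes. path_is_disallowed is a hypothetical name and is not taken from the webbot source; it only shows the prefix comparison that would have to be applied to each extracted link.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of webbot: returns 1 if path begins with
   any non-empty Disallow prefix in rules, otherwise 0. An empty Disallow
   value excludes nothing, so it is skipped. */
int path_is_disallowed(const char *path, const char *const *rules,
                       size_t nrules)
{
    for (size_t i = 0; i < nrules; i++) {
        size_t len = strlen(rules[i]);
        if (len > 0 && strncmp(path, rules[i], len) == 0)
            return 1;
    }
    return 0;
}
```

With the rules /cgi-bin/ and /tmp/, the link /cgi-bin/search would be excluded and /index.html would still be fetched.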

Thanks!
Jurgjis

Received on Thursday, 20 June 2002 10:39:49 UTC