
robot

From: Jurgis <jurgis@lursoft.lv>
Date: Thu, 20 Jun 2002 17:39:15 +0300
Message-ID: <012101c21868$40434110$9f9df4c3@lursoft.lv>
To: <www-lib@w3.org>

Hi!
I'm trying to modify the existing webbot. I changed the link extraction,
because I want the robot to act only as a downloader and parser: it
downloads a URL, parses it, and writes two files, one with the URLs and
one with the plain text, without HTML tags, comments, etc. The first file
works fine. But when I do the plain text extraction as in the example, the
output is not correct. In Robot_registerHTMLParser() I added my
HText_registerTextCallback(RHText_addText); call, and in the function
RHText_addText I just write the buffer out to the plain text file. What is
the right way to produce this plain text file without HTML tags?
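
One possible cause of wrong output (a guess, not something confirmed in this thread) is treating the callback buffer as a NUL-terminated string when it is really a chunk with an explicit length. The sketch below shows only that idea. `write_text_chunk` is a hypothetical helper, and the callback shape mentioned in the comment (text object, buffer, length) is an assumption that should be checked against the HText.h of the libwww version in use.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: append one chunk of parsed text to the output file.
 * The chunk is treated as `len` raw bytes, never as a NUL-terminated string,
 * so nothing past the end of the chunk is written.
 *
 * Assumed usage (not verified here): a text callback of the form
 *     void RHText_addText(HText *text, const char *buf, int len)
 * would call write_text_chunk(plain_text_file, buf, len).
 *
 * Returns 0 on success, -1 if the write was short.
 */
static int write_text_chunk(FILE *out, const char *buf, int len)
{
    assert(out != NULL);

    if (buf == NULL || len <= 0)
        return 0;                       /* nothing to write */

    size_t written = fwrite(buf, 1, (size_t)len, out);
    return written == (size_t)len ? 0 : -1;
}
```

Writing with `fwrite` and the given length avoids `fprintf(out, "%s", buf)`, which would read past the chunk if the buffer is not terminated.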
Also, I noticed that webbot doesn't seem to handle robots.txt files
correctly: it parses them but doesn't exclude the links, etc. Is that true?
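
For reference, "excluding links" would mean something like the check below: before fetching a URL, compare its path against the Disallow prefixes collected from robots.txt. This is a minimal sketch of simple prefix matching only. `path_is_disallowed` and the rule array are hypothetical names, not part of webbot or libwww, and user-agent selection and wildcards are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: returns 1 if `path` starts with any non-empty
 * Disallow prefix in `rules`, otherwise 0. An empty rule ("Disallow:"
 * with no value) excludes nothing, so it is skipped.
 */
static int path_is_disallowed(const char *const *rules, size_t nrules,
                              const char *path)
{
    assert(path != NULL);

    for (size_t i = 0; i < nrules; i++) {
        size_t n = strlen(rules[i]);
        if (n > 0 && strncmp(path, rules[i], n) == 0)
            return 1;                   /* path falls under this prefix */
    }
    return 0;                           /* no rule matched: fetch allowed */
}
```

A crawler would call this once per extracted link and skip any link for which it returns 1.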

Thnx!
Jurgjis
Received on Thursday, 20 June 2002 10:39:49 GMT
