- From: Ville Skytta <ville@dev.w3.org>
- Date: Wed, 09 Jun 2004 06:30:07 +0000
- To: www-validator-cvs@w3.org
Update of /sources/public/perl/modules/W3C/LinkChecker/docs
In directory hutz:/tmp/cvs-serv21590/docs

Modified Files:
	checklink.html
Log Message:
Add blurb about robots exclusion implementation details.

Index: checklink.html
===================================================================
RCS file: /sources/public/perl/modules/W3C/LinkChecker/docs/checklink.html,v
retrieving revision 1.20
retrieving revision 1.21
diff -u -d -r1.20 -r1.21
--- checklink.html	8 Jun 2004 17:15:02 -0000	1.20
+++ checklink.html	9 Jun 2004 06:30:04 -0000	1.21
@@ -214,6 +214,19 @@
 </pre>
 
 <p>
+ Robots exclusion support in the link checker is based on the
+ <a href="http://search.cpan.org/dist/libwww-perl/lib/LWP/RobotUA.pm">LWP::RobotUA</a>
+ Perl module. It currently supports the
+ "<a href="http://www.robotstxt.org/wc/norobots.html">original 1994 version</a>"
+ of the standard. The robots META tag, i.e.
+ <code><meta name="robots" content="..."></code>, is not supported.
+ Other than that, the link checker's implementation goes all the way
+ in trying to honor robots exclusion rules; if a
+ <code>/robots.txt</code> disallows it, not even the first document
+ submitted as the root for a link checker run is fetched.
+ </p>
+
+ <p>
 Note that <code>/robots.txt</code> rules affect only user agents
 that honor it; it is not a generic method for access control.
 </p>
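
As a rough sketch of the LWP::RobotUA behaviour the new paragraph describes (this is not the link checker's own code; the agent name, contact address and URL below are made up for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::RobotUA;

    # Hypothetical robot identity; the real link checker uses its own
    # agent string and contact address.
    my $ua = LWP::RobotUA->new('Example-LinkChecker/0.1', 'webmaster@example.org');
    $ua->delay(1/60);    # minimum delay between requests, in minutes (here: 1 second)

    # Before the request goes out, LWP::RobotUA fetches and caches the
    # host's /robots.txt; if its rules disallow the URL, nothing is
    # fetched and a synthetic "403 Forbidden by robots.txt" response
    # is returned instead.
    my $response = $ua->get('http://www.example.org/docs/page.html');

    if ($response->is_success) {
        print "Fetched: ", $response->base, "\n";
    }
    else {
        print "Not fetched: ", $response->status_line, "\n";
    }

Run against a server whose /robots.txt disallows that path, the sketch should print the "403 Forbidden by robots.txt" status line rather than fetching the document, which is the behaviour the checklink.html blurb describes for the root document of a link checker run as well.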
Received on Wednesday, 9 June 2004 02:30:07 UTC