- From: Ville Skyttä <ville.skytta@iki.fi>
- Date: 25 Dec 2002 12:40:21 +0200
- To: Olivier Thereaux <ot@w3.org>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-validator@w3.org
On Wed, 2002-12-25 at 09:05, Olivier Thereaux wrote:
> On Wednesday, Dec 25, 2002, at 13:48 Asia/Tokyo, Bjoern Hoehrmann wrote:
> > Why doesn't checklink qualify as a robot?
>
> My own definition of a robot is that it retrieves some data (the
> documents) or metadata (indexing). I may be wrong.

Checklink definitely retrieves documents. It doesn't store any
information or present the document to its user, though. I don't think
that storing, indexing, etc. are criteria for whether something is a
robot or not.

I don't know if there's an authoritative definition anywhere, but The
Web Robots FAQ [1] has one. It also mentions "HTML validation" and
"Link validation" as purposes robots can be used for. The database [2]
on that site doesn't contain the W3C Validator or Link Checker, though.
But there are other link checkers in the list ("Link Validator",
"LinkScan", "LinkWalker" ...).

> In any case I don't think checklink, even in recursive mode, should
> follow the robots directives (noindex is irrelevant, and nofollow
> would make it useless...). I'm interested to hear opposite arguments,
> though.

I tend to see checklink as a robot, and think that it should fully
follow the robots exclusion standard in all modes. Without an
authoritative specification, this is tough to back up (as is already
evident in this thread). See also the related material in [3] and [4],
and the Bugzilla enhancement entry at [5].

[1] <http://www.robotstxt.org/wc/faq.html>
[2] <http://www.robotstxt.org/wc/active/html/>
[3] <http://www.kollar.com/robots.html>
[4] <http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt>
[5] <http://www.w3.org/Bugs/Public/show_bug.cgi?id=27>

--
\/ille Skyttä
ville.skytta at iki.fi
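P.S. To make "following the robots exclusion standard" concrete, here
is a rough sketch of the kind of gate a link checker could put in front
of every fetch. This is not checklink's actual code (checklink is a
Perl script built on LWP); it is an illustrative Python sketch using
the standard library's robots.txt parser, and the "W3C-checklink"
user-agent string is only a placeholder.

```python
import urllib.robotparser
from urllib.parse import urlparse

def allowed_by_robots(url, user_agent="W3C-checklink"):
    """Return True if the host's robots.txt permits fetching url."""
    # Build the robots.txt URL for the target's host.
    parts = urlparse(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(robots_url)
    try:
        rp.read()  # fetch and parse robots.txt
    except OSError:
        # robots.txt unreachable: a common convention is to allow the
        # fetch, though a stricter robot could refuse instead.
        return True
    return rp.can_fetch(user_agent, url)

# A robot-like checker would then gate each retrieval on this check:
#     if allowed_by_robots(link):
#         fetch_and_check(link)
```

Whether this check applies only in recursive mode or in all modes is
exactly the policy question at issue above; the sketch itself is
mode-agnostic.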
Received on Wednesday, 25 December 2002 05:40:03 UTC