
Re: Unwanted robot accesses from your site

From: Nick Kew <nick@webthing.com>
Date: Fri, 27 Dec 2002 21:52:26 +0000 (GMT)
To: Olivier Thereaux <ot@w3.org>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-validator@w3.org
Message-ID: <Pine.LNX.4.21.0212272147060.1950-100000@jarl.webthing.com>

On Wed, 25 Dec 2002, Olivier Thereaux wrote:

> 
> Hi Bjoern.
> 
> On Wednesday, Dec 25, 2002, at 13:48 Asia/Tokyo, Bjoern Hoehrmann wrote:
> > Why doesn't checklink qualify as a robot?
> 
> My own definition of a robot is that it retrieves some data (the 
> documents) or metadata (indexing). I may be wrong.

Checklink spiders, and Checklink puts a server under rapid-fire requests.
IMO both of these qualify it as a robot, and the latter is a valid reason
for a webmaster to exclude it.  So it must respect robots.txt.
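
To make "respect robots.txt" concrete, here is a rough sketch of the check a
link checker could run before fetching each link. It uses Python's standard
urllib.robotparser; the user-agent token and the function name are made up
for illustration and are not taken from checklink itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent token, for illustration only.
USER_AGENT = "example-checklink"


def allowed_links(robots_txt, urls, agent=USER_AGENT):
    """Return only the URLs that robots_txt permits `agent` to fetch."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines,
    # so no network access is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


robots = """\
User-agent: *
Disallow: /private/
"""

links = [
    "http://example.org/index.html",
    "http://example.org/private/report.html",
]

# Only the link outside /private/ survives the filter.
print(allowed_links(robots, links))
```

A real checker would fetch /robots.txt once per host and reuse the parsed
result; a pause between requests would address the rapid-fire problem.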

> In any case I don't think checklink, even in recursive mode, should 
> follow the robots directives (noindex is irrelevant, and nofollow would 
> make it useless...). I'm interested to hear opposite arguments, though.

Nofollow doesn't make it useless: it makes it more useful.  A webmaster
knows what an automaton cannot know about conditions where a link
shouldn't be followed.
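
As an illustration only, honouring nofollow could look something like the
sketch below: look at a page's robots META element before queuing its links.
The class and function names are hypothetical, and this assumes Python's
standard html.parser rather than anything checklink actually does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries a robots META 'nofollow' directive."""

    def __init__(self):
        super().__init__()
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        directives = {d.strip().lower() for d in content.split(",")}
        # "none" is shorthand for "noindex, nofollow".
        if "nofollow" in directives or "none" in directives:
            self.nofollow = True


def may_follow_links(html):
    """Return False if the page asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not parser.nofollow


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(may_follow_links(page))
```

The checker would still report on the page itself; it would simply not
recurse into the links that page contains.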

Besides, the META-thingy (the robots META element) is a poor substitute
for robots.txt.

-- 
Nick Kew
Received on Friday, 27 December 2002 16:55:01 GMT
