Re: Unwanted robot accesses from your site

From: Tim Bagot <tsb-w3-validator-0007@earth.li>
Date: Sun, 22 Dec 2002 10:55:39 +0000 (UTC)
To: <www-validator@w3.org>
Cc: miim webmaster <xxdpplus@yahoo.com>
Message-ID: <Pine.LNX.4.33.0212221009350.924-100000@213-152-52-166.dsl.eclipse.net.uk>

At 2002-12-21T21:39-0800, miim webmaster wrote:-

> We don't want your robot to visit our site.
>
> We don't use your service, and we don't particularly
> like the idea of other people using your robot to
> scan our site.

Why?

> There is a standard for robot exclusion.  Your robot
> doesn't follow it.  It should.

The Robots Exclusion Protocol probably should be obeyed for recursive
validation - if/when that's implemented - and link checking. Apart from
being the Right Thing To Do, it would allow authors to use these tools
without them straying into unhelpful parts of the URI space (e.g. huge
forests of dynamically generated pages). (And I notice this is already on
the to-do list for checklink.) But in the case of a single validation
request initiated by a user, I don't think it's really acting as a robot.
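
(For reference, a site wishing to exclude the link checker under that
protocol would only need a couple of lines in robots.txt. This is a
sketch; "W3C-checklink" as the User-Agent token is my assumption of
what checklink would honour, not something stated above.)

```
# Hypothetical robots.txt sketch: exclude the link checker
# (assumed User-Agent token "W3C-checklink") from the whole site.
User-agent: W3C-checklink
Disallow: /
```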

>                                 All we can do at this
> time is use Apache rules to shovel garbage into it
> when it visits.

What's wrong with 403 Forbidden?
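
(If they must use Apache rules anyway, something along these lines
would return a clean 403 rather than garbage. A sketch only, assuming
mod_setenvif is available and that the validator sends a User-Agent
string containing "W3C_Validator" - check your access logs for the
actual token.)

```
# Hypothetical .htaccess sketch: mark requests from the validator
# (assumed User-Agent substring "W3C_Validator") and deny them,
# which yields 403 Forbidden.
SetEnvIf User-Agent "W3C_Validator" deny_validator
Order Allow,Deny
Allow from all
Deny from env=deny_validator
```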


Tim Bagot
[In no way speaking for the W3C or the Validation Team]
Received on Sunday, 22 December 2002 07:02:20 GMT