Re: Unwanted robot accesses from your site

From: Olivier Thereaux <ot@w3.org>
Date: Wed, 25 Dec 2002 12:12:33 +0900
Cc: www-validator@w3.org
To: miim webmaster <xxdpplus@yahoo.com>
Message-Id: <B5D2E9EB-17B6-11D7-B932-000393BAB03A@w3.org>

On Sunday, Dec 22, 2002, at 14:39 Asia/Tokyo, miim webmaster wrote:
> We don't want your robot to visit our site.

Which "robot" are you talking about? This mailing-list covers several 
services, including "markup validator" and "checklink". Neither of 
those is a robot...

If you can specify which of those you call "robot", it may help us 
answer your request.

> We don't use your service, and we don't particularly
> like the idea of other people using your robot to
> scan our site.

Again, we need to know what you are talking about. Assuming you are 
talking about the markup validation service, I'm surprised you call it 
a "robot". The validator has nothing to do with a robot; it is, rather, 
a user agent, as are all web browsers. The validator does not 
"scan" sites, either. It requests a document (as would any browser) and 
instead of displaying it (as would any browser) it parses it and checks 
validity of the markup.

If people try to validate the pages on your site, why should this be a 
problem?

> There is a standard for robot exclusion.

Well, there is a protocol for robots exclusion, but I doubt that user 
agents other than robots are supposed to follow it.
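
For readers unfamiliar with the robots exclusion protocol, here is a 
minimal sketch of how a well-behaved crawler might consult it, using 
Python's standard urllib.robotparser. The robots.txt content, the agent 
name "ExampleCrawler" and the example.org URLs are all hypothetical; 
nothing here describes what the validator or checklink actually do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the agent name and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(agent: str, url: str) -> bool:
    """Return True if the rules in ROBOTS_TXT allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)
```

The protocol is purely advisory: the client has to choose to read the 
rules and honour them, which is why it only binds agents that decide 
to behave as robots.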

> All we can do at this
> time is use Apache rules to shovel garbage into it
> when it visits.

Since you seem to advocate the use of standards, remember that there is 
a standard (HTTP) technique (the 403 Forbidden response code) to forbid 
access to your resources. Agent blocking is considered harmful, but it 
would at least be a tad better than 
"shovel[ing] garbage into it"...

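As a rough illustration of the 403 approach, the sketch below refuses 
requests by User-Agent using only Python's standard http.server. It is 
an assumption-laden example, not a recommendation: the blocked 
substring "ExampleAgent", the helper names and the port are all 
hypothetical, and a real site would more likely do this in its web 
server configuration.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical list of User-Agent substrings to refuse; adjust to taste.
BLOCKED_AGENT_SUBSTRINGS = ("ExampleAgent",)


def status_for(user_agent: str) -> HTTPStatus:
    """Pick the response status for a given User-Agent header value."""
    ua = user_agent.lower()
    if any(s.lower() in ua for s in BLOCKED_AGENT_SUBSTRINGS):
        return HTTPStatus.FORBIDDEN  # 403: the request was understood but refused
    return HTTPStatus.OK


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = status_for(self.headers.get("User-Agent", ""))
        body = b"Forbidden\n" if status is HTTPStatus.FORBIDDEN else b"Hello\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the demo server on localhost until interrupted (not called here)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling serve() starts the demo server on localhost. A blocked agent 
then gets an honest, standard refusal instead of deliberately 
corrupted content.
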
Olivier Thereaux - W3C
Received on Tuesday, 24 December 2002 22:12:37 UTC
