W3C home > Mailing lists > Public > www-validator@w3.org > March 2000

search robots validating web pages

From: Hans Ulrich Niedermann <ulrich@niedermann.bb.bawue.de>
Date: Wed, 8 Mar 2000 13:29:15 -0500 (EST)
To: www-validator@w3.org
Message-ID: <m2u2ihco4k.fsf@chef.niedermann.bb.bawue.de>
Hi validators,

I just noticed a strange coincidence in the logs of my web
server: during periods with lots of search robot hits, validator.w3.org
also wanted to validate my pages.

All the pages the web robots were visiting contain a link to the
validator (you could call this kind of link a "validating link"). So I
suspect the search robots followed that URL (repeatedly!). This
behaviour does not conflict with the robots.txt file on validator.w3.org:
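(For illustration only: the exact markup of such a "validating link" is not
quoted in this message, but a link of this kind would point a visitor's
browser at the validator's check script, along the lines of:

```html
<!-- hypothetical example of a "validating link";
     the actual href on the pages in question is not shown here -->
<a href="http://validator.w3.org/check?uri=http://example.org/page.html">
  Validate this page
</a>
```

A robot that blindly follows every href on the page would then hit the
validator once per crawled page.)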

# robots.txt for validator.w3.org
# $Id: robots.txt,v 1.2 1998/07/24 22:11:35 gerald Exp $

# User-Agent: *
# Disallow:

I think using 

User-Agent: *
Disallow: /check

would have the following advantages:

1. for validator.w3.org: less system load
2. for sites with "validating links": less system load, more accurate
   access counters
3. for the robots: they won't index pages nobody wants to find using
   a search engine
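The effect of the proposed rule can be sketched with Python's standard-library
robots.txt parser (`urllib.robotparser`); the URLs below are illustrative, not
taken from the original message:

```python
from urllib.robotparser import RobotFileParser

# The proposed robots.txt rules for validator.w3.org.
rules = """User-Agent: *
Disallow: /check
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The validator's /check endpoint would be off-limits to well-behaved robots,
# while the rest of the site stays crawlable.
print(rp.can_fetch("*", "http://validator.w3.org/check?uri=http://example.org/"))
print(rp.can_fetch("*", "http://validator.w3.org/"))
```

So a robot honouring the rule would stop triggering validation runs but could
still index the validator's documentation pages.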

I can't think of any disadvantages for any party.

Critical annotations and replies are welcome.


Received on Wednesday, 8 March 2000 16:37:23 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 1 March 2016 14:17:26 UTC