- From: Michael[tm] Smith <mike@w3.org>
- Date: Mon, 10 Aug 2015 12:56:31 +0900
- To: Andrew Avdeenko <rasprod@tyt.by>
- Cc: www-validator@w3.org
- Message-ID: <20150810035631.GC963@sideshowbarker.net>
Andrew Avdeenko <rasprod@tyt.by>, 2015-08-02 16:00 +0300:
> Archived-At: <http://www.w3.org/mid/op.x2qrq8zm278snb@microsof-c0ae01>
>
> Is it possible to deny access to my website for W3C validators using
> robots.txt? If "yes", what user-agent(s) must be specified?

The W3C Link Checker https://validator.w3.org/checklink is the only one
that's actually a crawler/robot, and so the only one that pays attention
to robots.txt files. You can block it by specifying
"User-Agent: W3C-checklink".

All of the services also have "http://validator.w3.org/services" in their
user-agent strings, and run on hosts with IP addresses in the
128.30.52.0/24 subnet. So you can block them based on that user-agent
substring, or by IP address. But because none of the services other than
the link checker are crawlers/robots in normal usage, they're not among
the types of tools that robots.txt is intended for, so you'll need to use
some other means to block them (e.g., some specific configuration of your
firewall or Web server).

There used to be a document at http://validator.w3.org/services that
explained all this, but it seems to have disappeared, so I'll get it
restored as soon as possible.

 —Mike

-- 
Michael[tm] Smith
https://people.w3.org/mike
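P.S. If a concrete starting point helps, here is a rough sketch. The
robots.txt rule is exactly what's described above; the Apache directives
are only one illustrative way to do the user-agent and IP-based blocking
(they assume Apache 2.4, placed in the relevant <Directory> block or
.htaccess), so adapt them to whatever server or firewall you actually run:

  # robots.txt: blocks only the W3C Link Checker (the only actual robot)
  User-Agent: W3C-checklink
  Disallow: /

  # Apache 2.4 sketch (assumes mod_setenvif, mod_authz_core, mod_authz_host):
  # deny requests from the 128.30.52.0/24 subnet, or whose User-Agent
  # contains the "validator.w3.org/services" substring; allow everyone else
  SetEnvIfNoCase User-Agent "validator\.w3\.org/services" w3c_service
  <RequireAll>
      Require all granted
      Require not ip 128.30.52.0/24
      Require not env w3c_service
  </RequireAll>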
Received on Monday, 10 August 2015 03:56:56 UTC