- From: olivier Thereaux <ot@w3.org>
- Date: Thu, 18 Aug 2005 14:32:22 +0900
- To: Andrei Stanescu <andre@siteuri.ro>
- Cc: w3c-translators@w3.org
On 16 Aug 2005, at 05:17, Andrei Stanescu wrote:

> Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request
>
> ...has ridiculously high request rates, about 10 / page / day. It
> only visits my W3C translations, and has done so for months.
>
> Anyone has any idea what this is and whether it is used by W3C?
> Otherwise I will ban it.

Definitely not a W3C robot. The only thing that qualifies as such is
the link checker, and it has a different user agent signature.

That said, 10 requests per page per day isn't incredibly high if your
doc is linked from W3C, I think. You wouldn't believe how some robots
behave...

Looking around for a few minutes, I could read that this was the UA
signature for spam harvesters, or that it was just a specific
proxy-cache software refreshing its cache. Nothing certain.

In any case, making said robot send fewer requests is hardly an option
unless you know who is using it (the best way to figure out who is
behind the robot is to look at the IP addresses the requests come
from). But there are robots.txt directives to refuse access to a robot
with a specific signature, e.g.:

  User-agent: Fetch API Request
  Disallow: /my/area

and if the robot is impolite and does not follow the robots exclusion
protocol, then there is an arsenal of mod_rewrite and "deny from"
possibilities (or the equivalent if your server is not Apache); a
minimal sketch of those follows below.

Hope this helps.
-- 
olivier
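A minimal sketch of the "deny from" and mod_rewrite approaches
mentioned above, assuming an Apache server (1.3/2.0-era syntax) and a
.htaccess file in the directory holding the translations; the
user-agent substring is the one reported, the environment-variable
name is just a placeholder, and the whole thing is untested:

  # Tag requests whose User-Agent contains the reported signature
  # (mod_setenvif), then refuse them ("deny from", mod_access):
  SetEnvIfNoCase User-Agent "Fetch API Request" bad_bot
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot

  # Or, with mod_rewrite, answer such requests with 403 Forbidden:
  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} "Fetch API Request"
  RewriteRule .* - [F]

Either block could also live in the main server configuration inside
a <Directory> section covering the translated documents.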
Received on Thursday, 18 August 2005 05:32:25 UTC