- From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
- Date: Thu, 23 Jun 2011 10:36:55 +0200
- To: antoine.zimmermann@insa-lyon.fr
- Cc: Richard Cyganiak <richard@cyganiak.de>, public-lod@w3.org
Such blacklists and services already exist, e.g.

http://www.bot-trap.de/home/

It is pretty easy to set up honey pots: create a directory (e.g. "/bottrap"), link to it from your main page, but disallow crawling of it via robots.txt. You can then quickly collect and share the IPs, IP ranges, or agent tokens of clients that access /bottrap content anyway.

On Jun 23, 2011, at 8:27 AM, Antoine Zimmermann wrote:

> On 22/06/2011 23:49, Richard Cyganiak wrote:
>> On 21 Jun 2011, at 10:44, Martin Hepp wrote:
>>> PS: I will not release the IP ranges from which the trouble
>>> originated, but rest assured, there were top research institutions
>>> among them.
>>
>> The right answer is: name and shame. That is the way to teach them.
>>
>> Like Karl said, we should collect information about abusive crawlers
>> so that site operators can defend themselves. It won't be *that* hard
>> to research and collect the IP ranges of offending universities.
>>
>> I started a list here: http://www.w3.org/wiki/Bad_Crawlers
>
> What's the use of this list?
> Assume it stays empty, as you hope. What's the use?
> Assume it gets filled with names: so what? It does not prove these
> crawlers are bad. The authors of the crawlers can just remove themselves
> from the list. If a crawler is on the list, chances are that nobody
> would notice anyway, especially not the kind of people that Martin is
> defending in his email. If a crawler is put on the list because it is
> bad and measures are taken, what happens when the crawler gets fixed and
> becomes polite? And what if measures are taken while the crawler was not
> bad at all to start with?
> Surely, this list is utterly useless.
>
> Maybe you can keep the page to describe the problems that bad
> crawlers create and the measures that publishers can take to
> overcome problematic situations.
>
>
> AZ
>
>
>>
>> The list is currently empty. I hope it stays that way.
>>
>> Thank you all, Richard
>
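A minimal sketch of the honey-pot approach described at the top of this message. It assumes the web server writes access logs in the common or combined log format, and that robots.txt contains a rule such as "Disallow: /bottrap/". The function name collect_offenders and the sample addresses are made up for illustration; this is not the method used by bot-trap.de or any other service.

```python
import re
from collections import defaultdict

# Trap directory from the message above. robots.txt is assumed to contain:
#   User-agent: *
#   Disallow: /bottrap/
TRAP_DIR = "/bottrap"

# Assumed log layout: Apache/nginx common or combined log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?:\S+) (?P<path>\S+)[^"]*" \d{3} \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def collect_offenders(log_lines, trap_dir=TRAP_DIR):
    """Map each client IP that requested a trap URL to the agent tokens it sent."""
    offenders = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not in the assumed log format; ignore
        path = match.group("path")
        if path == trap_dir or path.startswith(trap_dir + "/"):
            # A polite crawler never gets here, because robots.txt forbids it.
            offenders[match.group("ip")].add(match.group("agent") or "")
    return dict(offenders)


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [23/Jun/2011:10:36:55 +0200] '
        '"GET /bottrap/index.html HTTP/1.1" 200 512 "-" "BadBot/1.0"',
        '198.51.100.2 - - [23/Jun/2011:10:37:01 +0200] '
        '"GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    ]
    for ip, agents in collect_offenders(sample).items():
        print(ip, sorted(agents))
```

Human visitors may also follow the link to the trap directory, so a hit is a hint rather than proof; hiding the link from normal page rendering reduces such false positives.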
Received on Thursday, 23 June 2011 08:37:28 UTC