Re: Think before you write Semantic Web crawlers

From: Antoine Zimmermann <antoine.zimmermann@gmail.com>
Date: Thu, 23 Jun 2011 08:27:42 +0200
Message-ID: <4E02DCDE.7080301@insa-lyon.fr>
To: Richard Cyganiak <richard@cyganiak.de>
CC: Martin Hepp <martin.hepp@ebusiness-unibw.org>, public-lod@w3.org
On 22/06/2011 23:49, Richard Cyganiak wrote:
> On 21 Jun 2011, at 10:44, Martin Hepp wrote:
>> PS: I will not release the IP ranges from which the trouble
>> originated, but rest assured, there were top research institutions
>> among them.
>
> The right answer is: name and shame. That is the way to teach them.
>
> Like Karl said, we should collect information about abusive crawlers
> so that site operators can defend themselves. It won't be *that* hard
> to research and collect the IP ranges of offending universities.
>
> I started a list here: http://www.w3.org/wiki/Bad_Crawlers

What's the use of this list?
Assume it stays empty, as you hope. What's the use?
Assume it gets filled with names: so what? It does not prove these
crawlers are bad. The authors of the crawlers can just remove themselves
from the list. If a crawler is on the list, chances are that nobody
would notice anyway, especially not the kind of people that Martin is
defending in his email. If a crawler is put on the list because it is
bad and measures are taken, what happens when the crawler gets fixed and
becomes polite? And what if measures are taken while the crawler was not
bad at all to start with?
Surely, this list is utterly useless.

Maybe you can keep the page to describe the problems that bad
crawlers create and the measures that publishers can take to
overcome them.


AZ


>
> The list is currently empty. I hope it stays that way.
>
> Thank you all, Richard
Received on Thursday, 23 June 2011 06:28:14 UTC
