
Re: Think before you write Semantic Web crawlers

From: Antoine Zimmermann <antoine.zimmermann@gmail.com>
Date: Thu, 23 Jun 2011 09:01:56 +0200
Message-ID: <4E02E4E4.8090707@insa-lyon.fr>
To: antoine.zimmermann@insa-lyon.fr
CC: Antoine Zimmermann <antoine.zimmermann@gmail.com>, Richard Cyganiak <richard@cyganiak.de>, Martin Hepp <martin.hepp@ebusiness-unibw.org>, public-lod@w3.org
Just one more comment: such a list could be useful if it's published by 
a well-identified person or group who can be contacted in case of 
disagreement or to request removal from the list.

On 23/06/2011 08:27, Antoine Zimmermann wrote:
> On 22/06/2011 23:49, Richard Cyganiak wrote:
>> On 21 Jun 2011, at 10:44, Martin Hepp wrote:
>>> PS: I will not release the IP ranges from which the trouble
>>> originated, but rest assured, there were top research institutions
>>> among them.
>> The right answer is: name and shame. That is the way to teach them.
>> Like Karl said, we should collect information about abusive crawlers
>> so that site operators can defend themselves. It won't be *that* hard
>> to research and collect the IP ranges of offending universities.
>> I started a list here: http://www.w3.org/wiki/Bad_Crawlers
> What's the use of this list?
> Assume it stays empty, as you hope. What's the use?
> Assume it gets filled with names: so what? It does not prove these
> crawlers are bad. The authors of the crawlers can just remove themselves
> from the list. If a crawler is on the list, chances are that nobody
> would notice anyway, especially not the kind of people that Martin is
> defending in his email. If a crawler is put on the list because it is
> bad and measures are taken, what happens when the crawler gets fixed and
> becomes polite? And what if measures are taken when the crawler was not
> bad at all to begin with?
> Surely, this list is utterly useless.
> Maybe you can keep the page to describe the problems that bad
> crawlers create and the measures that publishers can take to
> overcome such situations.
> AZ
>> The list is currently empty. I hope it stays that way.
>> Thank you all, Richard
Received on Thursday, 23 June 2011 07:02:28 UTC
