Re: Please stop massive crawling against http://openean.kaufkauf.net/id/

From: Dan Brickley <danbri@danbri.org>
Date: Tue, 8 Jun 2010 10:42:22 +0200
Message-ID: <AANLkTikr0EXkK6bRVIVsTNXc81CxVD8EmhgPWjBZVXNA@mail.gmail.com>
To: martin.hepp@ebusiness-unibw.org
Cc: "public-lod@w3.org" <public-lod@w3.org>

On Tue, Jun 8, 2010 at 10:03 AM, Martin Hepp (UniBW)
<martin.hepp@ebusiness-unibw.org> wrote:
> Dear all:
>
> The volunteer who is hosting http://openean.kaufkauf.net/id/, a huge set of
> GoodRelations product model data, is experiencing a problematic amount of
> traffic from unidentified crawlers located in Ireland (DERI?), the
> Netherlands (VUA?), and the USA.
>
> The crawling has been so intense that he had to temporarily block all
> traffic to this dataset.

Any reason not to block the troublemakers by IP address?

> In case you are operating any kind of Semantic Web crawlers that tried to
> access this dataset, please
>
> 1. check your crawler for bugs that create excessive traffic (e.g. by
> redundant requests),
> 2. identify your crawler agent properly in the HTTP header, indicating a
> contact person, and
> 3. implement some bandwidth throttling technique that limits the bandwidth
> consumption on a single host to a moderate amount.

Yes, de-referencing is a privilege, not a right!
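
For points 2 and 3 in Martin's list, here is a rough sketch of what a polite crawler might do: send an identifying User-Agent with a contact address, and enforce a minimum gap between requests to any one host. The agent string, the two-second default, and the names HostThrottle and build_request are all assumptions made up for illustration; nothing here is taken from an actual crawler.

```python
import time
from urllib.parse import urlsplit
from urllib.request import Request

# Hypothetical agent string: substitute your crawler's name and a real contact.
USER_AGENT = "ExampleLODCrawler/0.1 (+mailto:crawler-admin@example.org)"


class HostThrottle:
    """Enforce a minimum delay between successive requests to the same host."""

    def __init__(self, min_delay=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay   # seconds between hits on one host (assumed value)
        self._clock = clock          # injectable so the class can be tested offline
        self._sleep = sleep
        self._last = {}              # host -> time of the most recent request

    def wait(self, url):
        """Block until the host of `url` may be contacted again.

        Returns the number of seconds waited.
        """
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_delay - (now - last))
            if delay > 0:
                self._sleep(delay)
        # Record when this request is actually allowed to go out.
        self._last[host] = now + delay
        return delay


def build_request(url):
    """Build a request that identifies the crawler in the HTTP header."""
    return Request(url, headers={"User-Agent": USER_AGENT})
```

A crawler loop would call throttle.wait(url) before each urllib.request.urlopen(build_request(url)). Requests to different hosts do not delay each other, so only the per-host rate is capped.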

Also, folk should respect robots.txt:
http://en.wikipedia.org/wiki/Robots_exclusion_standard
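
Checking robots.txt takes only a few lines with Python's standard urllib.robotparser module. The sketch below parses a made-up robots.txt from a string so it runs without network access; the agent name, the sample rules, and the load_rules helper are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

AGENT = "ExampleLODCrawler"  # hypothetical crawler name

# Made-up rules for illustration; a real crawler fetches the site's own file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def load_rules(robots_text):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS)

# Ask before every fetch whether this agent may retrieve the URL.
allowed = rules.can_fetch(AGENT, "http://example.org/id/123")
blocked = rules.can_fetch(AGENT, "http://example.org/private/data")

# Crawl-delay is a non-standard but widely used hint; None if the site sets none.
delay = rules.crawl_delay(AGENT)
```

Against a live site, one would instead call set_url() with the host's /robots.txt address followed by read(), then skip any URL for which can_fetch() returns False.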

cheers,

Dan
Received on Tuesday, 8 June 2010 08:51:04 UTC