
Re: Please stop massive crawling against http://openean.kaufkauf.net/id/

From: Christophe Guéret <cgueret@few.vu.nl>
Date: Tue, 08 Jun 2010 11:01:14 +0200
Message-ID: <4C0E06DA.4060607@few.vu.nl>
To: martin.hepp@ebusiness-unibw.org, Linked Data community <public-lod@w3.org>
Dear Martin,

I guess the VUA crawler was ours. The faulty process has been stopped
and won't be restarted until it has been checked for bugs.
Sorry about all the problems caused.

Best regards,

On 06/08/2010 10:03 AM, Martin Hepp (UniBW) wrote:
>  Dear all:
>  The volunteer who is hosting http://openean.kaufkauf.net/id/, a huge
>  set of GoodRelations product model data, is experiencing a problematic
>  amount of traffic from unidentified crawlers located in Ireland
>  (DERI?), the Netherlands (VUA?), and the USA.
>  The crawling has been so intense that he had to temporarily block all
>  traffic to this dataset.
>  In case you are operating any kind of Semantic Web crawlers that tried
>  to access this dataset, please
>  1. check your crawler for bugs that create excessive traffic (e.g. by
>  redundant requests),
>  2. identify your crawler agent properly in the HTTP header, indicating
>  a contact person, and
>  3. implement some bandwidth throttling technique that limits the
>  bandwidth consumption on a single host to a moderate amount.
>  Note that the full dataset is always up to date in the LOD SPARQL
>  endpoint at
>  http://lod.openlinksw.com/sparql
>  Thus, there is rarely a need to crawl the complete dataset.
>  Thanks for your consideration.
>  Best wishes
>  Martin Hepp
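
For anyone adapting a crawler along the lines of the three points quoted above, here is a minimal sketch in Python. It is not the VUA crawler's code. The agent string, the contact address, the class name PoliteFetcher and the 2-second default delay are all placeholders chosen for illustration. Note that it limits the request rate per host, which is a simple stand-in for real bandwidth throttling.

```python
import time
import urllib.request
from urllib.parse import urlsplit

# Hypothetical agent string: replace with your own crawler name and contact.
USER_AGENT = "ExampleLODCrawler/0.1 (+mailto:crawler-admin@example.org)"


class PoliteFetcher:
    """Fetch URLs with an identifying User-Agent, a per-host delay, and no repeats."""

    def __init__(self, min_delay=2.0, fetch=None,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay      # seconds between requests to one host
        self.fetch = fetch or self._http_get
        self.clock = clock
        self.sleep = sleep
        self.last_hit = {}              # host -> time of the last request
        self.seen = set()               # URLs already requested

    def _http_get(self, url):
        # Point 2: identify the crawler and a contact in the HTTP header.
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read()

    def get(self, url):
        # Point 1: never request the same URL twice.
        if url in self.seen:
            return None
        self.seen.add(url)

        # Point 3: wait until min_delay has passed since the last hit on this host.
        host = urlsplit(url).netloc
        if host in self.last_hit:
            wait = self.min_delay - (self.clock() - self.last_hit[host])
            if wait > 0:
                self.sleep(wait)
        self.last_hit[host] = self.clock()

        return self.fetch(url)
```

A crawler would create one PoliteFetcher and route every request through its get() method, so that the per-host timestamps and the set of visited URLs are shared across the whole run. The fetch, clock and sleep arguments exist only so the throttling logic can be checked without touching the network.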

Dr. Christophe Guéret (cgueret@few.vu.nl)
Postdoc working on SOKS (http://www.few.vu.nl/soks)
Knowledge Representation & Reasoning Group
Computational Intelligence Group
Department of Computer Science, AI
VU University Amsterdam

Received on Tuesday, 8 June 2010 09:30:28 UTC
