Re: Please stop massive crawling against http://openean.kaufkauf.net/id/

From: Andreas Harth <andreas@harth.org>
Date: Wed, 9 Jun 2010 10:18:48 +0200
To: "Martin Hepp (UniBW)" <martin.hepp@ebusiness-unibw.org>
Cc: semantic-web at W3C <semantic-web@w3c.org>
Message-ID: <20100609081848.GA31997@harth.org>

Hi Martin,

first of all, congrats on publishing an apparently popular dataset!

On Tue, Jun 08, 2010 at 10:04:14AM +0200, Martin Hepp (UniBW) wrote:
> The crawling has been so intense that he had to temporarily block all  
> traffic to this dataset.

Was this before or after you fixed the redirect issue?

In general I agree with you that the crawlers should be bug-free
and well-behaved.  Unfortunately that's not always the case.

> 3. implement some bandwidth throttling technique that limits the  
> bandwidth consumption on a single host to a moderate amount.

If you want to make sure that only a certain number of requests get
serviced you could configure throttling on your server.  See e.g. [1].
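
To illustrate the idea only (this is not the configuration described
in [1]), here is a minimal Python sketch of per-client throttling with
a token bucket.  The class name, the parameters and the example rates
are assumptions made up for this illustration; a real deployment would
more likely rely on the web server's own throttling facilities.

```python
import time
from collections import defaultdict


class PerClientThrottle:
    """Hypothetical token bucket per client.

    Each client may make at most `rate` requests per second on average,
    with short bursts of up to `burst` requests.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock  # injectable so the logic can be tested
        # Every new client starts with a full bucket:
        # (available tokens, time of last update).
        self.buckets = defaultdict(lambda: (self.burst, self.clock()))

    def allow(self, client):
        """Return True if the request should be served, False otherwise.

        A caller would typically answer a refused request with an
        HTTP 503 and a Retry-After header.
        """
        tokens, last = self.buckets[client]
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[client] = (tokens - 1.0, now)
            return True
        self.buckets[client] = (tokens, now)
        return False
```

Keyed by client IP address, something like
PerClientThrottle(rate=1, burst=5) would let a crawler fetch a few
documents quickly and then hold it to roughly one request per second,
without affecting other clients.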

Best regards,
Andreas.

[1] http://code.google.com/p/ldspider/wiki/ServerConfig
Received on Wednesday, 9 June 2010 08:22:26 GMT