- From: Dan Brickley <danbri@danbri.org>
- Date: Tue, 8 Jun 2010 10:42:22 +0200
- To: martin.hepp@ebusiness-unibw.org
- Cc: "public-lod@w3.org" <public-lod@w3.org>
On Tue, Jun 8, 2010 at 10:03 AM, Martin Hepp (UniBW)
<martin.hepp@ebusiness-unibw.org> wrote:
> Dear all:
>
> The volunteer who is hosting http://openean.kaufkauf.net/id/, a huge set of
> GoodRelations product model data, is experiencing a problematic amount of
> traffic from unidentified crawlers located in Ireland (DERI?), the
> Netherlands (VUA?), and the USA.
>
> The crawling has been so intense that he had to temporarily block all
> traffic to this dataset.

Any reason not to block the troublemakers by IP address?

> In case you are operating any kind of Semantic Web crawlers that tried to
> access this dataset, please
>
> 1. check your crawler for bugs that create excessive traffic (e.g. by
> redundant requests),
> 2. identify your crawler agent properly in the HTTP header, indicating a
> contact person, and
> 3. implement some bandwidth throttling technique that limits the bandwidth
> consumption on a single host to a moderate amount.

Yes, de-referencing is a privilege, not a right!

Also folk should respect robots.txt -
http://en.wikipedia.org/wiki/Robots_exclusion_standard

cheers,

Dan
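[Editor's note: the three requests in Martin's list can be sketched with Python's standard library alone. Everything below is illustrative, not from the thread: the bot name, the contact address, the example.org URLs, the robots.txt policy, and the delay value are all placeholders.]

```python
import time
import urllib.request
import urllib.robotparser

# A User-Agent string that identifies the crawler and names a contact
# (point 2 of the list above). Name and address are hypothetical.
USER_AGENT = "ExampleBot/1.0 (+mailto:crawler-admin@example.org)"

# Parse a robots.txt policy. Fed from a string here so the sketch is
# self-contained; a real crawler would fetch <host>/robots.txt instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /id/
"""
rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def allowed(url):
    """True if the robots.txt policy permits this agent to fetch url."""
    return rp.can_fetch(USER_AGENT, url)

def polite_fetch(url, delay=5.0):
    """Fetch one URL with a proper User-Agent, then pause (point 3)."""
    if not allowed(url):
        return None  # honour the robots exclusion standard
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        data = resp.read()
    time.sleep(delay)  # crude per-host bandwidth/rate throttle
    return data
```

A fixed sleep is the bluntest possible throttle; a production crawler would also honour a Crawl-delay directive and back off on 429/503 responses, but the point is only that each of the three requests costs a few lines.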
Received on Tuesday, 8 June 2010 08:51:04 UTC