
Re: Please stop massive crawling against http://openean.kaufkauf.net/id/

From: Story Henry <henry.story@bblfish.net>
Date: Tue, 8 Jun 2010 10:17:51 +0200
Cc: "public-lod@w3.org" <public-lod@w3.org>
Message-Id: <F4FFA6A2-5B63-4464-84DF-F3BE3665A3C9@bblfish.net>
To: martin.hepp@ebusiness-unibw.org
One could put the data behind foaf+ssl, and so identify agents :-)

Henry

On 8 Jun 2010, at 10:03, Martin Hepp (UniBW) wrote:

> Dear all:
> 
> The volunteer who is hosting http://openean.kaufkauf.net/id/, a huge set of GoodRelations product model data, is experiencing a problematic amount of traffic from unidentified crawlers located in Ireland (DERI?), the Netherlands (VUA?), and the USA.
> 
> The crawling has been so intense that he had to temporarily block all traffic to this dataset.
> 
> In case you are operating any kind of Semantic Web crawlers that tried to access this dataset, please
> 
> 1. check your crawler for bugs that create excessive traffic (e.g. by redundant requests),
> 2. identify your crawler agent properly in the HTTP header, indicating a contact person, and
> 3. implement some bandwidth throttling technique that limits the bandwidth consumption on a single host to a moderate amount.
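The three points above could be sketched roughly as follows. This is only an illustration, not anyone's actual crawler: the class name `PoliteFetcher`, the 5-second default delay, and the example agent string are all assumptions, and a fixed delay between requests to the same host is used as a simple stand-in for real bandwidth throttling.

```python
import time
import urllib.request
from urllib.parse import urlsplit


class PoliteFetcher:
    """Fetch URLs with an identifying User-Agent, no repeats, and a per-host delay."""

    def __init__(self, user_agent, min_delay=5.0, opener=None,
                 clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent      # should name the crawler and a contact
        self.min_delay = min_delay        # assumed: seconds between hits on one host
        self.opener = opener or self._http_get
        self.clock = clock
        self.sleep = sleep
        self.seen = set()                 # URLs already requested (point 1)
        self.last_hit = {}                # host -> time of last request (point 3)

    def _http_get(self, url, headers):
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()

    def fetch(self, url):
        # Point 1: never request the same URL twice.
        if url in self.seen:
            return None
        self.seen.add(url)

        # Point 3: wait until min_delay has passed since the last hit on this host.
        host = urlsplit(url).netloc
        if host in self.last_hit:
            wait = self.min_delay - (self.clock() - self.last_hit[host])
            if wait > 0:
                self.sleep(wait)
        self.last_hit[host] = self.clock()

        # Point 2: identify the agent, including a contact, in the HTTP header.
        headers = {"User-Agent": self.user_agent}
        return self.opener(url, headers)
```

A crawler would create one instance, for example `PoliteFetcher("ExampleBot/0.1 (mailto:ops@example.org)")`, and route every request through `fetch`. The `opener`, `clock`, and `sleep` parameters exist only so the behaviour can be checked without touching the network.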
> 
> Note that the full dataset is always up to date in the LOD SPARQL endpoint at
> 
> http://lod.openlinksw.com/sparql
> 
> Thus, there is rarely a need to crawl the complete dataset.
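For what it is worth, asking the endpoint instead of crawling might look something like the sketch below. The endpoint URL is the one given above; the query itself (the subject-prefix filter and the small LIMIT), the helper name `build_sparql_request`, and the agent string are assumptions for illustration only. The request is built according to the SPARQL protocol's `query` parameter but is not sent until it is explicitly opened.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://lod.openlinksw.com/sparql"   # endpoint named in the message

# Hypothetical query: the prefix filter and the LIMIT are assumptions.
QUERY = """
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o .
  FILTER regex(str(?s), "^http://openean.kaufkauf.net/id/")
}
LIMIT 100
"""


def build_sparql_request(endpoint, query, user_agent):
    """Build a SPARQL-protocol GET request; nothing is sent until it is opened."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",  # ask for JSON results
        "User-Agent": user_agent,                     # identify the client
    }
    return urllib.request.Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen` would then run the query. Whether a filter like this is cheap on that particular endpoint is not something the sketch can promise, so a small LIMIT seems prudent.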
> 
> Thanks for your consideration.
> 
> Best wishes
> 
> Martin Hepp
> 
> -- 
> --------------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
> 
> e-mail:  hepp@ebusiness-unibw.org
> phone:   +49-(0)89-6004-4217
> fax:     +49-(0)89-6004-4620
> www:     http://www.unibw.de/ebusiness/ (group)
>         http://www.heppnetz.de/ (personal)
> skype:   mfhepp
> twitter: mfhepp
> 
> Check out GoodRelations for E-Commerce on the Web of Linked Data!
> =================================================================
> 
> Project page:
> http://purl.org/goodrelations/
> 
> Resources for developers:
> http://www.ebusiness-unibw.org/wiki/GoodRelations
> 
> Webcasts:
> Overview - http://www.heppnetz.de/projects/goodrelations/webcast/
> How-to   - http://vimeo.com/7583816
> 
> Recipe for Yahoo SearchMonkey:
> http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey
> 
> Talk at the Semantic Technology Conference 2009:
> "Semantic Web-based E-Commerce: The GoodRelations Ontology"
> http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287
> 
> Overview article on Semantic Universe:
> http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html
> 
> Tutorial materials:
> ISWC 2009 Tutorial: The Web of Data for E-Commerce in Brief: A Hands-on Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
> http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_ISWC2009
> 
> 
Received on Tuesday, 8 June 2010 08:18:50 UTC
