Please stop massive crawling against http://openean.kaufkauf.net/id/

From: Martin Hepp (UniBW) <martin.hepp@ebusiness-unibw.org>
Date: Tue, 08 Jun 2010 10:04:14 +0200
Message-ID: <4C0DF97E.1040105@ebusiness-unibw.org>
To: semantic-web at W3C <semantic-web@w3c.org>
Dear all:

The volunteer who is hosting http://openean.kaufkauf.net/id/, a huge set 
of GoodRelations product model data, is experiencing a problematic 
amount of traffic from unidentified crawlers located in Ireland (DERI?), 
the Netherlands (VUA?), and the USA.

The crawling has been so intense that he had to temporarily block all 
traffic to this dataset.

If you operate any kind of Semantic Web crawler that has tried to 
access this dataset, please

1. check your crawler for bugs that create excessive traffic (e.g. by 
redundant requests),
2. identify your crawler agent properly in the HTTP header, indicating a 
contact person, and
3. implement some bandwidth throttling technique that limits the 
bandwidth consumption on a single host to a moderate amount.
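
The three points above can be sketched roughly as follows. This is only a 
minimal illustration using Python's standard library, not a reference 
implementation: the crawler name, the contact address, and the 5-second 
delay are placeholder assumptions, and a fixed delay between requests to 
the same host is used here as a simple stand-in for real bandwidth 
throttling.

```python
import time
import urllib.request
from urllib.parse import urldefrag, urlsplit

# Hypothetical crawler name and contact address; replace with your own.
USER_AGENT = "ExampleSemWebCrawler/0.1 (+mailto:crawler-admin@example.org)"


class PoliteFetcher:
    """Skips repeat URLs, identifies itself, and spaces out requests per host."""

    def __init__(self, min_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # assumed seconds between hits on one host
        self.clock = clock
        self.sleep = sleep
        self.seen = set()           # URLs already requested (point 1)
        self.last_hit = {}          # host -> time of last request (point 3)

    def build_request(self, url):
        """Return a Request with an identifying User-Agent header (point 2),
        or None if this URL was already requested."""
        url, _fragment = urldefrag(url)  # fragments are never sent to the server
        if url in self.seen:
            return None
        self.seen.add(url)
        return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

    def wait_for_host(self, url):
        """Sleep until min_delay has passed since the last hit on this host."""
        host = urlsplit(url).netloc
        last = self.last_hit.get(host)
        if last is not None:
            remaining = self.min_delay - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        self.last_hit[host] = self.clock()

    def fetch(self, url, timeout=30):
        """Fetch a URL once, politely; returns the body bytes or None."""
        request = self.build_request(url)
        if request is None:
            return None
        self.wait_for_host(request.full_url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
```

A crawler would create one PoliteFetcher and route every request through 
fetch(); the clock and sleep parameters exist only so the throttling 
logic can be checked without waiting or touching the network.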

Note that the full dataset is always up to date in the LOD SPARQL 
endpoint at

http://lod.openlinksw.com/sparql

Thus, there is rarely a need to crawl the complete dataset.
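
As a rough illustration of using the endpoint instead of crawling, the 
sketch below builds a query URL for the triples about a single resource. 
It assumes the endpoint accepts the standard SPARQL protocol "query" 
parameter over HTTP GET; the function name, the LIMIT value, and the 
resource URI in the usage note are hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://lod.openlinksw.com/sparql"  # endpoint named in this message


def triples_url(resource_uri, limit=100):
    """Build a GET URL asking the endpoint for the triples about one resource."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in resource_uri for ch in '<>" {}|\\^`'):
        raise ValueError("not usable as an IRI in a SPARQL query")
    query = f"SELECT ?p ?o WHERE {{ <{resource_uri}> ?p ?o }} LIMIT {int(limit)}"
    return ENDPOINT + "?" + urlencode({"query": query})
```

For example, triples_url("http://openean.kaufkauf.net/id/EXAMPLE") 
(a made-up identifier) returns a URL that can be fetched once, in place 
of dereferencing the resource and following its links on the volunteer's 
host.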

Thanks for your consideration.

Best wishes

Martin Hepp

-- 
--------------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
          http://www.heppnetz.de/ (personal)
skype:   mfhepp
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================

Project page:
http://purl.org/goodrelations/

Resources for developers:
http://www.ebusiness-unibw.org/wiki/GoodRelations

Webcasts:
Overview - http://www.heppnetz.de/projects/goodrelations/webcast/
How-to   - http://vimeo.com/7583816

Recipe for Yahoo SearchMonkey:
http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey

Talk at the Semantic Technology Conference 2009:
"Semantic Web-based E-Commerce: The GoodRelations Ontology"
http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287

Overview article on Semantic Universe:
http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html

Tutorial materials:
ISWC 2009 Tutorial: The Web of Data for E-Commerce in Brief: A Hands-on 
Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_ISWC2009
Received on Wednesday, 9 June 2010 00:27:34 GMT