
Please stop massive crawling against http://openean.kaufkauf.net/id/

From: Martin Hepp (UniBW) <martin.hepp@ebusiness-unibw.org>
Date: Tue, 08 Jun 2010 10:04:14 +0200
Message-ID: <4C0DF97E.1040105@ebusiness-unibw.org>
To: semantic-web at W3C <semantic-web@w3c.org>
Dear all:

The volunteer who is hosting http://openean.kaufkauf.net/id/, a huge set 
of GoodRelations product model data, is experiencing a problematic 
amount of traffic from unidentified crawlers located in Ireland (DERI?), 
the Netherlands (VUA?), and the USA.

The crawling has been so intense that he had to temporarily block all 
traffic to this dataset.

If you operate a Semantic Web crawler that may have accessed this
dataset, please

1. check your crawler for bugs that create excessive traffic (e.g. by 
redundant requests),
2. identify your crawler agent properly in the HTTP header, indicating a 
contact person, and
3. implement bandwidth throttling that limits the bandwidth consumed on
any single host to a moderate amount.
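The three requests above can be combined in one small fetch wrapper. The
following is a minimal sketch in Python, not a reference implementation:
the bot name, info URL and contact address are hypothetical placeholders,
and the throttling scheme (after each response, wait at least
min_interval seconds, or longer if the response was large, before the
next request to the same host) is one simple choice among many.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional, Set

# Hypothetical identification: replace name, info URL and address with your own.
CONTACT = "crawler-admin@example.org"
USER_AGENT = "ExampleLODBot/0.1 (+http://example.org/bot; mailto:" + CONTACT + ")"


def _default_fetch(url: str, headers: Dict[str, str]) -> bytes:
    """Perform a real HTTP GET with the given headers and return the body."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


class PoliteFetcher:
    """Sequential fetcher that skips redundant requests, identifies itself,
    and spaces requests per host so that average bandwidth per host stays
    at or below max_bytes_per_sec."""

    def __init__(self, max_bytes_per_sec: float = 50_000, min_interval: float = 1.0,
                 fetch: Callable[[str, Dict[str, str]], bytes] = _default_fetch,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_bytes_per_sec = max_bytes_per_sec
        self.min_interval = min_interval
        self._fetch = fetch
        self._clock = clock
        self._sleep = sleep
        self._seen: Set[str] = set()
        self._next_allowed: Dict[str, float] = {}
        self.requests_sent = 0

    def get(self, url: str) -> Optional[bytes]:
        # (1) Avoid redundant requests: hash URIs such as .../id/123#product
        # share one document, so drop the fragment and skip URLs already seen.
        url, _fragment = urllib.parse.urldefrag(url)
        if url in self._seen:
            return None
        self._seen.add(url)  # marked before fetching, so failures are not retried

        # (3) Wait until this host's next permitted request time.
        host = urllib.parse.urlsplit(url).netloc.lower()
        wait = self._next_allowed.get(host, 0.0) - self._clock()
        if wait > 0:
            self._sleep(wait)

        # (2) Identify the crawler and a contact person in the HTTP headers.
        body = self._fetch(url, {"User-Agent": USER_AGENT, "From": CONTACT})
        self.requests_sent += 1

        # Charge the host for the bytes just transferred: a large response
        # pushes the next request to this host further into the future.
        delay = max(self.min_interval, len(body) / self.max_bytes_per_sec)
        self._next_allowed[host] = self._clock() + delay
        return body
```

With the defaults, a 100 kB response delays the next request to the same
host by two seconds, while requests to other hosts are not held back. The
clock, sleep and fetch functions are injectable so that the behaviour can
be checked without touching the network.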

Note that the full dataset is always up to date in the LOD SPARQL 
endpoint at


Thus, there is rarely a need to crawl the complete dataset.
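Instead of crawling, a client can ask the SPARQL endpoint for just the
data it needs. A minimal Python sketch follows, assuming the standard
SPARQL 1.1 Protocol "query via GET" form. The endpoint URL below is a
hypothetical placeholder (the actual address is not given here), and the
query is only an illustration using the GoodRelations class
gr:ProductOrServiceModel; the User-Agent contact is likewise a placeholder.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the actual LOD SPARQL endpoint URL.
SPARQL_ENDPOINT = "http://sparql.example.org/sparql"

# Illustrative query: a handful of product models instead of the whole dataset.
QUERY = """PREFIX gr: <http://purl.org/goodrelations/v1#>
SELECT ?model WHERE { ?model a gr:ProductOrServiceModel . } LIMIT 10"""


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical contact, as recommended for any automated client.
            "User-Agent": "ExampleLODBot/0.1 (mailto:crawler-admin@example.org)",
        },
    )
```

Sending the request with urllib.request.urlopen and parsing the body with
json.load yields the standard SPARQL JSON results structure, one request
in place of many document fetches.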

Thanks for your consideration.

Best wishes

Martin Hepp


martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
          http://www.heppnetz.de/ (personal)
skype:   mfhepp
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!

Project page:

Resources for developers:

Overview - http://www.heppnetz.de/projects/goodrelations/webcast/
How-to   - http://vimeo.com/7583816

Recipe for Yahoo SearchMonkey:

Talk at the Semantic Technology Conference 2009:
"Semantic Web-based E-Commerce: The GoodRelations Ontology"

Overview article on Semantic Universe:

Tutorial materials:
ISWC 2009 Tutorial: The Web of Data for E-Commerce in Brief: A Hands-on 
Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
Received on Wednesday, 9 June 2010 00:27:34 UTC
