- From: Martin Hepp (UniBW) <martin.hepp@ebusiness-unibw.org>
- Date: Fri, 11 Jun 2010 17:09:46 +0200
- To: Andreas Harth <andreas@harth.org>
- CC: semantic-web at W3C <semantic-web@w3c.org>
Hi Andreas,

On 09.06.10 10:18, Andreas Harth wrote:
> Hi Martin,
>
> first of all, congrats for publishing an apparently popular dataset!
>
Thanks, but we were just initially helping with lifting the data. It's hosted on a private machine.

> On Tue, Jun 08, 2010 at 10:04:14AM +0200, Martin Hepp (UniBW) wrote:
>
>> The crawling has been so intense that he had to temporarily block all
>> traffic to this dataset.
>>
> Was this before or after you've fixed the redirect issue?
>
After we fixed the issue.

> In general I agree with you that the crawlers should be bug-free
> and well-behaved. Unfortunately that's not always the case.
>
>> 3. implement some bandwidth throttling technique that limits the
>> bandwidth consumption on a single host to a moderate amount.
>>
> If you want to make sure that only a certain number of requests get
> serviced you could configure throttling on your server. See e.g. [1].
>
The main problem is that the *relatively small* semantic web community should be very "site-friendly" in general and in particular to limit crawling load.

Of course, there are many techniques for protecting a site against ill-behaved crawlers. However, many of those techniques require a lot of skills and expertise that average site-owners don't have.

It would be very bad if "Joe, the siteowner" adds RDFa / RDF/XML to his site and the first effect of joining the semantic web effort is that academic crawlers kill the server by massive crawling.

Best
Martin

> Best regards,
> Andreas.
>
> [1] http://code.google.com/p/ldspider/wiki/ServerConfig
>
>

--
--------------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail: hepp@ebusiness-unibw.org
phone: +49-(0)89-6004-4217
fax: +49-(0)89-6004-4620
www: http://www.unibw.de/ebusiness/ (group)
     http://www.heppnetz.de/ (personal)
skype: mfhepp
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================

Project page:
http://purl.org/goodrelations/

Resources for developers:
http://www.ebusiness-unibw.org/wiki/GoodRelations

Webcasts:
Overview - http://www.heppnetz.de/projects/goodrelations/webcast/
How-to - http://vimeo.com/7583816

Recipe for Yahoo SearchMonkey:
http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey

Talk at the Semantic Technology Conference 2009:
"Semantic Web-based E-Commerce: The GoodRelations Ontology"
http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287

Overview article on Semantic Universe:
http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html

Tutorial materials:
ISWC 2009 Tutorial: The Web of Data for E-Commerce in Brief: A Hands-on Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_ISWC2009

Received on Friday, 11 June 2010 15:33:50 UTC