- From: Martin Hepp (UniBW) <martin.hepp@ebusiness-unibw.org>
- Date: Fri, 11 Jun 2010 17:09:46 +0200
- To: Andreas Harth <andreas@harth.org>
- CC: semantic-web at W3C <semantic-web@w3c.org>
Hi Andreas,
On 09.06.10 10:18, Andreas Harth wrote:
> Hi Martin,
>
> first of all, congrats for publishing an apparently popular dataset!
>
>
Thanks, but we were just initially helping with lifting the data. It's
hosted on a private machine.
> On Tue, Jun 08, 2010 at 10:04:14AM +0200, Martin Hepp (UniBW) wrote:
>
>> The crawling has been so intense that he had to temporarily block all
>> traffic to this dataset.
>>
> Was this before or after you've fixed the redirect issue?
>
After we fixed the issue.
> In general I agree with you that the crawlers should be bug-free
> and well-behaved. Unfortunately that's not always the case.
>
>> 3. implement some bandwidth throttling technique that limits the
>> bandwidth consumption on a single host to a moderate amount.
>>
> If you want to make sure that only a certain number of requests get
> serviced you could configure throttling on your server. See e.g. [1].
>
The main point is that the *relatively small* semantic web community
should be very "site-friendly" in general, and in particular should
limit the load its crawlers put on a site.
Of course, there are many techniques for protecting a site against
ill-behaved crawlers. However, many of those techniques require skills
and expertise that average site owners don't have.
It would be very bad if "Joe, the siteowner" adds RDFa / RDF/XML to his
site and the first effect of joining the semantic web effort is that
academic crawlers kill the server by massive crawling.
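A minimal sketch of what such crawler-side politeness could look like,
assuming a crawler written in Python. The class name HostThrottle and the
5-second default delay are hypothetical and not taken from any existing
crawler; the idea is only that each host gets a minimum pause between
requests, regardless of how many URLs are queued for it.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum delay between requests to the same host.

    Hypothetical helper for illustration; min_delay is an assumed value.
    """

    def __init__(self, min_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._next_allowed = {}  # host -> earliest time of the next request

    def wait(self, url):
        """Block until the host of `url` may be contacted again.

        Returns the number of seconds waited.
        """
        host = urlparse(url).netloc.lower()
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # The next request to this host is allowed min_delay after this one.
        self._next_allowed[host] = now + delay + self.min_delay
        return delay
```

A crawler would call wait(url) right before every fetch, so a site with
thousands of RDF documents is still only hit once per min_delay seconds,
while requests to other hosts are not held up.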
Best
Martin
> Best regards,
> Andreas.
>
> [1] http://code.google.com/p/ldspider/wiki/ServerConfig
>
>
--
--------------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen
e-mail: hepp@ebusiness-unibw.org
phone: +49-(0)89-6004-4217
fax: +49-(0)89-6004-4620
www: http://www.unibw.de/ebusiness/ (group)
http://www.heppnetz.de/ (personal)
skype: mfhepp
twitter: mfhepp
Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
Project page:
http://purl.org/goodrelations/
Resources for developers:
http://www.ebusiness-unibw.org/wiki/GoodRelations
Webcasts:
Overview - http://www.heppnetz.de/projects/goodrelations/webcast/
How-to - http://vimeo.com/7583816
Recipe for Yahoo SearchMonkey:
http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey
Talk at the Semantic Technology Conference 2009:
"Semantic Web-based E-Commerce: The GoodRelations Ontology"
http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287
Overview article on Semantic Universe:
http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html
Tutorial materials:
ISWC 2009 Tutorial: The Web of Data for E-Commerce in Brief: A Hands-on Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_ISWC2009
Received on Friday, 11 June 2010 15:33:50 UTC