Re: Please stop massive crawling against http://openean.kaufkauf.net/id/

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 08 Jun 2010 10:27:08 -0400
Message-ID: <4C0E533C.2010706@openlinksw.com>
To: Robert Fuller <robert.fuller@deri.org>
CC: public-lod@w3.org
Robert Fuller wrote:
> Kingsley Idehen wrote:
>
>> The LOD Cloud Cache at DERI is a live Virtuoso instance with 15 
>> Billion+ Triples loaded. It covers as much of the LOD Cloud as we've 
>> been able to get our hands on plus 6.4 Billion Triples from the 
>> Data.Gov effort.
>>
>> I'll drop a more detailed note about this instance (via blog post) 
>> once we are done with data loading (there's a massive collection of 
>> eCommerce oriented Products & Services data to be loaded amongst 
>> others).
>
> I wonder is this data load the culprit responsible for the "massive 
> crawling"?
>

I don't see how it could be. That said, there may be services out 
there crawling the instance (as they do DBpedia), which then leads them 
to the original data space, even though all of the data is already 
in the lod.openlinksw.com instance :-(

We'll double check to see that robots.txt is crystal clear re. crawl paths.
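
As a rough illustration of that kind of check, the sketch below uses Python's standard urllib.robotparser to test URLs against a robots.txt. The rules, paths, and the "ExampleBot" agent name are hypothetical, since the actual robots.txt is not quoted in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real rules are not given in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sparql
Disallow: /describe/
Crawl-delay: 10
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def crawl_delay(robots_txt: str, agent: str):
    """Return the Crawl-delay that applies to `agent`, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    # "ExampleBot" and the paths below are assumptions made for illustration.
    base = "http://lod.openlinksw.com"
    for path in ("/sparql?query=x", "/describe/?url=y", "/about.html"):
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", base + path))
    print("delay:", crawl_delay(ROBOTS_TXT, "ExampleBot"))
```

Running a list of representative URLs through a check like this is one way to confirm that the published rules say what was intended. It only tells well-behaved crawlers what to skip and does nothing about clients that ignore robots.txt.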


-- 

Regards,

Kingsley Idehen	      
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 
Received on Tuesday, 8 June 2010 14:28:06 UTC
