Re: Please stop massive crawling against

Robert Fuller wrote:
> Hi,
> Sindice clearly identifies itself in the user agent http header. 
> Currently we use these user agents:
> 1. "Mozilla/5.0 (compatible; sindice-fetcher/0.1.0 
> +"
> 2. "SindiceFetcher/Ping Manager ("
> 3. " ontology fetcher"
> Niceness is implemented in our main fetcher. In some cases there may 
> be bursts on sites providing distributed ontologies. Speaking with the 
> group here it seems unlikely that we have not been hitting 
>,  however if you can provide an IP address I can do some 
> further verification.
> I understand that is now hosted at 
> DERI, and I wonder could some of the traffic be related to that? 
> Again, if you can provide an IP address I will do some further 
> verification.
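
For readers unfamiliar with the term, the per-host "niceness" mentioned above can be sketched roughly as follows. This is only a minimal illustration, not Sindice's actual fetcher: the class name `HostThrottle`, the `wait` method and the 2-second default delay are all assumptions made for the example.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum delay between requests to the same host.

    Hypothetical helper for illustration; not taken from any real crawler.
    """

    def __init__(self, min_delay=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay   # seconds between hits on one host (assumed value)
        self._clock = clock          # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._next_allowed = {}      # host -> earliest time the next request may start

    def wait(self, url):
        """Block until the host of `url` may be fetched again; return seconds waited."""
        host = urlparse(url).netloc.lower()
        now = self._clock()
        # Hosts not seen before are allowed immediately.
        delay = max(0.0, self._next_allowed.get(host, now) - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot for this host, counted from when this request starts.
        self._next_allowed[host] = now + delay + self.min_delay
        return delay
```

A fetcher would call `throttle.wait(url)` immediately before each request, so bursts against one site are spread out while requests to other hosts proceed without delay.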


As indicated by Martin, the <> instance hosted 
at DERI should negate the need to go back to the original source.


The LOD Cloud Cache at DERI is a live Virtuoso instance with 15 Billion+ 
Triples loaded. It covers as much of the LOD Cloud as we've been able to 
get our hands on, plus 6.4 Billion Triples from the Data.Gov effort.

I'll post a more detailed note about this instance (via blog post) once 
we are done with data loading (there's a massive collection of 
eCommerce-oriented Products & Services data still to be loaded, among 
others).

> Kind regards,
> Rob.
> -- 
> Robert Fuller
> Research Associate
> DERI, Galway



Kingsley Idehen	      
President & CEO 
OpenLink Software     
Twitter/ kidehen 

Received on Tuesday, 8 June 2010 13:39:21 UTC