
Re: Think before you write Semantic Web crawlers

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Wed, 22 Jun 2011 11:39:59 +0100
Message-ID: <4E01C67F.9010906@openlinksw.com>
To: public-lod@w3.org
On 6/22/11 10:37 AM, Yves Raimond wrote:
> Request throttling would work, but you would have to find a way to
> identify crawlers, which is tricky: most of them use multiple IPs and
> don't set appropriate user agents (the crawlers that currently hit us
> the most are wget and Java 1.6 :/ ).
Hence the requirement for incorporating WebID as the basis for QoS for
identified agents. Everyone else gets constrained by rate limits.

Anyway, identification is the key: the InterWeb jungle needs WebID to
help reduce the costs of serving up Linked Data, etc.
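
To make the policy above concrete, here is a rough sketch (not anything we ship) of a two-tier limiter in Python. It assumes the WebID has already been verified upstream, for example by the TLS layer, and simply receives it as a string. The class names, the limits, and the `allow` signature are all illustrative assumptions. Because identified agents are keyed by WebID rather than by IP address, a crawler spread over multiple IPs still shares one bucket.

```python
import time
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `capacity`."""
    rate: float
    capacity: float
    tokens: float
    updated: float

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TieredLimiter:
    # Hypothetical limits as (requests per second, burst size); tune to taste.
    IDENTIFIED = (10.0, 20.0)
    ANONYMOUS = (1.0, 2.0)

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._buckets = {}

    def allow(self, remote_ip, webid=None):
        # Assumption: `webid` is only passed in after it has been verified
        # elsewhere; this sketch does no WebID verification itself.
        if webid:
            key, (rate, capacity) = ("webid", webid), self.IDENTIFIED
        else:
            key, (rate, capacity) = ("ip", remote_ip), self.ANONYMOUS
        now = self._clock()
        bucket = self._buckets.get(key)
        if bucket is None:
            # New clients start with a full bucket.
            bucket = TokenBucket(rate, capacity, capacity, now)
            self._buckets[key] = bucket
        return bucket.allow(now)
```

A server would call `limiter.allow(ip, webid)` per request and answer with HTTP 429 or 503 when it returns False. Unidentified agents such as a bare wget fall into the strict per-IP tier.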

Amazing it's taken us until 2011 to revisit this critical matter.



Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Wednesday, 22 June 2011 10:40:43 UTC
