Re: Think before you write Semantic Web crawlers

On 6/22/11 10:37 AM, Yves Raimond wrote:
> Request throttling would work, but you would have to find a way to
> identify crawlers, which is tricky: most of them use multiple IPs and
> don't set appropriate user agents (the crawlers that currently hit us
> the most are wget and Java 1.6 :/ ).
Hence the requirement for incorporating WebID as the basis for QoS for 
identified agents. Everyone else gets constrained by rate limits 
etc.

Anyway, identification is the key: the InterWeb jungle needs WebID to 
help reduce the costs of serving up Linked Data etc.
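
For what it's worth, here is a minimal sketch (not a definitive design) of 
the kind of gate I have in mind, assuming WebID+TLS on the server side: 
agents that present a verifiable WebID bypass throttling, everyone else 
gets a per-IP token bucket. The WebID verification step is stubbed out; a 
real deployment would dereference the WebID URI from the client 
certificate and match the public key against the profile document.

    # Sketch: WebID-exempt rate limiting for a Linked Data server.
    # Names and parameters here are illustrative, not from any spec.

    import time
    from collections import defaultdict

    RATE = 1.0    # allowed requests per second for anonymous clients
    BURST = 10.0  # token-bucket burst size

    _buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.time()})

    def verify_webid(client_cert):
        """Placeholder: return the WebID URI if the client certificate
        checks out against its profile document, else None."""
        return None  # assumption: real check dereferences the SAN URI

    def allow_request(client_ip, client_cert=None):
        """Return True if the request should be served."""
        if client_cert is not None and verify_webid(client_cert):
            return True  # identified agent: no throttling, better QoS
        bucket = _buckets[client_ip]
        now = time.time()
        bucket["tokens"] = min(BURST,
                               bucket["tokens"] + (now - bucket["ts"]) * RATE)
        bucket["ts"] = now
        if bucket["tokens"] >= 1.0:
            bucket["tokens"] -= 1.0
            return True
        return False  # anonymous client over the limit: respond 429

The point isn't the token bucket itself; it's that identification gives the 
publisher something stable to hang QoS decisions on, instead of guessing 
from IPs and user agents.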

Amazing that it's taken us until 2011 to revisit this critical matter.

-- 

Regards,

Kingsley Idehen	
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Wednesday, 22 June 2011 10:40:43 UTC