Re: Think before you write Semantic Web crawlers

On 6/21/11 10:54 AM, Henry Story wrote:
>
> Then you could just redirect him straight to the n3 dump of graphs of 
> your site (I say graphs because your site not necessarily being 
> consistent, the crawler may be interested in keeping information about 
> which pages said what)
> Redirect may be a bit harsh. So you could at first link him to the dump

The only trouble with the above is that many publishers don't produce graph 
dumps anymore; they just expose SPARQL endpoints, so crawlers pound the 
endpoints and hit timeouts etc..

A looong time ago, in the very early LOD days, we (the LOD community) talked 
about the importance of dumps with the heuristic you describe in mind (no 
WebID then, but it was clear something would emerge). Unfortunately, 
SPARQL endpoints have become the first port of call re. Linked Data, 
even though SPARQL-endpoint-only publication == asking for trouble unless 
you can protect the endpoint and re-route agents to dumps.
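Just to make the re-routing idea concrete, here is a minimal sketch of the 
heuristic in Python. All the names, URLs, and the User-Agent tokens are 
illustrative assumptions, not anyone's actual implementation; a real setup 
would likely key off WebID rather than crude User-Agent matching:

```python
# Hypothetical sketch: steer known crawler agents to a static N3 dump
# instead of the live SPARQL endpoint. Tokens and URLs are assumptions.

CRAWLER_TOKENS = ("bot", "crawler", "spider")    # crude User-Agent heuristic
DUMP_URL = "http://example.org/dumps/site.n3"    # hypothetical dump location
ENDPOINT_URL = "http://example.org/sparql"       # hypothetical endpoint

def route_request(user_agent: str) -> str:
    """Return the URL this agent should be sent to (e.g. via 303 redirect)."""
    agent = user_agent.lower()
    if any(token in agent for token in CRAWLER_TOKENS):
        return DUMP_URL       # crawlers: take the dump, spare the endpoint
    return ENDPOINT_URL       # interactive clients: live SPARQL
```

As Henry notes, an outright redirect may be harsh; a gentler variant would 
serve the page normally but advertise the dump, e.g. in a Link header.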

Maybe we can use WebID and the recent troubles as the basis for 
re-establishing this most vital of best practices re. Linked Data 
publication. Of course, this is also awesome dog-fooding!

-- 

Regards,

Kingsley Idehen	
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Tuesday, 21 June 2011 10:24:03 UTC