Re: Keeping crawlers up-to-date

Yves Raimond wrote:
> Hello!
>
> I know this issue has been raised during the LOD BOF at WWW 2009, but
> I don't know if any possible solutions emerged from there.
>
> The problem we are facing is that data on BBC Programmes changes
> approximately 50 000 times a day (new/updated
> broadcasts/versions/programmes/segments etc.). As we'd like to keep a
> set of RDF crawlers up-to-date with our information we were wondering
> how best to ping these. pingthesemanticweb seems like a nice option,
> but it needs the crawlers to ping it often enough to make sure they
> didn't miss a change. 

What's wrong with that? :-)

If PTSW works, then consumers should just poll it at whatever interval
matches how sensitive their own solutions are to changes.
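
To put rough numbers on that trade-off, here is a back-of-the-envelope
sketch in Python. It only uses the 50 000 changes/day figure from the
original message and assumes, purely for illustration, that changes are
spread evenly over the day (real traffic is surely burstier). The function
names and the sample intervals are made up for this example; nothing here
calls PTSW itself.

```python
CHANGES_PER_DAY = 50_000          # figure quoted in the original message
SECONDS_PER_DAY = 24 * 60 * 60


def expected_changes_per_poll(poll_interval_s, changes_per_day=CHANGES_PER_DAY):
    """Average number of changes that pile up between two consecutive polls.

    Assumes a uniform change rate, which is a simplification.
    """
    return changes_per_day * poll_interval_s / SECONDS_PER_DAY


def polls_per_day(poll_interval_s):
    """How many polls a consumer issues per day at this interval."""
    return SECONDS_PER_DAY / poll_interval_s


if __name__ == "__main__":
    # Hypothetical consumer thresholds: 1 minute, 15 minutes, 1 hour.
    for interval in (60, 15 * 60, 60 * 60):
        print(interval,
              round(expected_changes_per_poll(interval), 1),
              polls_per_day(interval))
```

A consumer that can live with an hour of lag issues 24 polls a day and
picks up a couple of thousand changes each time; one that wants
minute-level freshness polls 1440 times a day for roughly 35 changes per
poll.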

> Another solution we were thinking of would be to
> stick either Talis changesets [1] or SPARQL/Update statements in a
> message queue, which would then be consumed by the crawlers.
>   

An additional option is for the HTML information resources to be crawled 
as usual, with RDF-aware crawlers using RDF discovery patterns to 
locate RDF information resource representations via <link/>.
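
As a rough illustration of that discovery step, the sketch below scans an
HTML page for <link/> elements that advertise an RDF representation. It
uses only the Python standard library. The set of media types, the
accepted rel values ("alternate" and "meta") and the example.org URL are
assumptions chosen for the example, not a statement of what any
particular crawler or the BBC pages actually use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Media types treated as RDF here; an assumption, adjust as needed.
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}
# rel values treated as pointing at an alternative representation.
RDF_RELS = {"alternate", "meta"}


class RDFLinkFinder(HTMLParser):
    """Collects absolute URLs of RDF documents advertised via <link/>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rdf_urls = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing <link ... /> elements.
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = set((attributes.get("rel") or "").lower().split())
        media_type = (attributes.get("type") or "").strip().lower()
        href = attributes.get("href")
        if href and media_type in RDF_TYPES and rels & RDF_RELS:
            # Resolve relative hrefs against the page URL.
            self.rdf_urls.append(urljoin(self.base_url, href))


def find_rdf_links(html, base_url):
    """Return the RDF document URLs advertised in the given HTML text."""
    finder = RDFLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.rdf_urls


if __name__ == "__main__":
    page = """<html><head>
      <link rel="stylesheet" type="text/css" href="/style.css"/>
      <link rel="alternate" type="application/rdf+xml" href="/programmes/b00abc.rdf"/>
    </head><body></body></html>"""
    # Hypothetical page URL, for illustration only.
    print(find_rdf_links(page, "http://example.org/programmes/b00abc"))
```

A crawler following this pattern would fetch the HTML as it already does,
then queue whatever URLs find_rdf_links returns for an RDF fetch.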


Kingsley

> Has anyone tried to tackle this problem already?
>
> Cheers!
> y
>
>
> [1] http://n2.talis.com/wiki/Changeset
>
>
>   


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com

Received on Tuesday, 28 April 2009 13:55:47 UTC