
Re: Keeping crawlers up-to-date

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 28 Apr 2009 09:55:08 -0400
Message-ID: <49F70ABC.4000401@openlinksw.com>
To: Yves Raimond <yves.raimond@gmail.com>
CC: Linking Open Data <public-lod@w3.org>, Nicholas J Humfrey <njh@aelius.com>, Patrick Sinclair <metade@gmail.com>
Yves Raimond wrote:
> Hello!
> I know this issue has been raised during the LOD BOF at WWW 2009, but
> I don't know if any possible solutions emerged from there.
> The problem we are facing is that data on BBC Programmes changes
> approximately 50 000 times a day (new/updated
> broadcasts/versions/programmes/segments etc.). As we'd like to keep a
> set of RDF crawlers up-to-date with our information we were wondering
> how best to ping these. pingthesemanticweb seems like a nice option,
> but it needs the crawlers to ping it often enough to make sure they
> didn't miss a change. 

What's wrong with that? :-)

If PTSW works, then consumers should just ping it at a frequency that
matches their own sensitivity to change.
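
As a rough illustration of such a threshold: a consumer that can tolerate
missing at most a given number of changes between polls can derive its
polling interval from the publisher's change rate. The sketch below uses
the 50 000 changes a day quoted above; the function name and the tolerance
of 100 changes are assumptions made up for illustration, and nothing here
calls PTSW itself.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def poll_interval_seconds(changes_per_day, max_changes_per_poll):
    """Return how often to poll so that, on average, no more than
    max_changes_per_poll changes accumulate between two polls.

    Both arguments are hypothetical knobs chosen by the consumer;
    they are not part of any PTSW interface.
    """
    if changes_per_day <= 0 or max_changes_per_poll <= 0:
        raise ValueError("both arguments must be positive")
    return SECONDS_PER_DAY * max_changes_per_poll / changes_per_day


# About 50 000 changes a day, tolerating roughly 100 unseen changes:
interval = poll_interval_seconds(50_000, 100)
print(f"poll every {interval:.1f} seconds")
```

A consumer with a looser threshold simply passes a larger tolerance and
polls less often.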

> Another solution we were thinking of would be to
> stick either Talis changesets [1] or SPARQL/Update statements in a
> message queue, which would then be consumed by the crawlers.
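
The message-queue idea quoted above could be sketched roughly as follows.
This is only an in-process stand-in using Python's standard queue module,
not a real message broker; the function names and the example.org
statements are hypothetical, and the SPARQL/Update statements are treated
as opaque strings.

```python
import queue


def publish_changes(change_queue, updates):
    """Publisher side: enqueue each update statement in order."""
    for update in updates:
        change_queue.put(update)


def consume_changes(change_queue, apply_update):
    """Crawler side: drain the queue, applying updates in arrival order.

    apply_update is a hypothetical callback that would run the statement
    against the crawler's own store. Returns the number of updates applied.
    """
    applied = 0
    while True:
        try:
            update = change_queue.get_nowait()
        except queue.Empty:
            return applied
        apply_update(update)
        applied += 1


changes = queue.Queue()
publish_changes(changes, [
    'INSERT DATA { <http://example.org/b1> <http://example.org/title> "A" }',
    'DELETE DATA { <http://example.org/b1> <http://example.org/title> "A" }',
])
seen = []
count = consume_changes(changes, seen.append)
```

Because the queue is first-in, first-out, the crawler sees the updates in
the order the publisher emitted them, which matters when a later statement
undoes an earlier one.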

An additional option is for the HTML information resources to be crawled
as usual, with RDF-aware crawlers using RDF discovery patterns to
locate RDF information resource representations via <link/>.
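
A minimal sketch of that discovery step, assuming the HTML page advertises
its RDF representation through a <link/> element whose type is an RDF
media type and whose rel is "alternate" or "meta". The class and function
names, the set of media types, and the example.org URLs are illustrative
assumptions, not a description of any particular crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Media types treated as RDF for this sketch (an assumption).
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}


class RDFLinkFinder(HTMLParser):
    """Collect absolute URLs of RDF representations advertised via <link/>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        href = attr.get("href")
        if href and mime in RDF_TYPES and ("alternate" in rels or "meta" in rels):
            # Resolve relative hrefs against the page URL.
            self.rdf_links.append(urljoin(self.base_url, href))


def find_rdf_links(html, base_url):
    finder = RDFLinkFinder(base_url)
    finder.feed(html)
    return finder.rdf_links


page = (
    '<html><head>'
    '<link rel="stylesheet" type="text/css" href="/style.css"/>'
    '<link rel="alternate" type="application/rdf+xml" href="/programmes/x.rdf"/>'
    '</head><body></body></html>'
)
links = find_rdf_links(page, "http://example.org/programmes/x")
```

The crawler would then fetch each discovered URL and parse it as RDF, so
no separate notification channel is needed for pages it already visits.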


> Did anyone try to tackle this problem already?
> Cheers!
> y
> [1] http://n2.talis.com/wiki/Changeset



Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Tuesday, 28 April 2009 13:55:47 UTC
