Keeping crawlers up-to-date

Hello!

I know this issue has been raised during the LOD BOF at WWW 2009, but
I don't know if any possible solutions emerged from there.

The problem we are facing is that data on BBC Programmes changes
approximately 50 000 times a day (new/updated
broadcasts/versions/programmes/segments etc.). As we'd like to keep a
set of RDF crawlers up-to-date with our information, we were wondering
how best to notify them of these changes. pingthesemanticweb seems
like a nice option, but it requires the crawlers to poll it often
enough to make sure they don't miss a change. Another solution we were
considering would be to push either Talis changesets [1] or
SPARQL/Update statements onto a message queue, which would then be
consumed by the crawlers.
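To make the message-queue idea a bit more concrete, here is a minimal
sketch (Python, standard library only, with an in-process queue.Queue
standing in for a real message broker). The graph URI, resource URIs
and triples are made up for illustration; each change is serialised as
a SPARQL 1.1 Update (DELETE DATA / INSERT DATA) that a consuming
crawler could apply to its own store:

    # Minimal producer/consumer sketch of the "changes on a message queue" idea.
    # An in-process queue.Queue stands in for a real message broker; in practice
    # each crawler would consume from its own queue or subscription.

    import queue
    import threading

    GRAPH = "http://example.org/bbc-programmes"  # hypothetical graph URI

    def to_sparql_update(removed, added):
        """Serialise one change as a SPARQL 1.1 Update against GRAPH.

        `removed` and `added` are lists of triples already written in
        N-Triples syntax (purely illustrative values below).
        """
        parts = []
        if removed:
            parts.append("DELETE DATA { GRAPH <%s> { %s } };" % (GRAPH, " ".join(removed)))
        if added:
            parts.append("INSERT DATA { GRAPH <%s> { %s } };" % (GRAPH, " ".join(added)))
        return "\n".join(parts)

    def crawler(q):
        """Consume updates until a None sentinel arrives."""
        while True:
            update = q.get()
            if update is None:
                break
            # A real crawler would apply this to its local triple store or
            # POST it to its SPARQL endpoint; here we just print it.
            print(update)

    if __name__ == "__main__":
        changes = queue.Queue()
        consumer = threading.Thread(target=crawler, args=(changes,))
        consumer.start()

        # Publisher side: one message per change (e.g. an updated broadcast title).
        changes.put(to_sparql_update(
            removed=['<http://example.org/b0074g9v#programme> <http://purl.org/dc/terms/title> "Old title" .'],
            added=['<http://example.org/b0074g9v#programme> <http://purl.org/dc/terms/title> "New title" .'],
        ))

        changes.put(None)  # sentinel: no more changes
        consumer.join()

The same publisher could just as easily put a Talis changeset document
on the queue instead of a SPARQL/Update string; the queue only carries
opaque messages.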

Has anyone tried to tackle this problem already?

Cheers!
y


[1] http://n2.talis.com/wiki/Changeset
