
Re: Keeping crawlers up-to-date

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Tue, 28 Apr 2009 16:32:35 +0200
Message-ID: <9178f78c0904280732g16ee19a2t770727a3c95adac6@mail.gmail.com>
To: Yves Raimond <yves.raimond@gmail.com>
Cc: Linking Open Data <public-lod@w3.org>, Nicholas J Humfrey <njh@aelius.com>, Patrick Sinclair <metade@gmail.com>
On Tue, Apr 28, 2009 at 3:39 PM, Yves Raimond <yves.raimond@gmail.com> wrote:
> Hello!
>
> I know this issue has been raised during the LOD BOF at WWW 2009, but
> I don't know if any possible solutions emerged from there.
>
> The problem we are facing is that data on BBC Programmes changes
> approximately 50 000 times a day (new/updated
> broadcasts/versions/programmes/segments etc.). As we'd like to keep a
> set of RDF crawlers up-to-date with our information we were wondering
> how best to ping these. pingthesemanticweb seems like a nice option,
> but it needs the crawlers to ping it often enough to make sure they
> didn't miss a change. Another solution we were thinking of would be to
> stick either Talis changesets [1] or SPARQL/Update statements in a
> message queue, which would then be consumed by the crawlers.
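
In code, the queue idea might look roughly like this (a minimal Python sketch; the changeset shape and all names here are assumptions, loosely modelled on Talis changesets, with a plain set standing in for the crawler's triple store):

```python
import queue

# A shared queue of changesets. Each changeset lists triples to remove
# and triples to add, as in the Talis changeset model (shape assumed).
change_queue = queue.Queue()

def publish(removals, additions):
    """Producer side: the data provider enqueues one changeset per update."""
    change_queue.put({"removals": removals, "additions": additions})

def consume(store):
    """Consumer side: a crawler drains the queue and applies each
    changeset to its local store (here just a set of triples)."""
    while not change_queue.empty():
        changeset = change_queue.get()
        for triple in changeset["removals"]:
            store.discard(triple)
        for triple in changeset["additions"]:
            store.add(triple)

# Example: one programme's title is updated.
store = {("prog:b00abc", "dc:title", "Old title")}
publish(removals=[("prog:b00abc", "dc:title", "Old title")],
        additions=[("prog:b00abc", "dc:title", "New title")])
consume(store)
```

The same structure works whether the payload is a changeset or a SPARQL/Update statement; only the "apply" step in `consume` changes.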

That's a lot of data; I wonder if there is a smart way of filtering it down.

Perhaps an RDF version of "twitter" would be interesting, where you
"follow" the changes that you're interested in?  You could possibly even
follow by user, or by SPARQL query, and maybe across multiple domains.
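
Something like this, say (an illustrative Python sketch; in a real system the filter would be a SPARQL query evaluated against each change, here it is just a callable, and every name is made up):

```python
# Registered followers: each is a (filter, callback) pair. The filter
# stands in for a SPARQL query describing what the follower cares about.
followers = []

def follow(matches, callback):
    """Register interest: only changes matching the filter are delivered."""
    followers.append((matches, callback))

def notify(triple):
    """Fan a change out to every follower whose filter matches it."""
    for matches, callback in followers:
        if matches(triple):
            callback(triple)

received = []
# Follow only changes about one programme, regardless of which source
# (domain) emits them.
follow(lambda t: t[0] == "prog:b00abc", received.append)

notify(("prog:b00abc", "dc:title", "New title"))   # delivered
notify(("prog:b00xyz", "dc:title", "Other"))       # filtered out
```

The point is that filtering happens at the notification hub, so a crawler only ever sees the small slice of the 50 000 daily changes it actually follows.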

>
> Has anyone tried to tackle this problem already?
>
> Cheers!
> y
>
>
> [1] http://n2.talis.com/wiki/Changeset
>
>
Received on Tuesday, 28 April 2009 14:33:10 UTC
