Re: Keeping crawlers up-to-date

Melvin Carvalho wrote:
> On Tue, Apr 28, 2009 at 3:39 PM, Yves Raimond <yves.raimond@gmail.com> wrote:
>   
>> Hello!
>>
>> I know this issue was raised at the LOD BOF at WWW 2009, but
>> I don't know whether any solutions emerged from there.
>>
>> The problem we are facing is that data on BBC Programmes changes
>> approximately 50 000 times a day (new/updated
>> broadcasts/versions/programmes/segments etc.). As we'd like to keep a
>> set of RDF crawlers up-to-date with our information, we were wondering
>> how best to ping them. pingthesemanticweb seems like a nice option,
>> but it needs the crawlers to poll it often enough to make sure they
>> don't miss a change. Another solution we were thinking of would be to
>> stick either Talis changesets [1] or SPARQL/Update statements in a
>> message queue, which would then be consumed by the crawlers.
>>     
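
(On the changeset/queue idea above: a minimal sketch, in Python, of what
that pipeline might look like. The in-process queue stands in for a real
message broker, and the consumer endpoint and programme URI are
hypothetical stand-ins:)

    # Producer side: encode each change as a pair of SPARQL Update
    # operations and put it on the queue.
    import queue
    import urllib.request

    changes = queue.Queue()  # stand-in for a real message broker

    def publish_change(subject, predicate, old_value, new_value):
        update = (
            'DELETE DATA { <%s> <%s> "%s" . }\n'
            'INSERT DATA { <%s> <%s> "%s" . }'
            % (subject, predicate, old_value,
               subject, predicate, new_value))
        changes.put(update)

    # Consumer side: apply each queued update to the crawler's local
    # store by POSTing it as application/sparql-update.
    def consume_changes(endpoint):
        while not changes.empty():
            req = urllib.request.Request(
                endpoint,
                data=changes.get().encode('utf-8'),
                headers={'Content-Type': 'application/sparql-update'})
            urllib.request.urlopen(req)

    publish_change(
        'http://www.bbc.co.uk/programmes/b00xxxxx#programme',  # hypothetical
        'http://purl.org/dc/terms/title', 'Old title', 'New title')
    consume_changes('http://example.org/crawler-store/update')  # hypothetical
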
>
> That's a lot of data; I wonder if there is a smart way of filtering it down.
>
> Perhaps an RDF version of "twitter" would be interesting, where you
> "follow" changes that you're interested in?  You could possibly even
> follow by user, or by SPARQL query, and maybe across multiple domains.
>   
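
(A "follow by query" subscription could be as simple as matching each
incoming change against subscribers' triple patterns. A toy sketch, with
made-up subscription and change shapes:)

    # A subscription is a (subject, predicate, object) pattern where None
    # means "match anything"; a change is a concrete triple.
    def matches(pattern, triple):
        return all(p is None or p == t for p, t in zip(pattern, triple))

    subscriptions = {
        'crawler-a': (None, 'http://purl.org/dc/terms/title', None),
        'crawler-b': ('http://www.bbc.co.uk/programmes/b00xxxxx#programme',
                      None, None),  # hypothetical programme URI
    }

    def route(triple):
        # Deliver a change only to subscribers whose pattern matches it
        return [who for who, pat in subscriptions.items()
                if matches(pat, triple)]

    print(route(('http://www.bbc.co.uk/programmes/b00xxxxx#programme',
                 'http://purl.org/dc/terms/title', 'New title')))
    # -> ['crawler-a', 'crawler-b']
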
How about: http://dev.live.com/feedsync/intro.aspx

Nothing stops RDF information resources from being shuttled about using RSS/Atom :-)
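
(For instance, a crawler could poll an Atom feed listing changed
resources and re-fetch only those. A minimal sketch, assuming a
hypothetical feed URL and using each entry's id/updated pair to skip
versions it has already seen:)

    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = '{http://www.w3.org/2005/Atom}'
    FEED_URL = 'http://example.org/programmes/changes.atom'  # hypothetical

    def fetch_changed_resources(seen):
        # `seen` maps resource URI -> last <updated> stamp we processed
        feed = ET.parse(urllib.request.urlopen(FEED_URL))
        for entry in feed.iter(ATOM + 'entry'):
            uri = entry.find(ATOM + 'id').text
            updated = entry.find(ATOM + 'updated').text
            if seen.get(uri) == updated:
                continue  # already have this version
            seen[uri] = updated
            # Ask the resource itself for RDF via content negotiation
            req = urllib.request.Request(
                uri, headers={'Accept': 'application/rdf+xml'})
            rdf = urllib.request.urlopen(req).read()
            # ... hand `rdf` to the crawler's store ...

    fetch_changed_resources({})
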

Kingsley
>   
>> Has anyone tried to tackle this problem already?
>>
>> Cheers!
>> y
>>
>>
>> [1] http://n2.talis.com/wiki/Changeset


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com

Received on Tuesday, 28 April 2009 14:40:39 UTC