Re: Keeping crawlers up-to-date

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 28 Apr 2009 10:39:59 -0400
Message-ID: <49F7153F.8090609@openlinksw.com>
To: Melvin Carvalho <melvincarvalho@gmail.com>
CC: Yves Raimond <yves.raimond@gmail.com>, Linking Open Data <public-lod@w3.org>, Nicholas J Humfrey <njh@aelius.com>, Patrick Sinclair <metade@gmail.com>
Melvin Carvalho wrote:
> On Tue, Apr 28, 2009 at 3:39 PM, Yves Raimond <yves.raimond@gmail.com> wrote:
>   
>> Hello!
>>
>> I know this issue has been raised during the LOD BOF at WWW 2009, but
>> I don't know if any possible solutions emerged from there.
>>
>> The problem we are facing is that data on BBC Programmes changes
>> approximately 50 000 times a day (new/updated
>> broadcasts/versions/programmes/segments etc.). As we'd like to keep a
>> set of RDF crawlers up-to-date with our information we were wondering
>> how best to ping these. pingthesemanticweb seems like a nice option,
>> but it needs the crawlers to ping it often enough to make sure they
>> didn't miss a change. Another solution we were thinking of would be to
>> stick either Talis changesets [1] or SPARQL/Update statements in a
>> message queue, which would then be consumed by the crawlers.
>>     
>
> That's a lot of data, I wonder if there is a smart way of filtering it down.
>
> Perhaps an RDF version of "twitter" would be interesting, where you
> "follow" changes that you're interested in?  You could even follow,
> possibly by user or by SPARQL query, and maybe across multiple domains.
>   
How about: http://dev.live.com/feedsync/intro.aspx

Nothing stops RDF information resources from being shuttled about using RSS/Atom :-)
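
To make that a bit more concrete, here is a rough sketch of the general idea in Python. It is plain Atom, not FeedSync itself: the publisher lists each changed resource as an entry with an "updated" timestamp and a link to its RDF, and a crawler re-fetches only what changed since its last visit. The function names, the example.org URLs and the choice of application/rdf+xml are assumptions made for illustration, not anything BBC Programmes or FeedSync actually exposes.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)


def build_change_feed(feed_id, title, changes):
    """Build an Atom feed announcing changed RDF resources.

    `changes` is a list of (resource_uri, updated) pairs, where `updated`
    is a UTC timestamp such as "2009-04-28T10:00:00Z".
    Sketch only: required Atom details such as <author> are omitted.
    """
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    # The feed-level timestamp is the most recent entry timestamp.
    latest = max((updated for _, updated in changes),
                 default="1970-01-01T00:00:00Z")
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = latest
    for uri, updated in changes:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = uri
        ET.SubElement(entry, f"{{{ATOM}}}title").text = uri
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
        # Point the crawler at the RDF representation of the resource.
        ET.SubElement(entry, f"{{{ATOM}}}link", rel="alternate",
                      type="application/rdf+xml", href=uri)
    return ET.tostring(feed, encoding="unicode")


def changed_since(feed_xml, since):
    """Return the resource URIs whose entries were updated after `since`.

    Assumes every timestamp uses the same UTC "Z" format, so plain
    string comparison orders them correctly.
    """
    ns = {"a": ATOM}
    root = ET.fromstring(feed_xml)
    return [
        entry.find("a:link", ns).get("href")
        for entry in root.findall("a:entry", ns)
        if entry.find("a:updated", ns).text > since
    ]


feed = build_change_feed(
    "urn:example:changes",
    "Recent changes (hypothetical)",
    [
        ("http://example.org/programmes/a", "2009-04-28T10:00:00Z"),
        ("http://example.org/programmes/b", "2009-04-28T12:00:00Z"),
    ],
)
to_refetch = changed_since(feed, "2009-04-28T11:00:00Z")
```

A crawler would remember the timestamp of its last poll and pass it as `since`. At 50 000 changes a day the feed would presumably need paging or archiving as well, which this sketch does not attempt.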

Kingsley
>   
>> Has anyone tried to tackle this problem already?
>>
>> Cheers!
>> y
>>
>>
>> [1] http://n2.talis.com/wiki/Changeset
>>
>>
>>     
>
>
>   


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Tuesday, 28 April 2009 14:40:39 UTC
