
Re: Keeping crawlers up-to-date

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 28 Apr 2009 10:58:57 -0400
Message-ID: <49F719B1.3080902@openlinksw.com>
To: Peter Coetzee <peter@coetzee.org>
CC: Melvin Carvalho <melvincarvalho@gmail.com>, Yves Raimond <yves.raimond@gmail.com>, Linking Open Data <public-lod@w3.org>, Nicholas J Humfrey <njh@aelius.com>, Patrick Sinclair <metade@gmail.com>
Peter Coetzee wrote:
>
>
> On Tue, Apr 28, 2009 at 3:39 PM, Kingsley Idehen 
> <kidehen@openlinksw.com> wrote:
>
>     Melvin Carvalho wrote:
>
>         On Tue, Apr 28, 2009 at 3:39 PM, Yves Raimond
>         <yves.raimond@gmail.com> wrote:
>          
>
>             Hello!
>
>             I know this issue has been raised during the LOD BOF at
>             WWW 2009, but I don't know if any possible solutions
>             emerged from there.
>
>             The problem we are facing is that data on BBC Programmes
>             changes approximately 50 000 times a day (new/updated
>             broadcasts/versions/programmes/segments, etc.). As we'd
>             like to keep a set of RDF crawlers up-to-date with our
>             information, we were wondering how best to ping them.
>             pingthesemanticweb seems like a nice option, but it needs
>             the crawlers to ping it often enough to make sure they
>             didn't miss a change. Another solution we were thinking
>             of would be to stick either Talis changesets [1] or
>             SPARQL/Update statements in a message queue, which would
>             then be consumed by the crawlers.
>                
>
>
>         That's a lot of data, I wonder if there is a smart way of
>         filtering it down.
>
>         Perhaps an RDF version of "twitter" would be interesting,
>         where you "follow" changes that you're interested in? You
>         could even follow by user, or by SPARQL query, possibly
>         across multiple domains.
>          
>
>     How about: http://dev.live.com/feedsync/intro.aspx
>
>     Nothing stops RDF info. resources being shuttled about using
>     RSS/Atom :-)
>
>     Kingsley
>
>
> Alternatively, why not take an approach similar to the Wikipedia live
> feeds, and push them out on public chat channels; perhaps
> SPARQL/Update messages on a read-only Jabber/IRC etc. stream?
> Interested parties are free to consume them, and use the queries to
> keep their local copy up-to-date with each set of changes. Possibly
> preferable to reinventing the wheel with some kind of stream server :)
Peter,

Cool idea :-)

Kingsley
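
The message-queue approach floated in this thread could be sketched
roughly as follows. This is a minimal illustration only: the message
shape, identifiers, and example triple are all invented here, and a
real deployment would put Talis changesets or SPARQL/Update statements
on an actual broker rather than an in-process queue.

```python
import queue

def apply_changeset(store, changeset):
    """Apply one changeset-style message to a local store.

    The store is modelled as a set of (subject, predicate, object)
    tuples; each message lists triples to remove and triples to add.
    """
    for triple in changeset.get("removals", []):
        store.discard(triple)
    for triple in changeset.get("additions", []):
        store.add(triple)

def drain(q, store):
    """Consume every pending changeset from the queue, in order."""
    while True:
        try:
            changeset = q.get_nowait()
        except queue.Empty:
            return
        apply_changeset(store, changeset)

# Example (invented data): a broadcast's schedule date is corrected
# upstream, and the crawler's local copy is brought in sync.
local = {("prog/b00abcd1", "po:schedule_date", "2009-04-28")}
q = queue.Queue()
q.put({
    "removals": [("prog/b00abcd1", "po:schedule_date", "2009-04-28")],
    "additions": [("prog/b00abcd1", "po:schedule_date", "2009-04-29")],
})
drain(q, local)
```

The same consumer loop would work whichever transport carries the
messages — a broker, an Atom feed, or a read-only Jabber/IRC stream as
suggested above.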
>
> Peter
>
>             Did anyone try to tackle this problem already?
>
>             Cheers!
>             y
>
>
>             [1] http://n2.talis.com/wiki/Changeset


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Tuesday, 28 April 2009 14:59:38 UTC
