- From: Yves Raimond <yves.raimond@gmail.com>
- Date: Tue, 28 Apr 2009 17:01:54 +0100
- To: Leigh Dodds <leigh.dodds@talis.com>
- Cc: Linking Open Data <public-lod@w3.org>, Nicholas J Humfrey <njh@aelius.com>, Patrick Sinclair <metade@gmail.com>
Hello!

> I think the two main options are either to publish a feed containing
> pointers to changes, or to use a messaging system to push out notifications.
>
> Despite the recent discussion around the benefits of, say, Jabber or other
> mechanisms for pushing out notifications, I think that a more RESTful
> approach using RSS or Atom feeds might be nicer. Then we can focus on the
> resource design, i.e. what kinds of changes we need to publish.
>
> So for example for /programmes it may be sufficient to publish a set of
> feeds for new items, e.g. brands, episodes, versions, etc. These could be
> RSS 1.0 and then include additional RDF data as appropriate.

My only concern about this is that you need to limit the number of items
in the feed. If there is a sudden burst of activity and the crawlers just
poll the feed at regular intervals, they may miss some updates. However,
even with 1M updates in a day spread evenly, a feed capped at 100 items
would need the crawlers to poll it about every 8.6 seconds (1,000,000 / 100
= 10,000 full feeds per day); at our current rate of roughly 50,000 updates
a day, that relaxes to about every three minutes. So that's not too bad.
(I just noticed that Soren's proposal includes pagination of feeds, which
might solve that problem.)

So yes, I guess it could be done using RDF feeds, e.g.
http://www.bbc.co.uk/programmes/updates/2009/04/28/brands.rdf etc. We'd
need to think carefully about which feeds we offer, though.

Cheers!
y

>
> This has the added advantage that a crawler that only wanted to collect
> certain information, e.g. about brands, could monitor just the resource(s)
> it was interested in. Similarly, with careful resource design, the timing of
> updates could also be under the control of the crawler, e.g. new versions in
> the last 12 hours, 24 hours, 7 days (avoiding a massive firehose of updates).
> This could easily be done with URIs and avoids having to build that into the
> messaging system.
>
> Interested to know what you think.
>
> Cheers,
>
> L.
>
> 2009/4/28 Yves Raimond <yves.raimond@gmail.com>
>>
>> Hello!
>>
>> I know this issue was raised during the LOD BOF at WWW 2009, but
>> I don't know whether any solutions emerged from there.
>>
>> The problem we are facing is that data on BBC Programmes changes
>> approximately 50 000 times a day (new/updated
>> broadcasts/versions/programmes/segments etc.). As we'd like to keep a
>> set of RDF crawlers up to date with our information, we were wondering
>> how best to ping them. pingthesemanticweb seems like a nice option,
>> but it needs the crawlers to poll it often enough to make sure they
>> don't miss a change. Another solution we were thinking of would be to
>> put either Talis changesets [1] or SPARQL/Update statements in a
>> message queue, which would then be consumed by the crawlers.
>>
>> Has anyone tried to tackle this problem already?
>>
>> Cheers!
>> y
>>
>> [1] http://n2.talis.com/wiki/Changeset
>
> --
> Leigh Dodds
> Programme Manager, Talis Platform
> Talis
> leigh.dodds@talis.com
> http://www.talis.com
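A rough sanity check of the feed-size reasoning in the reply, as a minimal Python sketch. It assumes updates arrive evenly over the day (a burst, the concern raised in the reply, would shorten the safe interval), and the function name `max_poll_interval` is purely illustrative.

```python
SECONDS_PER_DAY = 86_400


def max_poll_interval(updates_per_day: float, feed_cap: int) -> float:
    """Longest polling interval (in seconds) that cannot miss an update,
    assuming updates are spread evenly over the day and the feed only
    ever shows the latest `feed_cap` items."""
    if updates_per_day <= 0:
        return float("inf")  # nothing changes, so any interval is safe
    updates_per_second = updates_per_day / SECONDS_PER_DAY
    # The feed "rolls over" completely after feed_cap new updates.
    return feed_cap / updates_per_second


for rate in (50_000, 1_000_000):
    interval = max_poll_interval(rate, feed_cap=100)
    print(f"{rate:,} updates/day: poll at least every {interval:.1f} s")
# 50,000 updates/day: poll at least every 172.8 s
# 1,000,000 updates/day: poll at least every 8.6 s
```

Paginated feeds, as mentioned in the reply, relax this limit: a crawler that polls late can page back until it reaches the last item it already saw, rather than relying on the newest page alone.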
Received on Tuesday, 28 April 2009 16:02:35 UTC
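The date-partitioned feed URIs suggested in the reply also make catching up straightforward: a crawler that has been offline can fetch one feed per day since its last visit. A minimal Python sketch, assuming the URI pattern follows the single example given in the reply (`.../updates/YYYY/MM/DD/brands.rdf`); the base URL constant, the `daily_feed_uris` helper and the `kind` values are illustrative, not a published BBC interface.

```python
from datetime import date, timedelta

# Hypothetical URI pattern modelled on the example in the reply;
# not a documented BBC endpoint.
BASE = "http://www.bbc.co.uk/programmes/updates"


def daily_feed_uris(kind: str, since: date, until: date) -> list[str]:
    """Return one update-feed URI per day in [since, until], oldest first.

    `kind` is the (hypothetical) feed name, e.g. "brands" or "episodes".
    """
    if since > until:
        return []
    days = (until - since).days
    return [
        f"{BASE}/{d:%Y/%m/%d}/{kind}.rdf"
        for d in (since + timedelta(days=n) for n in range(days + 1))
    ]


if __name__ == "__main__":
    for uri in daily_feed_uris("brands", date(2009, 4, 27), date(2009, 4, 28)):
        print(uri)
    # http://www.bbc.co.uk/programmes/updates/2009/04/27/brands.rdf
    # http://www.bbc.co.uk/programmes/updates/2009/04/28/brands.rdf
```

Because each day's feed has a stable URI, a crawler only needs to remember the date of its last successful crawl, and a crawler interested only in brands never has to fetch the other feeds.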