- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Tue, 28 Apr 2009 19:28:09 +0100
- To: Yves Raimond <yves.raimond@gmail.com>
- Cc: Leigh Dodds <leigh.dodds@talis.com>, Linking Open Data <public-lod@w3.org>, Nicholas J Humfrey <njh@aelius.com>, Patrick Sinclair <metade@gmail.com>
Possibly relevant:
http://www.ietf.org/rfc/rfc5005.txt

Feed paging and archiving for Atom feeds. Paging is a nice solution to
the "small window" problem with syndication feeds. The concept might
be translatable to RSS 1.0. Although I have to say that I find the
idea of pushing RDF updates via Atom quite appealing.

Richard


On 28 Apr 2009, at 17:01, Yves Raimond wrote:

> Hello!
>
>> I think the two main options are either to publish a feed containing
>> pointers to changes, or using a messaging system to push out
>> notifications.
>>
>> Despite the recent discussion around benefits of, say, Jabber or other
>> mechanisms for pushing out notifications, I think that a more RESTful
>> approach using RSS or Atom feeds might be nicer. Then we can focus on the
>> resource design, i.e. what kinds of changes do we need to publish.
>>
>> So for example for /programmes it may be sufficient to publish a set of
>> feeds for new, e.g. brands, episodes, versions, etc. These could be RSS 1.0
>> and then include additional RDF data as appropriate.
>
> My only concern about this is that you need to limit the number of
> items in the feed. If you have a sudden burst of activity and the
> crawlers just ping the feed at regular intervals, they may miss some
> updates. However, even with 1M updates in a day, a feed capped to
> 100 items would just need the crawlers to ping the feed about every
> hour and a half. So that's not too bad.
> (Just noticed that Soren's proposal includes pagination of feeds,
> which might solve that problem).
>
> So yes, I guess it could be done, using RDF feeds e.g.
> http://www.bbc.co.uk/programmes/updates/2009/04/28/brands.rdf etc.
> We'd need to carefully think about the feeds we offer though.
>
> Cheers!
> y
>
>>
>> This has the added advantage that a crawler that only wanted to collect
>> certain information, e.g. about brands, could monitor just the resource(s)
>> it was interested in. Similarly with careful resource design, the timing of
>> updates could also be under the control of the crawler, e.g. new versions in
>> last 12 hours, 24 hours, 7 days (avoiding a massive firehose of updates).
>> This could be easily done with URIs and avoids having to build that into the
>> messaging system.
>>
>> Interested to know what you think.
>>
>> Cheers,
>>
>> L.
>>
>> 2009/4/28 Yves Raimond <yves.raimond@gmail.com>
>>>
>>> Hello!
>>>
>>> I know this issue has been raised during the LOD BOF at WWW 2009, but
>>> I don't know if any possible solutions emerged from there.
>>>
>>> The problem we are facing is that data on BBC Programmes changes
>>> approximately 50 000 times a day (new/updated
>>> broadcasts/versions/programmes/segments etc.). As we'd like to keep a
>>> set of RDF crawlers up-to-date with our information we were wondering
>>> how best to ping these. pingthesemanticweb seems like a nice option,
>>> but it needs the crawlers to ping it often enough to make sure they
>>> didn't miss a change. Another solution we were thinking of would be to
>>> stick either Talis changesets [1] or SPARQL/Update statements in a
>>> message queue, which would then be consumed by the crawlers.
>>>
>>> Has anyone tried to tackle this problem already?
>>>
>>> Cheers!
>>> y
>>>
>>>
>>> [1] http://n2.talis.com/wiki/Changeset
>>
>>
>> --
>> Leigh Dodds
>> Programme Manager, Talis Platform
>> Talis
>> leigh.dodds@talis.com
>> http://www.talis.com
>>
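
To make the paging idea a bit more concrete, here is a rough sketch of how a crawler might walk a paged Atom feed in the style of RFC 5005, following rel="next" links until it has seen every page. Everything specific in it is an assumption for illustration only: the example.org URLs, the urn:example identifiers, the in-memory PAGES dictionary standing in for HTTP, and the helper names. It is not how /programmes or any existing crawler works. The max_pages cap and the "seen" set are just guards against a feed that links back to itself.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"


def link_with_rel(feed: ET.Element, rel: str) -> Optional[str]:
    """Return the href of the first feed-level atom:link with the given rel."""
    for link in feed.findall(ATOM + "link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def iter_entry_ids(
    start_url: str,
    fetch: Callable[[str], str],
    rel: str = "next",
    max_pages: int = 50,
) -> Iterator[str]:
    """Yield entry ids from start_url and every page reachable via `rel` links."""
    url: Optional[str] = start_url
    seen = set()
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(ATOM + "entry"):
            entry_id = entry.findtext(ATOM + "id")
            if entry_id is not None:
                yield entry_id
        href = link_with_rel(feed, rel)
        # Resolve relative links against the page they were found on.
        url = urljoin(url, href) if href else None


# Hypothetical two-page feed, held in memory instead of being fetched over HTTP.
PAGES = {
    "http://example.org/updates/brands?page=1": """
        <feed xmlns="http://www.w3.org/2005/Atom">
          <link rel="self" href="http://example.org/updates/brands?page=1"/>
          <link rel="next" href="http://example.org/updates/brands?page=2"/>
          <entry><id>urn:example:update:3</id></entry>
          <entry><id>urn:example:update:2</id></entry>
        </feed>""",
    "http://example.org/updates/brands?page=2": """
        <feed xmlns="http://www.w3.org/2005/Atom">
          <link rel="self" href="http://example.org/updates/brands?page=2"/>
          <entry><id>urn:example:update:1</id></entry>
        </feed>""",
}

all_ids = list(iter_entry_ids("http://example.org/updates/brands?page=1", PAGES.__getitem__))
print(all_ids)
```

A real crawler would pass a function that does an HTTP GET instead of PAGES.__getitem__, and would probably stop paging as soon as it reaches an entry it has already processed. For archived feeds, RFC 5005 uses rel="prev-archive" rather than rel="next", which the rel parameter is meant to accommodate.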
Received on Tuesday, 28 April 2009 18:28:54 UTC