Re: RDF Update Feeds from Yves Raimond on 2009-11-20 (public-lod@w3.org from November 2009)

From: Yves Raimond <yves.raimond@gmail.com>
Date: Fri, 20 Nov 2009 14:46:20 +0000
To: Niklas Lindström <lindstream@gmail.com>
Cc: nathan@webr3.org, Georgi Kobilarov <georgi.kobilarov@gmx.de>, public-lod@w3.org
Message-ID: <82593ac00911200646r4ba39797m5246f630cb4eac5@mail.gmail.com>

Hello!

Back in April, we had a similar discussion:

http://lists.w3.org/Archives/Public/public-lod/2009Apr/0130.html

Concretely, we are having exactly the same problem for syncing up
aggregations of BBC RDF data (Talis's and OpenLink's), as our data
changes *a lot*.

Right now, we're thinking about a really simple feed, detailing a) whether
a change event is a delete, an update or a create and b) what thing
has changed. That's a start, but should be enough to sync up with our
data.
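
To make that concrete, here is a minimal sketch of how a consumer might
apply such a feed to a local mirror. It is an illustration only, not our
actual feed format: the names ChangeEvent, apply_changes and fetch are
hypothetical, and the mirror is assumed to be a plain dict from resource
URI to whatever RDF the consumer last fetched for it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class ChangeEvent:
    """One feed entry (hypothetical shape): what happened, and to what."""
    kind: str      # assumed vocabulary: "create", "update" or "delete"
    resource: str  # URI of the thing that changed


def apply_changes(
    mirror: Dict[str, str],
    events: Iterable[ChangeEvent],
    fetch: Callable[[str], str],
) -> Dict[str, str]:
    """Replay events in feed order against a local mirror.

    `fetch` is supplied by the caller and returns the current RDF for a
    URI; the feed itself only says *that* something changed.
    """
    for event in events:
        if event.kind == "delete":
            # Deleting something we never mirrored is not an error.
            mirror.pop(event.resource, None)
        elif event.kind in ("create", "update"):
            # Either way, re-fetch the current state of the resource.
            mirror[event.resource] = fetch(event.resource)
        else:
            raise ValueError(f"unknown change kind: {event.kind!r}")
    return mirror
```

Treating create and update identically keeps the consumer idempotent:
replaying the same stretch of feed twice leaves the mirror in the same
state.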

Cheers,
y

2009/11/18 Niklas Lindström <lindstream@gmail.com>:
> Hi Nathan!
>
> 2009/11/17 Nathan <nathan@webr3.org>:
>> very short non-detailed reply from me!
>
> I appreciate it.
>
>> pub/sub, atom feeds, RDF over XMPP were my initial thoughts on the
>> matter last week - essentially triple (update/publish) streams on a
>> pub/sub basis, decentralized suitably, [snip]
>>
>> then my thoughts switched to the fact that RDF is not XML (or any other
>> serialized format) so to keep it non limited I guess the concept would
>> need to be specified first then implemented in whatever formats/ways
>> people saw fit, as has been the case with RDF.
>
> I agree that the concept should really be format-independent. But I
> think it has to be pragmatic and operation-oriented, to avoid "never
> getting there".
>
> Atom (feed paging and archiving) is basically designed with exactly
> this in mind, and it scaled to my use-cases (resources with multiple
> representations, plus opt. "attachments"), while still being simple
> enough to work for "just RDF updates". The missing piece is the
> deleted-entry/tombstone, for which there is thankfully at least an
> I-D.
>
> Therefore modelling the approach around these possibilities required a
> minimum of invention (none really, just some wording to describe the
> practice), and it seems suited for a wide range of dataset syndication
> scenarios (not so much real-time, where XMPP may be relevant).
>
> At least this works very well as long as the datasets can be sensibly
> partitioned into documents (contexts/"graphs"). But this is IMHO
> the best way to manage RDF anyhow (not least since one can also
> leverage simple REST principles for editing; and since
> quad-stores/SPARQL-endpoints support named contexts etc).
>
> But I'd gladly discuss the benefit/drawback ratio of this approach in
> relation to our and others' scenarios.
>
> (I do think it would be nice to "lift" the resulting timeline to
> proper RDF -- e.g. AtomOwl (plus a Deletion for tombstones, provenance
> and logging etc). But these rather complex concepts -- datasources
> (dataset vs. collection vs. feed vs. page), timelines (entries are
> *events* for the same resource over time), "flat resource manifest"
> concepts, and so on -- require semantic definitions which will
> probably continue to be debated for quite some time! Atom can be
> leveraged right now. After all, this is a *very* instrumental aspect
> for most domains.)
>
>
>> this subject is probably not something that should be left for long
>> though.. my (personal) biggest worry about 'linked data' is that junk
>> data will be at an all time high, if not worse, and not nailing this on
>> the head early on (as in weeks/months at max) could contribute to the
>> mess considerably.
>
> Couldn't agree with you more. A common, direct (and "simple enough")
> way of syndicating datasets over time would be very beneficial, and
> shared practices for that seem to be lacking today.
>
> COURT <http://purl.org/net/court> is publicly much of a strawman
> right now, but I would like to flesh it out. Primarily regarding the
> use of Atom I've described, but also with details of our
> implementation (the Swedish legal information system), concerning
> collection and storage, proposed validation and URI-minting/verifying
> strategies, "lifting" the timeline for logging etc.
>
> (In what form and where the project's actual source code will be
> public remains to be decided (though opensourcing it has always been
> the official plan). Time permitting I will push my own work in the
> same vein there for reuse and reference. Regardless I trust the
> approach to be simple enough to be implementable from reading this
> mail-thread alone. ;) )
>
> Best regards,
> Niklas Lindström
>
>
Received on Friday, 20 November 2009 14:46:55 UTC