Re: RDF Update Feeds

Hi Nathan!

2009/11/17 Nathan <nathan@webr3.org>:
> very short non-detailed reply from me!

I appreciate it.

> pub/sub, atom feeds, RDF over XMPP were my initial thoughts on the
> matter last week - essentially triple (update/publish) streams on a
> pub/sub basis, decentralized suitably, [snip]
>
> then my thoughts switched to the fact that RDF is not XML (or any other
> serialized format) so to keep it non limited I guess the concept would
> need to be specified first then implemented in whatever formats/ways
> people saw fit, as has been the case with RDF.

I agree that the concept should really be format-independent. But I
think it has to be pragmatic and operation-oriented, to avoid "never
getting there".

Atom (feed paging and archiving) is basically designed with exactly
this in mind, and it scaled to my use-cases (resources with multiple
representations, plus optional "attachments"), while still being
simple enough to work for "just RDF updates". The missing piece is the
deleted-entry/tombstone, for which there is thankfully at least an
Internet-Draft.
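
To make that concrete, a rough sketch of what a single archive page
could look like (illustrative only: all URIs, titles and timestamps
are invented; the prev-archive link comes from the feed paging and
archiving spec, and the at: namespace is the one defined by the
tombstones I-D):

```xml
<!-- Hypothetical archive page: one updated document, one deletion. -->
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:at="http://purl.org/atompub/tombstones/1.0">
  <id>tag:example.org,2009:dataset/archive/3</id>
  <title>Dataset changes</title>
  <updated>2009-11-18T12:00:00Z</updated>
  <link rel="self" href="http://example.org/feed/archive/3"/>
  <link rel="prev-archive" href="http://example.org/feed/archive/2"/>
  <entry>
    <id>http://example.org/doc/2</id>
    <title>Document 2</title>
    <updated>2009-11-18T12:00:00Z</updated>
    <!-- one of possibly several RDF representations of the document -->
    <link rel="alternate" type="application/rdf+xml"
          href="http://example.org/doc/2.rdf"/>
  </entry>
  <!-- deletion signalled with the deleted-entry extension -->
  <at:deleted-entry ref="http://example.org/doc/1"
                    when="2009-11-18T11:00:00Z"/>
</feed>
```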

Therefore, modelling the approach around these capabilities required a
minimum of invention (none really, just some wording to describe the
practice), and it seems suited to a wide range of dataset syndication
scenarios (not so much real-time ones, where XMPP may be relevant).
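
As a sketch of how a consumer might replay such a feed into current
dataset state (purely illustrative: the page contents and URIs are
made up, and the "latest event wins" replay policy is my assumption,
not something the specs mandate), using only the Python stdlib:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Namespace from the deleted-entry (tombstones) I-D:
AT = "{http://purl.org/atompub/tombstones/1.0}"

# Hypothetical feed: a current subscription document whose
# prev-archive link points back to an older archive page.
PAGES = {
    "current": """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:at="http://purl.org/atompub/tombstones/1.0">
  <link rel="prev-archive" href="archive-1"/>
  <entry>
    <id>http://example.org/doc/2</id>
    <updated>2009-11-18T12:00:00Z</updated>
  </entry>
  <at:deleted-entry ref="http://example.org/doc/1"
                    when="2009-11-18T11:00:00Z"/>
</feed>""",
    "archive-1": """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://example.org/doc/1</id>
    <updated>2009-11-17T09:00:00Z</updated>
  </entry>
</feed>""",
}

def fetch(href):
    # Stand-in for an HTTP GET of the feed document.
    return ET.fromstring(PAGES[href])

def replay(start):
    """Collect all pages by following prev-archive links, then apply
    entries and tombstones oldest-first, keeping each resource's
    latest state (present with a timestamp, or deleted)."""
    pages, href = [], start
    while href:
        feed = fetch(href)
        pages.append(feed)
        link = feed.find(ATOM + "link[@rel='prev-archive']")
        href = link.get("href") if link is not None else None
    state = {}
    for feed in reversed(pages):  # oldest page first
        events = []
        for entry in feed.findall(ATOM + "entry"):
            events.append((entry.findtext(ATOM + "updated"),
                           entry.findtext(ATOM + "id"), True))
        for tomb in feed.findall(AT + "deleted-entry"):
            events.append((tomb.get("when"), tomb.get("ref"), False))
        for when, uri, present in sorted(events):  # ISO dates sort lexically
            if present:
                state[uri] = when
            else:
                state.pop(uri, None)
    return state

print(replay("current"))
# doc/1 (created in the archive page, tombstoned later) is gone;
# only doc/2 with its latest timestamp remains.
```

A real consumer would of course fetch pages over HTTP and dereference
the entries' alternate links for the actual RDF representations.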

At least this works very well as long as the datasets can be sensibly
partitioned into documents (contexts/"graphs"). But that is IMHO the
best way to manage RDF anyhow (not least since one can also leverage
simple REST principles for editing, and since quad-stores/SPARQL
endpoints support named contexts etc.).

But I'd gladly discuss the benefit/drawback ratio of this approach in
relation to our and others' scenarios.

(I do think it would be nice to "lift" the resulting timeline to
proper RDF -- e.g. AtomOwl (plus a Deletion for tombstones, provenance
and logging etc). But these rather complex concepts -- datasources
(dataset vs. collection vs. feed vs. page), timelines (entries are
*events* for the same resource over time), "flat resource manifest"
concepts, and so on -- require semantic definitions which will
probably continue to be debated for quite some time! Atom can be
leveraged right now. After all, this is a *very* instrumental aspect
for most domains.)


> this subject is probably not something that should be left for long
> though.. my (personal) biggest worry about 'linked data' is that junk
> data will be at an all time high, if not worse, and not nailing this on
> the head early on (as in weeks/months at max) could contribute to the
> mess considerably.

Couldn't agree with you more. A common, direct (and "simple enough")
way of syndicating datasets over time would be very beneficial, and
shared practices for that seem to be lacking today.

COURT <http://purl.org/net/court> is publicly much of a strawman
right now, but I would like to flesh it out. Primarily regarding the
use of Atom I've described, but also with details of our
implementation (the Swedish legal information system), concerning
collection and storage, proposed validation and URI-minting/verifying
strategies, "lifting" the timeline for logging etc.

(In what form and where the project's actual source code will be made
public remains to be decided (though open-sourcing it has always been
the official plan). Time permitting, I will push my own work in the
same vein there for reuse and reference. Regardless, I trust the
approach is simple enough to be implementable from reading this
mail-thread alone. ;) )

Best regards,
Niklas Lindström

Received on Wednesday, 18 November 2009 17:03:57 UTC