Re: RDF Dataset Notifications

Hi Leigh!

On Fri, Apr 16, 2010 at 10:19 PM, Leigh Dodds <leigh.dodds@talis.com> wrote:
> There's been a fair bit of discussion, and more than a few papers around
> dataset notifications recently. I've written up a blog post and a quick
> survey of technologies to start to classify the available approaches:
>
> http://www.ldodds.com/blog/2010/04/rdf-dataset-notifications/

Nice summary! This is a most important topic. I think the Dataset
Dynamics Group [1] is very relevant to this, so I've CC:ed that list.
(Note: posting to that list requires membership.)

Also, the W3C eGov group may be of interest to you; work is currently
under way there to take the dcat vocabulary forward [2]. See e.g. [3],
and my followup [4], which focuses on this aspect of datasets
(outlined in the COURT project [5]).

For the needs I've had so far, Atom seems a very viable way forward
(and PubSubHubbub is a very powerful extension to that method).
However, it would be very beneficial to the community if the different
RDF vocabularies (i.e. AtomOwl [6], those listed at [7], and OPM [8])
could be consolidated somehow, especially for logging and for
implementing RDF-based data stores. One thing these models seem to
lack is a way to represent deletions (see e.g. [9] for an openvocab
extension to AtomOwl for those).
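
To make the deletion point concrete from a feed consumer's side, here
is a minimal Python sketch. It assumes deletions are signalled in the
feed with an Atom tombstones `at:deleted-entry` element (namespace
`http://purl.org/atompub/tombstones/1.0`); that choice is an assumption
made for this example, not something the vocabularies above prescribe.
The sample feed, the `apply_feed` name and the dict-based store are
likewise hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed namespace of the Atom tombstones extension (at:deleted-entry).
TOMB = "{http://purl.org/atompub/tombstones/1.0}"

# Hypothetical feed: two entries, then a deletion marker for the first.
SAMPLE_FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:at="http://purl.org/atompub/tombstones/1.0">
  <id>tag:example.org,2010:feed</id>
  <entry>
    <id>tag:example.org,2010:doc/1</id>
    <updated>2010-04-16T10:00:00Z</updated>
  </entry>
  <entry>
    <id>tag:example.org,2010:doc/2</id>
    <updated>2010-04-16T11:00:00Z</updated>
  </entry>
  <at:deleted-entry ref="tag:example.org,2010:doc/1"
                    when="2010-04-17T09:00:00Z"/>
</feed>
"""


def apply_feed(feed_xml, store):
    """Apply entries and deletion markers to `store` in document order.

    `store` maps an entry id to its last seen `updated` value.
    """
    for child in ET.fromstring(feed_xml):
        if child.tag == ATOM + "entry":
            store[child.findtext(ATOM + "id")] = child.findtext(ATOM + "updated")
        elif child.tag == TOMB + "deleted-entry":
            # A tombstone removes whatever is held for the referenced entry.
            store.pop(child.get("ref"), None)
    return store


store = apply_feed(SAMPLE_FEED, {})
print(sorted(store))
```

Processing in document order keeps the sketch short; a real consumer
would probably compare the `when` and `updated` timestamps instead.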

Whether and how this can be related to SIOC is another interesting
question. I think care should be taken to differentiate between the
domain described by the content and the (mechanical) way datasets,
their repositories, modifications and syndications are described, so
as to keep things orthogonal.

My take is basically close to a resource-oriented (or even
named-graph-oriented) approach. I consider an (Atom) entry to be a
package of one or more closely related information resources sharing a
common topic (say a document, person, vocabulary or a data(sub)set).
In my work I store all RDF extractable from such an entry in a
timestamped context (corresponding to the entry itself). I think this
is also close to how many content repositories (e.g. DSpace and
Fedora) are (or should be) modelled.
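
As a rough illustration of the "timestamped context" idea, the Python
sketch below keys each set of triples by the entry's `atom:id` and
`atom:updated` values. It is only a toy under stated assumptions: the
`ContextStore` class, the tuple-based triples and the sample entry are
hypothetical, and how RDF is actually extracted from an entry is left
to the caller.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"


def context_key(entry_xml):
    """Return (entry id, updated as datetime), used as the context name."""
    entry = ET.fromstring(entry_xml)
    # fromisoformat() on older Pythons rejects a trailing "Z", so spell it out.
    updated = entry.findtext(ATOM + "updated").replace("Z", "+00:00")
    return entry.findtext(ATOM + "id"), datetime.fromisoformat(updated)


class ContextStore:
    """Toy store: one set of triples per timestamped entry context."""

    def __init__(self):
        self.contexts = {}

    def put(self, entry_xml, triples):
        """Store `triples` under the context derived from the entry."""
        key = context_key(entry_xml)
        self.contexts[key] = set(triples)
        return key

    def current(self, entry_id):
        """Triples from the most recent context stored for `entry_id`."""
        keys = [k for k in self.contexts if k[0] == entry_id]
        return self.contexts[max(keys)] if keys else set()


# Hypothetical entry template; only id and updated matter for the key.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2010:doc/1</id>
  <updated>{updated}</updated>
</entry>"""

store = ContextStore()
store.put(ENTRY.format(updated="2010-04-16T10:00:00Z"),
          [("ex:doc1", "dct:title", "Draft")])
store.put(ENTRY.format(updated="2010-04-17T09:00:00Z"),
          [("ex:doc1", "dct:title", "Final")])
print(store.current("tag:example.org,2010:doc/1"))
```

Older contexts are kept rather than overwritten, so the history of an
entry stays queryable while `current` returns only the latest state.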

An entry in this view corresponds to a resource with one or more
representations, possibly also carrying attachments such as images or
appendices to the primary document. This should be aligned with REST
principles and a resource-oriented management of datasets. (And it
may be considered related to, but simpler than, e.g. OAI-ORE [10] or
even CMIS.)

Of course, as you clearly mention, this is suboptimal for high-volume
updates. The discussion on "DBpedia hosting burden" on this list,
especially the thoughts on using BitTorrent [11] or similar, is
interesting here. I still wonder whether a resource-oriented model
wouldn't be a good way to represent the underlying repository though
(combined, then, with triple-centric updates).

I'm thinking that the dataset dynamics group may be the most
appropriate forum to take this further. What do you think? It would be
great to have it aligned with the dcat work as well (and/or void+dady,
depending on whether these progress together).

Best regards,
Niklas

[1]: <http://groups.google.com/group/dataset-dynamics/>
[2]: <http://vocab.deri.ie/dcat>
[3]: <http://lists.w3.org/Archives/Public/public-egov-ig/2010Apr/0021.html>
[4]: <http://lists.w3.org/Archives/Public/public-egov-ig/2010Apr/0022.html>
[5]: <http://code.google.com/p/court/>
[6]: <http://bblfish.net/work/atom-owl/2006-06-06/AtomOwl.html>
[7]: <http://groups.google.com/group/dataset-dynamics/web/components-vocabularies-protocols-formats>
[8]: <http://openprovenance.org/>
[9]: <http://open.vocab.org/docs/DeletedEntry>
[10]: <http://www.openarchives.org/ore/>
[11]: <http://lists.w3.org/Archives/Public/public-lod/2010Apr/0205.html>

Received on Saturday, 17 April 2010 11:27:47 UTC