- From: Niklas Lindström <lindstream@gmail.com>
- Date: Sat, 17 Apr 2010 13:26:54 +0200
- To: Leigh Dodds <leigh.dodds@talis.com>
- Cc: Linking Open Data <public-lod@w3.org>, Richard Cyganiak <richard@cyganiak.de>, dataset-dynamics@googlegroups.com
Hi Leigh!

On Fri, Apr 16, 2010 at 10:19 PM, Leigh Dodds <leigh.dodds@talis.com> wrote:
> There's been a fair bit of discussion, and more than a few papers around
> dataset notifications recently. I've written up a blog post and a quick
> survey of technologies to start to classify the available approaches:
>
> http://www.ldodds.com/blog/2010/04/rdf-dataset-notifications/

Nice summary! This is a most important topic.

I think the Dataset Dynamics Group [1] is very relevant to this, so I've CC:ed that list. (Note: posting to that requires membership.)

Also, the W3C eGov group may be of interest to you, where work is currently under way to take the dcat vocabulary forwards [2]. See e.g. [3], and my followup [4] which focuses on this aspect of datasets (outlined in the COURT project [5]).

For the needs I've had so far, Atom seems a very viable way forward (and pubsubhubbub is a very powerful extension to that method). However, it would be very beneficial to the community if the different RDF vocabularies (i.e. AtomOwl [6], those listed at [7], and OPM [8]) could be consolidated somehow. Especially for logging and RDF-based data store implementation purposes. One thing lacking in these models seems to be representing deletions (see e.g. [9] for an openvocab extension to AtomOwl for those). How/if this can be related to SIOC is another interesting question.

I think care should be taken to differentiate between the domain described by the content and the (mechanical) way datasets, their repositories, modifications and syndications are described. To keep things orthogonal.

My take is basically close to a resource (or even named graph) oriented approach. I consider (atom) entries as a package of one or more closely related information resources sharing a common topic (say a document, person, vocabulary or a data(sub)set). In my work I store all RDF extractable from such an entry in a timestamped context (corresponding to the entry itself).
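
As a rough illustration of the timestamped-context idea (a minimal sketch only, not the actual implementation): each entry's triples are kept in a context keyed by the entry id and its update time, and a deletion is recorded as an empty context. The names `Entry`, `ContextStore`, `apply` and `current_triples` are hypothetical, triples are plain string tuples rather than real RDF terms, and the example URI and `dct:title` property are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a feed entry and the RDF extracted from it."""
    entry_id: str
    updated: datetime
    triples: FrozenSet[Triple]
    deleted: bool = False  # True models a deleted-entry marker


class ContextStore:
    """Keeps one context per (entry id, timestamp); the newest one is current."""

    def __init__(self) -> None:
        self._contexts: Dict[Tuple[str, datetime], FrozenSet[Triple]] = {}
        self._latest: Dict[str, datetime] = {}

    def apply(self, entry: Entry) -> bool:
        """Store the entry's context; return False if it is not newer."""
        current = self._latest.get(entry.entry_id)
        if current is not None and entry.updated <= current:
            return False  # stale or replayed entry, ignore it
        # A deletion is kept as an empty context so the history is preserved.
        triples = frozenset() if entry.deleted else entry.triples
        self._contexts[(entry.entry_id, entry.updated)] = triples
        self._latest[entry.entry_id] = entry.updated
        return True

    def current_triples(self) -> Set[Triple]:
        """Union of the newest context of every known entry."""
        result: Set[Triple] = set()
        for entry_id, updated in self._latest.items():
            result |= self._contexts[(entry_id, updated)]
        return result


store = ContextStore()
doc = "http://example.org/doc/1"  # placeholder URI
t0 = datetime(2010, 4, 16, tzinfo=timezone.utc)
t1 = datetime(2010, 4, 17, tzinfo=timezone.utc)
store.apply(Entry(doc, t0, frozenset({(doc, "dct:title", "Draft")})))
store.apply(Entry(doc, t1, frozenset({(doc, "dct:title", "Final")})))
```

Replacing a whole context per entry keeps the update logic trivial, at the cost of resending unchanged triples whenever an entry changes.
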
I think this is also close to how many content repositories (e.g. DSpace and Fedora) are (or should be) modelled. An entry in this view corresponds to a resource with one or more representations, also carrying possible attachments such as images or appendices to the primary document. This should be aligned with REST-principles and a resource-oriented management of datasets. (And may be considered related to, but more simplistic than, e.g. OAI-ORE [10] or even CMIS.)

Of course, as you clearly mention, this is suboptimal for high-volume updates. The discussion on "DBPedia hosting burden" on this list, especially the thoughts on using bittorrent [11] or similar is interesting here. I still wonder whether a resource-oriented model wouldn't be a good way to represent the underlying repository though (and then combined with triple-centric updates).

I'm thinking that the dataset dynamics group may be the most appropriate forum to take this further. What do you think? It would be great to have it aligned with the dcat work as well (and/or void+dady, depending on whether these progress together).

Best regards,
Niklas

[1]: <http://groups.google.com/group/dataset-dynamics/>
[2]: <http://vocab.deri.ie/dcat>
[3]: <http://lists.w3.org/Archives/Public/public-egov-ig/2010Apr/0021.html>
[4]: <http://lists.w3.org/Archives/Public/public-egov-ig/2010Apr/0022.html>
[5]: <http://code.google.com/p/court/>
[6]: <http://bblfish.net/work/atom-owl/2006-06-06/AtomOwl.html>
[7]: <http://groups.google.com/group/dataset-dynamics/web/components-vocabularies-protocols-formats>
[8]: <http://openprovenance.org/>
[9]: <http://open.vocab.org/docs/DeletedEntry>
[10]: <http://www.openarchives.org/ore/>
[11]: <http://lists.w3.org/Archives/Public/public-lod/2010Apr/0205.html>
Received on Saturday, 17 April 2010 11:27:47 UTC