[dcat] Requirements and deliverables

Dear dcat group,

As a starting point for discussion, here's an attempt at sketching the  
basic requirements that dcat should fulfill, and a proposal for the  
initial list of deliverables. It would be helpful if you read this  
before the dcat teleconference! Comments welcome, both here and in the  
call!


Terminology: A "catalog" is a collection of "entries". Each entry  
consists of metadata for a "dataset". The actual dataset is not part  
of the catalog itself, but usually downloadable somewhere off-site.
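
To make the terminology concrete, here is a minimal sketch of the catalog/entry/dataset relationship in Python. All class and field names (Entry, Catalog, identifier, download_url, extra) and the example.org URLs are assumptions for illustration only, not proposed dcat terms.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """Metadata describing one dataset; the dataset itself lives elsewhere."""
    identifier: str        # identifier for the entry (hypothetical field name)
    title: str
    download_url: str      # where the actual dataset can be fetched, off-site
    extra: Dict[str, str] = field(default_factory=dict)  # catalog-specific fields


@dataclass
class Catalog:
    """A collection of entries. It holds metadata only, never the data."""
    title: str
    entries: List[Entry] = field(default_factory=list)


# Build a one-entry catalog using made-up example values.
catalog = Catalog(title="Example catalog")
catalog.entries.append(
    Entry(
        identifier="http://example.org/catalog/entry/1",
        title="Example dataset",
        download_url="http://example.org/files/data.csv",
    )
)
```

The point of the sketch is only that an entry points at its dataset rather than containing it.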


Requirements:

1. Must allow retrieval of a machine-readable representation of all  
entries in the catalog (in order to run queries over them, visualize  
the data catalog in a new way, import it elsewhere, or do other bulk  
operations)

2. Must provide stable, persistent identifiers for individual entries  
that can be resolved to a machine-readable representation of the entry

3. Must allow checking whether an individual dataset has changed or was  
updated (in order to re-load the data into an application or mashup)

4. Must allow keeping one catalog in sync with another (in order to  
federate several data catalogs into one)

5. Must allow creation of the machine-readable entry representations  
from existing data catalogs without requiring the production of new  
metadata or the modification of existing metadata (in other words, you  
can implement it for an existing data catalog without cleaning up or  
otherwise modifying the metadata that your catalog collects)

6. Must cover the metadata that is found in typical government data  
catalogs, while being extensible with additional, catalog-specific  
metadata fields
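
As a rough illustration of how requirements 1 to 4 fit together, the following Python sketch compares two catalogs by entry identifier and a modification marker, then brings one in line with the other. It assumes each catalog has already been retrieved as a dictionary keyed by entry identifier, and that each entry carries a "modified" value. Both the data layout and the "modified" field name are assumptions, not something dcat defines at this point.

```python
from typing import Dict, List, Tuple

# Each catalog is modelled as {entry identifier: entry metadata}.
# The "modified" key is an assumed field name used only for this sketch.
CatalogIndex = Dict[str, Dict[str, str]]


def diff_catalogs(
    local: CatalogIndex, remote: CatalogIndex
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, changed, removed) entry identifiers, relative to local."""
    added = sorted(i for i in remote if i not in local)
    removed = sorted(i for i in local if i not in remote)
    changed = sorted(
        i for i in remote
        if i in local and remote[i].get("modified") != local[i].get("modified")
    )
    return added, changed, removed


def sync(local: CatalogIndex, remote: CatalogIndex) -> None:
    """Bring local in line with remote, modifying local in place."""
    added, changed, removed = diff_catalogs(local, remote)
    for i in added + changed:
        local[i] = dict(remote[i])  # copy so the two catalogs stay independent
    for i in removed:
        del local[i]


# Made-up example data: "a" was dropped, "b" was updated, "c" is new.
local = {
    "a": {"title": "A", "modified": "2010-01-01"},
    "b": {"title": "B", "modified": "2010-01-01"},
}
remote = {
    "b": {"title": "B", "modified": "2010-04-01"},
    "c": {"title": "C", "modified": "2010-03-01"},
}
changes = diff_catalogs(local, remote)
sync(local, remote)
```

The comparison only works if identifiers are stable across retrievals, which is why requirement 2 matters for requirement 4.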


First round of deliverables:

1. dcat Use Cases and Requirements: A short and sweet document that  
explains the things we have in mind with dcat.

2. dcat RDF Vocabulary and Reference: An RDFS vocabulary (re-using  
existing terms where possible), including guidelines for its usage  
(which properties are required or optional, what kinds of values a  
property can take, whether values should come from a controlled  
vocabulary, etc.)
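
To illustrate what such usage guidelines could look like once written down, here is a small Python sketch that checks an entry against a table of per-property rules. The property names ("title", "update_frequency"), the allowed values, and the rule table itself are hypothetical placeholders; the actual property list is exactly what the deliverable would have to decide.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class PropertyRule(NamedTuple):
    required: bool
    allowed_values: Optional[Set[str]] = None  # None means free-form values


# Hypothetical rules for illustration; not a proposed property list.
RULES: Dict[str, PropertyRule] = {
    "title": PropertyRule(required=True),
    "update_frequency": PropertyRule(
        required=False,
        allowed_values={"daily", "weekly", "monthly", "yearly"},
    ),
}


def check_entry(
    entry: Dict[str, str], rules: Dict[str, PropertyRule] = RULES
) -> List[str]:
    """Return a list of problems; an empty list means the entry conforms."""
    problems = []
    for name, rule in rules.items():
        if name not in entry:
            if rule.required:
                problems.append(f"missing required property: {name}")
            continue
        if rule.allowed_values is not None and entry[name] not in rule.allowed_values:
            problems.append(
                f"value not in controlled vocabulary for {name}: {entry[name]}"
            )
    return problems


# Properties without a rule (here "agency_code") are ignored rather than rejected.
ok = check_entry({"title": "Example dataset", "agency_code": "X1"})
bad = check_entry({"update_frequency": "sometimes"})
```

Properties that have no rule are simply passed over, which leaves room for the catalog-specific extensions mentioned in the requirements.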


And that could be all for the first round. As a second round, we  
should then write one or more documents that explain how to use this  
abstract vocabulary in concrete formats/syntaxes/protocols. Atom and  
RDFa might be two good candidates for concrete formats. I would defer  
this to a second round because the choice of concrete format  
shouldn't happen before we have documented the requirements.

Best,
Richard

Received on Thursday, 29 April 2010 01:15:18 UTC