Re: RDF for data catalogues (was: Re: Universal distributed open government data catalog?)

Hello Martín!

This is great work. A couple of questions. Should cat:Dataset really be a
subclass of void:Dataset? The voiD spec says:

	A dataset is a collection of data, published and maintained by a
	single provider, available as RDF on the Web, where at least some
	of the resources in the dataset are identified by dereferencable
	URIs.

Often the datasets referred to in the catalogue are not available as RDF
(though of course people are working to change that...). With CKAN,
for example, we find datasets in .xls files, NetCDF files and
idiosyncratic XML without any dereferenceable URIs. The situation is
similar with data.gov.uk (which uses the same software as CKAN for its
catalogue). Perhaps it should just be a subclass of dct:Dataset (of
which void:Dataset is already a subclass).

So,

{ ?s a void:Dataset } => { ?s a dct:Dataset }
{ ?s a cat:Dataset } => { ?s a dct:Dataset }
{ ?s a cat:Dataset } not => { ?s a void:Dataset}
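To make the difference between the two options concrete, here is a minimal sketch of RDFS subclass entailment (rule rdfs11, transitivity of rdfs:subClassOf, plus rule rdfs9, instances of a subclass are instances of its superclass) over a toy triple set, in plain Python with no RDF library. Terms are shortened to CURIE strings, `:xls_only` is a hypothetical dataset published only as an .xls file, and "void:Dataset rdfs:subClassOf dct:Dataset" is taken from the message above rather than checked against the voiD schema.

```python
# Minimal sketch of RDFS subclass entailment over CURIE strings (no RDF library).
# ":xls_only" is a hypothetical dataset that is published only as an .xls file.
TYPE = "rdf:type"
SUB = "rdfs:subClassOf"


def entail(triples):
    """Close a set of (s, p, o) triples under rdfs11 (transitivity) and rdfs9 (typing)."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            if p != SUB:
                continue
            for s2, p2, o2 in closed:
                if p2 == SUB and s2 == o:   # s subClassOf o, o subClassOf o2 => s subClassOf o2
                    new.add((s, SUB, o2))
                if p2 == TYPE and o2 == s:  # s2 a s, s subClassOf o => s2 a o
                    new.add((s2, TYPE, o))
        if new <= closed:                   # nothing new: fixpoint reached
            return closed
        closed |= new


def types_of(subject, triples):
    """Sorted list of the classes `subject` is inferred to be an instance of."""
    return sorted(o for s, p, o in entail(triples) if s == subject and p == TYPE)


data = {(":xls_only", TYPE, "cat:Dataset")}
# Schema as proposed: cat:Dataset is a subclass of void:Dataset.
proposed = {("cat:Dataset", SUB, "void:Dataset"), ("void:Dataset", SUB, "dct:Dataset")}
# Schema as suggested: cat:Dataset is only a subclass of dct:Dataset.
suggested = {("cat:Dataset", SUB, "dct:Dataset"), ("void:Dataset", SUB, "dct:Dataset")}

print(types_of(":xls_only", proposed | data))   # ['cat:Dataset', 'dct:Dataset', 'void:Dataset']
print(types_of(":xls_only", suggested | data))  # ['cat:Dataset', 'dct:Dataset']
```

Under the proposed schema the .xls-only dataset is inferred to be a void:Dataset, which is the inference questioned above; under the suggested schema it is only a dct:Dataset.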

data.gov.uk needs this, or something like it... yesterday. How far is
this vocabulary from becoming official? Have steps been taken to
register it under purl.org, for example?

I rather think it would make a lot of sense for data.gov and data.gov.uk
to use the same vocabularies here.

Cheers,
-w

On 2010-02-22, at 10:56, Martín Álvarez Espinar wrote:

> Hello,
>
> We had thought of representing the list of all available public
> catalogues in RDF. With voiD and SCOVO in mind, we tried to
> represent catalogues composed of datasets. These catalogues are
> described using FOAF and Dublin Core metadata and are available
> from our SPARQL endpoint [1].
>
> Here is an example of a catalogue represented using the proposed
> vocabulary [2]:
>
> :data.gov a cat:Catalog ;
>   dcterms:identifier "data.gov" ;
>   foaf:homepage <http://www.data.gov/catalog> ;
>   rdfs:label "US Federal Government Catalog" ;
>   dcterms:description "The purpose of Data.gov is...." ;
>   dcterms:language "en" ;
>   dcterms:issued "2009-05-21"^^xsd:date ;
>   dcterms:license <http://www.data.gov/datapolicy> ;
>   dcterms:spatial <http://sws.geonames.org/6252001/> .
>
> Every catalogue is enriched with Linked Data. In this example, the
> "dcterms:spatial" property links to the area covered by data.gov
> (the USA, http://sws.geonames.org/6252001/).
>
> Using this information from Geonames, we can build different
> representations of these catalogues, such as simple listings [3] or
> maps [4].
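A listing like the one described above could start from a SPARQL query over the catalogue descriptions. The following is a minimal sketch, assuming the endpoint [1] accepts standard SPARQL Protocol GET requests with a `query` parameter and can return `application/sparql-results+json`; neither is confirmed in the message. It only uses the dcterms and rdfs namespaces, since the cat: namespace IRI is not given here, and `SAMPLE_RESULTS` is a hand-written result document, not real output from the endpoint.

```python
# Sketch: build a SPARQL Protocol GET request listing catalogues with their
# dcterms:spatial coverage, and group JSON results by spatial (Geonames) URI.
# Assumption: the endpoint supports the `query` parameter and JSON results.
from collections import defaultdict
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://data.fundacionctic.org/sparql"

QUERY = """PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label ?area WHERE {
  ?catalog dcterms:spatial ?area ;
           rdfs:label ?label .
}"""


def build_request(endpoint=ENDPOINT, query=QUERY):
    """Return an unsent urllib Request for a SPARQL Protocol GET query."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def group_by_area(results):
    """Map each spatial URI to the catalogue labels bound with it."""
    groups = defaultdict(list)
    for row in results["results"]["bindings"]:
        groups[row["area"]["value"]].append(row["label"]["value"])
    return dict(groups)


# Hand-written example in the SPARQL JSON results format (illustrative only).
SAMPLE_RESULTS = {
    "head": {"vars": ["label", "area"]},
    "results": {"bindings": [
        {"label": {"type": "literal", "value": "US Federal Government Catalog"},
         "area": {"type": "uri", "value": "http://sws.geonames.org/6252001/"}},
    ]},
}

print(group_by_area(SAMPLE_RESULTS))
```

Sending the request would be `urllib.request.urlopen(build_request())` followed by `json.load` on the response; grouping by the Geonames URI gives one bucket per area, which is the shape a map view needs.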
>
> Best regards,
>
> Martin
>
> [1] http://data.fundacionctic.org/sparql
> [2] http://data.fundacionctic.org/vocab/catalog/datasets.html.en
> [3] http://datos.fundacionctic.org/sandbox/catalog/index.html.en
> [4] http://datos.fundacionctic.org/sandbox/catalog/map
>
>
> Rufus Pollock wrote:
>> On 17 February 2010 22:56, Peter Krantz <peter.krantz@gmail.com> wrote:
>>
>>> On Tue, Feb 2, 2010 at 18:59, Ed Summers <ehs@pobox.com> wrote:
>>>
>>>> My personal opinion is that a key ingredient to making this happen
>>>> is to publish dataset availability and metadata using a syndicated
>>>> feed (Atom and/or RSS).
>>>>
>>> I have implemented the RDF metadata on opengov.se now. All data is
>>> in Swedish, but you get the idea if you look at an individual dataset:
>>>
>>> http://www.opengov.se/data/42/
>>>
>>> ...and its RDF representation (based on Dublin Core terms):
>>>
>>> http://www.opengov.se/data/42/rdf/
>>>
>>
>> Great stuff, Peter. For comparison, here's an example of what you get
>> from ckan.net + semantic.ckan.net:
>>
>> http://pastie.org/830693
>>
>> At the moment we redirect from ckan.net to semantic.ckan.net via a
>> rel="alternate" link and a 303 redirect based on the Accept header;
>> e.g. try out:
>>
>> curl -L -H "Accept: application/rdf+xml" http://ckan.net/package/2000-us-census-rdf
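The same content negotiation can be sketched with Python's standard library. This is a minimal sketch, not CKAN code: `rdf_request` and `fetch_rdf` are hypothetical helper names, and it assumes the server still answers as described (a 303 to the RDF version when RDF/XML is requested). `urlopen` follows 303 redirects by default, much as `curl -L` does.

```python
# Sketch: request RDF/XML via the Accept header, following redirects like curl -L.
# Hypothetical helpers; no network access happens until fetch_rdf() is called.
from urllib.request import Request, urlopen


def rdf_request(url):
    """Build a GET request that asks for RDF/XML through content negotiation."""
    return Request(url, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(url, timeout=30):
    """Fetch `url`, following redirects; return (final URL, Content-Type, body bytes)."""
    with urlopen(rdf_request(url), timeout=timeout) as resp:
        return resp.geturl(), resp.headers.get("Content-Type"), resp.read()
```

For example, `fetch_rdf("http://ckan.net/package/2000-us-census-rdf")` would return the URL reached after the redirect, the response Content-Type and the raw RDF/XML body, provided the server behaves as described above.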
>>
>> semantic.ckan.net also provides a human readable version of the data:
>>
>> <http://semantic.ckan.net/data/2000-us-census-rdf>
>>
>> We've thought quite a bit about integrating this directly into
>> ckan.net (hence the /data/ rather than /package/ on
>> semantic.ckan.net), but the issue here is that we want to use a proper
>> triple store for the data so you can query it via SPARQL (currently
>> http://semantic.ckan.net/sparql). Thus we've gone for the separate but
>> related model for the present.
>>
>> Maybe it would be worth getting together for half an hour on Skype
>> and Etherpad to hammer out a shared ontology here? I also know the
>> people from DERI (Richard Cyganiak especially) are working on this,
>> so we should talk with them.
>>
>>
>>> I have also made sure an Atom feed contains all datasets (with a
>>> link element pointing to the RDF representation in each entry
>>> element) here:
>>>
>>> http://www.opengov.se/feeds/data/
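A consumer of such a feed only needs to read each entry's link elements. The following minimal sketch uses Python's `xml.etree.ElementTree`; it assumes the RDF link is marked with `type="application/rdf+xml"` (the exact attributes used on opengov.se are not stated in the message), and `SAMPLE_FEED` is a hand-written feed, not the real one.

```python
# Sketch: collect (entry title, RDF URL) pairs from an Atom feed.
# Assumption: RDF links carry type="application/rdf+xml"; SAMPLE_FEED is illustrative.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def rdf_links(feed_xml, rdf_type="application/rdf+xml"):
    """Return (title, href) for every entry link whose type is `rdf_type`."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        for link in entry.findall(ATOM + "link"):
            if link.get("type") == rdf_type:
                pairs.append((title, link.get("href")))
    return pairs


SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>Dataset 42</title>
    <link rel="alternate" type="text/html" href="http://www.opengov.se/data/42/"/>
    <link rel="alternate" type="application/rdf+xml" href="http://www.opengov.se/data/42/rdf/"/>
  </entry>
</feed>"""

print(rdf_links(SAMPLE_FEED))
```

Matching on the link's media type rather than its position keeps the reader working if an entry carries several alternate representations.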
>>>
>>
>> Great. Like you, we should add RDF links to our Atom feed (which, as
>> I've already mentioned, can be found at
>> http://www.ckan.net/revision/list?format=atom&days=30).
>>
>>
>>> Please note that the feed contains datasets that are not (yet) open.
>>> Some may have a commercial license and may not be available on the
>>> web.
>>>
>>
>> That's also true for us ;)
>>
>> Regards,
>>
>> Rufus
>>
> -- 
>
> Martín Álvarez Espinar
> CTIC-Centro Tecnológico
>
> Parque Científico y Tecnológico de Gijón
> c/ Ada Byron, 39 Edificio Centros Tecnológicos
> 33203 Gijón - Asturias - España
> Tel.: +34 984 29 12 12
> Fax: +34 984 39 06 12
> E-mail: martin.alvarez@fundacionctic.org
> http://www.fundacionctic.org
> Privacy policy: http://www.fundacionctic.org/privacidad

--
William Waites VE2WSW                <ww@styx.org>
http://www.irl.styx.org/          +44 131 516 3563
CD70 0498 8AE4 36EA 1CD7  281C 427A 3F36 2130 E9F5

Received on Monday, 22 February 2010 13:22:00 UTC