Re: RDF for data catalogues (was: Re: Universal distributed open government data catalog?)

From: Martín Álvarez Espinar <martin.alvarez@fundacionctic.org>
Date: Mon, 22 Feb 2010 16:34:27 +0100
Message-ID: <4B82A403.3040602@fundacionctic.org>
To: William Waites <ww@styx.org>
CC: "rufus.pollock Pollock" <rufus.pollock@okfn.org>, Peter Krantz <peter.krantz@gmail.com>, Ed Summers <ehs@pobox.com>, Jeni Tennison <jeni@jenitennison.com>, eGov IG <public-egov-ig@w3.org>, Jonathan Gray <jonathan.gray@okfn.org>, ckan-discuss Discuss <ckan-discuss@lists.okfn.org>
Hello William,

Thanks for your comment. I think you are right: dct:Dataset would be 
more accurate for representing any kind of catalogue, even those that are not 
Linked Data compliant.

This vocabulary is public, but for now it is only used in our own sandbox. We 
should refine it, use it in a more standard way, and register it at 
purl.org.
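
The subclass relationships proposed in the quoted message below (void:Dataset and cat:Dataset both under dct:Dataset, with no link between cat:Dataset and void:Dataset) can be sketched in a few lines of Python. This is only an illustration: the prefixed names are treated as opaque strings rather than resolved URIs, and the helper names `superclasses` and `is_instance_of` are hypothetical, not part of any vocabulary tooling.

```python
# Subclass assertions as discussed in the thread; the prefixed names are kept
# as plain strings and are not resolved to full URIs here.
SUBCLASS_OF = {
    "void:Dataset": {"dct:Dataset"},
    "cat:Dataset": {"dct:Dataset"},
}


def superclasses(cls, subclass_of=SUBCLASS_OF):
    """Return every class reachable from cls via subclass links (transitive)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_instance_of(types, cls, subclass_of=SUBCLASS_OF):
    """True if any asserted type equals cls or is a subclass of cls."""
    return any(t == cls or cls in superclasses(t, subclass_of) for t in types)


# A catalogued spreadsheet typed only as cat:Dataset (hypothetical resource).
xls_types = {"cat:Dataset"}
print(is_instance_of(xls_types, "dct:Dataset"))   # entailed
print(is_instance_of(xls_types, "void:Dataset"))  # not entailed
```

With this arrangement a dataset published only as .xls or NetCDF can be catalogued without implying that it is available as RDF.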

Best regards,

Martin

William Waites wrote:
> Hello Martín!
>
> This is great work. A couple of questions. Should cat:Dataset really be a
> subclass of void:Dataset? The void spec says,
>
>     A dataset is a collection of data, published and maintained by a
>     single provider, available as RDF on the Web, where at least some
>     of the resources in the dataset are identified by dereferencable
>     URIs.
>
> Often the datasets referred to in the catalog are not available as RDF
> (though of course people are working to change that...). With CKAN,
> for example, we find datasets in .xls files, NetCDF files and funny
> XML without any dereferencable URIs. The situation is similar with
> data.gov.uk (which uses the same software as CKAN for its catalogue).
> Perhaps it should just be a subclass of dct:Dataset (of which the
> void one is already a subclass).
>
> So,
>
> { ?s a void:Dataset } => { ?s a dct:Dataset }
> { ?s a cat:Dataset } => { ?s a dct:Dataset }
> { ?s a cat:Dataset } not => { ?s a void:Dataset}
>
> data.gov.uk needs this or something like it... yesterday. How far off
> is this vocabulary from becoming official? Have steps been taken to register
> it under purl.org, for example?
>
> I rather think it would make a lot of sense for data.gov and data.gov.uk
> to use the same vocabularies here.
>
> Cheers,
> -w
>
> On 10-02-22 at 10:56, Martín Álvarez Espinar wrote:
>
>> Hello,
>>
>> We had thought of representing the list of all available public 
>> catalogues in RDF. Having in mind voiD and SCOVO, we tried to 
>> represent Catalogues which are composed of Datasets. These catalogues 
>> are described using FOAF and Dublin Core metadata and are available 
>> from our SPARQL endpoint [1].
>>
>> This is an example of a proposed catalog represented using this 
>> vocabulary [2]:
>>
>> :data.gov a cat:Catalog ;
>>   dcterms:identifier "data.gov" ;
>>   foaf:homepage <http://www.data.gov/catalog> ;
>>   rdfs:label "US Federal Government Catalog" ;
>>   dcterms:description "The purpose of Data.gov is...." ;
>>   dcterms:language "en" ;
>>   dcterms:issued "2009-05-21"^^xsd:date ;
>>   dcterms:license <http://www.data.gov/datapolicy> ;
>>   dcterms:spatial <http://sws.geonames.org/6252001/> .
>>
>> Every catalogue is enriched with Linked Data. In this example, the 
>> "dcterms:spatial" property links to the area covered by 
>> data.gov (the USA, http://sws.geonames.org/6252001/).
>>
>> Using this information from Geonames, we can build some 
>> representations of these catalogues like simple listings [3] or maps 
>> [4].
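
As a rough sketch of how a simple listing could be derived from such descriptions, the Python below groups catalogues by the URI given in dcterms:spatial. It assumes the triples have already been loaded into plain (subject, predicate, object) tuples; the helper names `value` and `catalogs_by_area` are hypothetical, and only values from the data.gov example above are used.

```python
from collections import defaultdict

# Triples taken from the data.gov example in this message, as (s, p, o) tuples.
TRIPLES = [
    (":data.gov", "rdf:type", "cat:Catalog"),
    (":data.gov", "rdfs:label", "US Federal Government Catalog"),
    (":data.gov", "foaf:homepage", "http://www.data.gov/catalog"),
    (":data.gov", "dcterms:spatial", "http://sws.geonames.org/6252001/"),
]


def value(triples, subject, predicate):
    """Return the first object for (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def catalogs_by_area(triples):
    """Group catalogue labels by the dcterms:spatial URI they point to."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        if p == "rdf:type" and o == "cat:Catalog":
            area = value(triples, s, "dcterms:spatial")
            label = value(triples, s, "rdfs:label")
            grouped[area].append(label)
    return dict(grouped)


for area, labels in catalogs_by_area(TRIPLES).items():
    print(area, "->", ", ".join(labels))
```

A map view would additionally need coordinates for each Geonames URI, which this sketch does not attempt to fetch.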
>>
>> Best regards,
>>
>> Martin
>>
>> [1] http://data.fundacionctic.org/sparql
>> [2] http://data.fundacionctic.org/vocab/catalog/datasets.html.en
>> [3] http://datos.fundacionctic.org/sandbox/catalog/index.html.en
>> [4] http://datos.fundacionctic.org/sandbox/catalog/map
>>
>>
>> Rufus Pollock wrote:
>>> On 17 February 2010 22:56, Peter Krantz <peter.krantz@gmail.com> wrote:
>>>
>>>> On Tue, Feb 2, 2010 at 18:59, Ed Summers <ehs@pobox.com> wrote:
>>>>
>>>>> My personal opinion is that a key ingredient to making this happen is
>>>>> to publish dataset availability and metadata using a syndicated feed
>>>>> (Atom and/or RSS).
>>>>>
>>>> I have implemented the RDF metadata on opengov.se now. All data is in
>>>> Swedish, but you get the idea if you look at an individual dataset:
>>>>
>>>> http://www.opengov.se/data/42/
>>>>
>>>> ...and its RDF representation (based on Dublin Core terms):
>>>>
>>>> http://www.opengov.se/data/42/rdf/
>>>>
>>>
>>> Great stuff Peter. For comparison, here's an example of what you get
>>> from ckan.net + semantic.ckan.net:
>>>
>>> http://pastie.org/830693
>>>
>>> At the moment we redirect into semantic.ckan.net from ckan.net via a
>>> rel="alternate" link and a 303 redirect based on the Accept header, e.g. try:
>>>
>>> curl -L -H "Accept: application/rdf+xml" http://ckan.net/package/2000-us-census-rdf
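
The same content negotiation can be sketched in Python with the standard library. This is a minimal illustration, not the CKAN client: the function names `build_rdf_request` and `fetch_rdf` are hypothetical, and the example only builds the request rather than assuming the URL still resolves.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_rdf_request(url, accept=RDF_XML):
    """Build a GET request that asks for an RDF/XML representation."""
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_rdf(url, timeout=10):
    """Fetch url with the RDF Accept header; returns (final_url, body_bytes).

    urllib follows 303 redirects by default, like curl -L.
    """
    with urllib.request.urlopen(build_rdf_request(url), timeout=timeout) as resp:
        return resp.geturl(), resp.read()


if __name__ == "__main__":
    # URL from the message; whether it still resolves is not assumed.
    req = build_rdf_request("http://ckan.net/package/2000-us-census-rdf")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `fetch_rdf` on a live URL would return the address reached after any redirect, which makes it easy to see where the 303 points.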
>>>
>>> semantic.ckan.net also provides a human readable version of the data:
>>>
>>> <http://semantic.ckan.net/data/2000-us-census-rdf>
>>>
>>> We've thought quite a bit about integrating directly into ckan.net
>>> (hence the /data/ rather than /package/ on semantic.ckan.net) but the
>>> issue here is that we want to use a proper triple store for the data
>>> so you can query via SPARQL (currently
>>> http://semantic.ckan.net/sparql). Thus we've gone for the separate but
>>> related model for the present.
>>>
>>> Maybe it would be worth getting together for half an hour on Skype and
>>> etherpad to work on hammering out a shared ontology here? I also know
>>> the people from DERI (Richard Cyganiak especially) are working on this
>>> so we should talk with them.
>>>
>>>
>>>> I have also made sure an Atom feed contains all datasets (with a link
>>>> element to the RDF representations in each entry element) here:
>>>>
>>>> http://www.opengov.se/feeds/data/
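
A consumer of such a feed could pick out the RDF links per entry roughly as follows. This is a sketch under stated assumptions: the sample feed is a hypothetical fragment written for illustration (the real feed's rel and type attribute values may differ), and `rdf_links` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
RDF_XML = "application/rdf+xml"

# Hypothetical feed fragment for illustration; the real feed's exact markup
# (rel and type values in particular) is an assumption.
SAMPLE_FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example datasets</title>
  <entry>
    <title>Example dataset</title>
    <link rel="alternate" type="text/html" href="http://www.opengov.se/data/42/"/>
    <link rel="alternate" type="application/rdf+xml" href="http://www.opengov.se/data/42/rdf/"/>
  </entry>
</feed>
"""


def rdf_links(feed_xml):
    """Return (entry title, RDF href) pairs for entries that link to RDF."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    pairs = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        for link in entry.findall(ATOM + "link"):
            if link.get("type") == RDF_XML:
                pairs.append((title, link.get("href")))
    return pairs


print(rdf_links(SAMPLE_FEED))
```

Filtering on the link's type attribute keeps the sketch independent of how each site chooses its URL layout.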
>>>>
>>>
>>> Great. Like you, we should add an RDF link to our Atom feed (which, as I've
>>> already mentioned, can be found at
>>> http://www.ckan.net/revision/list?format=atom&days=30).
>>>
>>>
>>>> Please note that the feed contains datasets that are not (yet) open.
>>>> Some may have a commercial license and may not be available on the
>>>> web.
>>>>
>>>
>>> That's also true for us ;)
>>>
>>> Regards,
>>>
>>> Rufus
>>>
>> -- 
>>
>> Martín Álvarez Espinar
>> CTIC-Centro Tecnológico
>>
>> Parque Científico y Tecnológico de Gijón
>> c/ Ada Byron, 39 Edificio Centros Tecnológicos
>> 33203 Gijón - Asturias - España
>> Tel.: +34 984 29 12 12
>> Fax: +34 984 39 06 12
>> E-mail: martin.alvarez@fundacionctic.org
>> http://www.fundacionctic.org
>> Política de Privacidad: http://www.fundacionctic.org/privacidad
>
> -- 
> William Waites VE2WSW                <ww@styx.org>
> http://www.irl.styx.org/          +44 131 516 3563
> CD70 0498 8AE4 36EA 1CD7  281C 427A 3F36 2130 E9F5
>
>
>

Received on Monday, 22 February 2010 15:34:46 GMT