Re: Classification of open datasets...

Hi Peter,
Thank you for kicking off this thread, initially on the e-gov IG and EU Open Data lists.  I've broadened it to include the public W3C Government Linked Data working group list because we're interested stakeholders. I hope this helps ...

Today, I pinged several of the editors of the DCAT vocabulary, which is on track to become a W3C Recommendation.  Almost immediately, responses from working group members (both in Europe) independently shared a perspective that is held by many linked data advocates:

1) There is no one ring to rule them all -- there is no one vocabulary to describe gov't data sets globally.  That is a feature, not a bug.  

2) Data harmonization is hard work but worth doing -- I've learned not to trivialize the effort involved in stitching together data sets that have been published as linked data (4 star linked data); however,

3) The linking of data sets (from which we derive the semantic goodness linked data advocates proclaim) comes from using several core 'interlinking' vocabularies, for example SKOS, RDFS and Dublin Core (not an exhaustive list!!), to yield 5 star linked data -- this is the high octane fuel for global innovation & discovery.

4) Whether you're publishing beautiful RDF for all your public data sets (meaning you've taken the big plunge), or getting started with your toe in the waters by publishing RDFa 1.1 Lite on your site, you've succeeded in making your data more accessible to search engines and lowered the barrier for participation for everyone -- what is not to love?!  This is what you wanted in the first place, right?
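
To make points 2 and 3 a little more concrete, here is a minimal sketch in Python (standard library only) of how two publishers with different category lists could still find each other's data sets once each list is mapped to a shared concept with skos:exactMatch. Every URI below is a hypothetical example.org placeholder, not a real Eurovoc, Austrian or US identifier, and the plain dictionary stands in for dcterms:subject values on catalogue records. It assumes only what the SKOS specification states: skos:exactMatch is symmetric and transitive.

```python
from collections import defaultdict

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical category URIs from three independently maintained schemes.
AT_HEALTH = "http://example.org/at/category/health"
US_HEALTH = "http://example.org/us/topic/health"
EU_HEALTH = "http://example.org/eu/concept/health"
AT_TRANSPORT = "http://example.org/at/category/transport"

# Mapping triples (subject, predicate, object): each publisher links its own
# category to the shared concept instead of abandoning its own scheme.
MAPPINGS = [
    (AT_HEALTH, SKOS_EXACT_MATCH, EU_HEALTH),
    (US_HEALTH, SKOS_EXACT_MATCH, EU_HEALTH),
]

# Hypothetical data sets, each with one dcterms:subject-style category.
DATASETS = {
    "http://example.org/at/dataset/hospitals": AT_HEALTH,
    "http://example.org/us/dataset/clinics": US_HEALTH,
    "http://example.org/at/dataset/bus-stops": AT_TRANSPORT,
}


def equivalent_concepts(start, mappings):
    """Return start plus every concept reachable from it via skos:exactMatch,
    following the links in both directions and across several hops."""
    neighbours = defaultdict(set)
    for subject, predicate, obj in mappings:
        if predicate == SKOS_EXACT_MATCH:
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def datasets_about(concept, datasets, mappings):
    """Return the sorted URIs of data sets whose category is the given
    concept or any concept mapped to it."""
    wanted = equivalent_concepts(concept, mappings)
    return sorted(uri for uri, subject in datasets.items() if subject in wanted)


if __name__ == "__main__":
    # A search that starts from the US category also finds the Austrian data set.
    for uri in datasets_about(US_HEALTH, DATASETS, MAPPINGS):
        print(uri)
```

Neither publisher had to adopt the other's category list; the only shared work was the two mapping triples. A real deployment would more likely hold these triples in an RDF store and query them with SPARQL, and would treat looser mappings such as skos:closeMatch separately because they are not transitive.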

Peter -- what is very cool about your question IMO is that in one business day, open data advocates, entrepreneurs and public sector employees from all over the EU & US put forward projects we're involved in. 

While you didn't get a 'multiple choice' style answer, namely use Eurovoc | DCAT | ADMS, etc., hopefully you have confidence that we're beyond the tipping point in publishing open government data, in many cases as 4 and 5 star linked data.  Not all data need be (or will be!!) published as linked data, but for data sets destined for access & re-use, keep it simple and keep in mind the basic principles (1-4 above).

We'll do our bit to fold the salient points from this thread into the forthcoming Best Practices for Linked Data document and the DCAT Vocab emerging from the Gov't Linked Data WG (anticipated May 2013).

Cheers,

Bernadette Hyland, co-chair 
W3C Government Linked Data Working Group
Charter: http://www.w3.org/2011/gld/

On Mar 4, 2013, at 12:12 PM, Martin Kaltenböck <m.kaltenboeck@semantic-web.at> wrote:

> Hi Phil, all
> 
> also would like to jump in here - I do NOT believe either that we will find 
> a taxonomy et al that the whole world will use to categorise / to classify 
> open data sets globally - e.g. in Austria we have specified 14 categories that were already used
> in eGovernment here in Austria (before we started open data) - see: http://reference.e-government.gv.at/uploads/media/OGD-Metadaten_2_1_2012_10.pdf
> (Page 27 - document is in German language but these categories are also translated to EN language)
> - we also tried a first mapping to publicdata.eu as well as EN ISO 19115
> 
> Also Eurovoc (or the mentioned NACE codes) are interesting approaches - 
> but cover only a part of the whole picture...
> 
> Also (E)government in e.g. Austria works differently than in e.g. the Czech Republic 
> and our classification schemes are different - and we do have different languages,...
> 
> AND: in a time of Linked (Open) Data it is good to re-use an existing classification scheme 
> (e.g. Eurovoc as a starting point in EU is great for sure, as it is already in use and also already translated)
> BUT: if someone does NOT use such a common 'list of categories' or has the need to expand his categorisation system 
> then we can link our approaches (making use of LOD principles) and thereby align our 'terminologies = classification schemes' 
> and thereby we are able to understand each other again (means: the machines understand our several classification schemes ;)...
> 
> Such an alignment is (hard) work for sure - but worth doing - so that for example US open data sets
> can be easily found and compared with UK data sets, Austrian data sets etc.....
> 
> We here at SWC have started to build a small open data SKOS Thesaurus some time ago that is / can be linked to
> other thesauri / taxonomies easily - see: http://vocabulary.semantic-web.at/OpenData.html
> (Also published using ADMS: http://vocabulary.semantic-web.at/OpenData/adms/0.3)
> 
> This is only a demo (only a few concepts in it and some links to DBpedia established) - 
> but it shows the principle and could (when built properly) be a good basis for categorisation / tagging of open data sets...
> 
> Maybe this helps the discussion - cheers - martin
> 
> --
> Martin Kaltenböck, CMC
> Managing Partner, CFO
> 
> Semantic Web Company (SWC)
> Mariahilfer Strasse 70 / 8
> A - 1070 Vienna, Austria
> Tel +43 1 402 12 35 - 25
> Fax +43 1 402 12 35 - 22
> Mobile +43 650 3905697
> 
> http://www.semantic-web.at
> http://blog.semantic-web.at
> http://poolparty.biz
> 
> LOD2 - Creating Knowledge out of Interlinked Data - http://lod2.eu/
> EDF2013, 09-10 April 2013, Dublin - http://2013.data-forum.eu/
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: "Phil Archer" <phila@w3.org>
> To: "Bernadette Hyland" <bhyland@3roundstones.com>
> Cc: "Fadi Maali" <fadi.maali@deri.org>, "John Erickson" <olyerickson@gmail.com>, "W3C public GLD WG WG" <public-gld-wg@w3.org>
> Sent: Monday, 4 March, 2013 5:36:25 PM
> Subject: Re: Fwd: Classification of open datasets...
> 
> I pitched in with a small comment. I've seen the thread and thought 
> "must thread that" - and now I have. DCAT etc. is not really what 
> Peter's after - it's values for dcterms:subject I think. The concepts 
> around vocab profiles are something I'm looking at in the context of 
> possible work items for a successor WG. So what I'm looking for now is 
> possible community interest, background info etc.
> 
> P
> 
> On 04/03/2013 15:34, Bernadette Hyland wrote:
>> Hi Fadi, John & Phil,
>> There is a detailed thread that Peter Krantz kicked off about open data set vocabularies on Friday (1-Mar).  It was sent to the euopendata@lists.okfn.org and public-egov-ig@w3.org list and not the public gld working group list (unfortunately).
>> 
>> I encourage you to look at the thread in its entirety as people from EU & US are weighing in with a variety of answers and this is near & dear to the charter of the GLD WG.  Unfortunately, people have omitted the entire thread when responding but I'll forward some responses FYR.
>> 
>> Great example of where relevant guidance is required in the context of what people are using today in open data initiatives and for describing gov't data sets.
>> 
>> Phil, Deirdre, Martin  -- This would be great material for a talk at the European Data Forum and/or Open Data on the Web workshop, both in April.  Just saying ...
>> 
>> 
>> Cheers,
>> 
>> Bernadette Hyland, co-chair
>> W3C Government Linked Data Working Group
>> Charter: http://www.w3.org/2011/gld/
>> 
>> Begin forwarded message:
>> 
>>> Resent-From: public-egov-ig@w3.org
>>> From: "koumenides c.l. (clk1v07)" <clk1v07@ecs.soton.ac.uk>
>>> Subject: RE: Classification of open datasets...
>>> Date: March 1, 2013 5:25:18 AM EST
>>> To: Peter Krantz <peter@peterkrantz.se>, "euopendata@lists.okfn.org" <euopendata@lists.okfn.org>, public-egov-ig <public-egov-ig@w3.org>
>>> 
>>> Hi
>>> 
>>> I suppose W3C's DCAT would be a candidate in this case. http://www.w3.org/TR/vocab-dcat/
>>> 
>>> Regards,
>>> 
>>> Christos
>>> ________________________________________
>>> From: Peter Krantz [peter@peterkrantz.se]
>>> Sent: 01 March 2013 09:32
>>> To: euopendata@lists.okfn.org; public-egov-ig
>>> Subject: Classification of open datasets...
>>> 
>>> Hi!
>>> 
>>> Many countries are developing national portals with metadata about
>>> open datasets from the public sector. To make datasets easier to find
>>> and to lower the threshold for pan-European (or global) re-use it
>>> would be great if classification of datasets followed a shared
>>> taxonomy.
>>> 
>>> There are many candidates that could be used, e.g. Eurovoc [1], NACE
>>> [2]. I would be grateful for any pointers if there is work going on to
>>> harmonize classification of datasets on a global or European level.
>>> 
>>> Regards,
>>> 
>>> Peter Krantz
>>> http://www.peterkrantz.com
>>> @peterkz_swe
>>> 
>>> [1]: http://eurovoc.europa.eu/ - available as LOD
>>> [2]: http://ec.europa.eu/competition/mergers/cases/index/nace_all.html
>>> 
>> 
>> 
> 
> -- 
> 
> Phil Archer
> W3C eGovernment
> 
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
> 
> 

Received on Monday, 4 March 2013 18:24:12 UTC