RE: Classification of open datasets...

I am working on this as well for an upcoming Big Data Symposium:
http://semanticommunity.info/Big_Data_Symposia

I also presented some work on this about two years ago in my keynote for the
SEMIC.EU Conference:
http://semanticommunity.info/Build_SEMIC.EU_in_the_Cloud 

Bottom line: as I have been able to show, all the work with data catalogs
does not really help with data integration!

Dr. Brand Niemann
Director and Senior Data Scientist
Semantic Community
http://semanticommunity.info 
http://gov.aol.com/bloggers/brand-niemann/ 
703-268-9314

-----Original Message-----
From: Bernadette Hyland [mailto:bhyland@3roundstones.com] 
Sent: Monday, March 04, 2013 1:24 PM
To: Peter Krantz; Martin Kaltenböck; Phil Archer
Cc: Fadi Maali; John Erickson; W3C public GLD WG WG; egov-ig mailing list;
euopendata@lists.okfn.org
Subject: Re: Classification of open datasets...

Hi Peter,
Thank you for kicking off a thread initially on the e-gov IG and EU Open
Data lists.  I've broadened to include the public W3C Government Linked Data
working group because we're interested stakeholders. I hope this helps ...

Today, I pinged several of the editors of the DCAT vocabulary, which is on
track to become a W3C Recommendation.  Immediately, several responses from
working group members (both in Europe) independently shared a perspective
that is held by many linked data advocates:

1) There is no one ring to rule them all -- there is no one vocabulary to
describe gov't data sets globally.  That is a feature, not a bug.  

2) Data harmonization is hard work but worth doing -- I've learned not to
trivialize the effort in stitching together data sets that have been
published as linked data (4 star linked data); however

3) Linking data sets together (from which we derive the semantic goodness
linked data advocates proclaim) relies on several core 'interlinking'
vocabularies, for example SKOS, RDFS and Dublin Core (not an exhaustive
list!!), to yield 5 star linked data -- this is the high octane
fuel for global innovation & discovery.

4) Whether you're publishing beautiful RDF for all your public data sets
(meaning you've taken the big plunge), or getting started with your toe in
the waters by publishing RDFa 1.1 Lite on your site, you've succeeded in
making your data more accessible to search engines and lowered the barrier
for participation for everyone -- what is not to love?!  This is what you
wanted in the first place, right?
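
To illustrate points 2 and 3, here is a minimal sketch in Python of how
interlinking can pay off: two or more classification schemes are aligned
with skos:exactMatch links, so a data set tagged in one scheme can be found
via a category from another. The concept identifiers (at:..., eu:..., us:...),
the data set names and the helper functions are all hypothetical, made up
for this example; a real setup would likely use an RDF library and the
published concept URIs.

```python
# Hypothetical alignment between three category schemes.
# None of these identifiers come from Eurovoc or any real vocabulary.
EXACT_MATCH = "skos:exactMatch"

# (subject, predicate, object) triples linking equivalent concepts.
MAPPINGS = [
    ("at:umwelt", EXACT_MATCH, "eu:environment"),
    ("eu:environment", EXACT_MATCH, "us:environment"),
    ("at:verkehr", EXACT_MATCH, "eu:transport"),
]

# Each data set carries one subject category from its own national scheme
# (roughly what a dcterms:subject value would hold).
DATASETS = {
    "vienna-air-quality": "at:umwelt",
    "eu-emissions": "eu:environment",
    "us-water-quality": "us:environment",
    "vienna-tram-lines": "at:verkehr",
}


def equivalent_concepts(concept, mappings):
    """Return the concept plus everything reachable via exactMatch links.

    Links are followed in both directions and across several hops.
    """
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in mappings:
            if predicate != EXACT_MATCH:
                continue
            for source, target in ((subject, obj), (obj, subject)):
                if source == current and target not in seen:
                    seen.add(target)
                    stack.append(target)
    return seen


def find_datasets(concept, datasets, mappings):
    """List data sets whose category is equivalent to the given concept."""
    wanted = equivalent_concepts(concept, mappings)
    return sorted(name for name, subject in datasets.items() if subject in wanted)


if __name__ == "__main__":
    # A search by the US category also finds the EU and Austrian data sets.
    print(find_datasets("us:environment", DATASETS, MAPPINGS))
```

Because skos:exactMatch is defined as symmetric and transitive, the sketch
follows the links in both directions and across several hops. Weaker links
such as skos:closeMatch are not transitive and would need more careful
handling.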

Peter -- what is very cool about your question IMO is that in one business
day, open data advocates, entrepreneurs and public sector employees from all
over the EU & US put forward projects we're involved in. 

While you didn't get a 'multiple choice' style answer, namely use Eurovoc |
DCAT | ADMS, etc., hopefully you have confidence that we're beyond the
tipping point in publishing open government data, in many cases as 4 and 5
star linked data.  Not all data need be (or will be!!) published as linked
data, but for data sets destined for access & re-use, while keeping it
simple, keep in mind some basic principles (1-4 above). 

We'll do our bit to fold in the salient points from this thread to the
forthcoming Best Practices for Linked Data document and DCAT Vocab emerging
from the Gov't Linked Data WG (anticipated May 2013).

Cheers,

Bernadette Hyland, co-chair
W3C Government Linked Data Working Group
Charter: http://www.w3.org/2011/gld/

On Mar 4, 2013, at 12:12 PM, Martin Kaltenböck
<m.kaltenboeck@semantic-web.at> wrote:

> Hi Phil, all
> 
> also would like to jump in here - I do NOT believe either that we will 
> find a taxonomy et al that the whole world will use to categorise / to 
> classify open data sets globally - e.g. in Austria we have specified 
> 14 categories that were already used in eGovernment here in Austria 
> (before we started open data) - see: 
> http://reference.e-government.gv.at/uploads/media/OGD-Metadaten_2_1_2012_10.pdf
> (page 27; the document is in German, but these categories are also
> translated into English)
> - we also tried a first mapping to publicdata.eu as well as EN ISO
> 19115
> 
> Also Eurovoc (or the mentioned NACE codes) are interesting approaches 
> - but cover only a part of the whole picture...
> 
> Also, (e)government in e.g. Austria works differently than in e.g. the
> Czech Republic, and our classification schemes are different - and we
> have different languages, ...
> 
> AND: in a time of Linked (Open) Data it is good to re-use an existing 
> classification scheme (e.g. Eurovoc as a starting point in EU is great 
> for sure, as it is already in use and also already translated)
> BUT: if someone does NOT use such a common 'list of categories' or has 
> the need to expand his categorisation system then we can link our
> approaches (making use of LOD principles) and thereby align our
> 'terminologies = classification schemes'
> and thereby we are able to understand each other again (means: the
> machines understand our several classification schemes ;)...
> 
> Such an alignment is (hard) work for sure - but worth doing - so that 
> for example US open data sets can be easily found and compared with UK
> data sets, Austrian data sets etc.
> 
> We here at SWC have started to build a small open data SKOS Thesaurus 
> some time ago that is / can be linked to other thesauri / taxonomies 
> easily - see: http://vocabulary.semantic-web.at/OpenData.html
> (Also published using ADMS: 
> http://vocabulary.semantic-web.at/OpenData/adms/0.3)
> 
> This is only a demo (only a few concepts in it and some links to 
> DBpedia established) - but it shows the principle and could (when built
> properly) be a good basis for categorisation / tagging of open data sets...
> 
> Maybe this helps the discussion - cheers - martin
> 
> --
> Martin Kaltenböck, CMC
> Managing Partner, CFO
> 
> Semantic Web Company (SWC)
> Mariahilfer Strasse 70 / 8
> A - 1070 Vienna, Austria
> Tel +43 1 402 12 35 - 25
> Fax +43 1 402 12 35 - 22
> Mobile +43 650 3905697
> 
> http://www.semantic-web.at
> http://blog.semantic-web.at
> http://poolparty.biz
> 
> LOD2 - Creating Knowledge out of Interlinked Data - http://lod2.eu/ 
> EDF2013, 09-10 April 2013, Dublin - http://2013.data-forum.eu/
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: "Phil Archer" <phila@w3.org>
> To: "Bernadette Hyland" <bhyland@3roundstones.com>
> Cc: "Fadi Maali" <fadi.maali@deri.org>, "John Erickson" 
> <olyerickson@gmail.com>, "W3C public GLD WG WG" <public-gld-wg@w3.org>
> Sent: Monday, 4 March, 2013 5:36:25 PM
> Subject: Re: Fwd: Classification of open datasets...
> 
> I pitched in with a small comment. I've seen the thread and thought 
> "must thread that" - and now I have. DCAT etc. is not really what 
> Peter's after - it's values for dcterms:subject I think. The concepts
> around vocab profiles are something I'm looking at in the context of
> possible work items for a successor WG. So what I'm looking for now is 
> possible community interest, background info etc.
> 
> P
> 
> On 04/03/2013 15:34, Bernadette Hyland wrote:
>> Hi Fadi, John & Phil,
>> There is a detailed thread that Peter Krantz kicked off about open data
>> set vocabularies on Friday (1-Mar).  It was sent to the
>> euopendata@lists.okfn.org and public-egov-ig@w3.org lists and not the public
>> gld working group list (unfortunately).
>> 
>> I encourage you to look at the thread in its entirety as people from EU &
>> US are weighing in with a variety of answers and this is near & dear to the
>> charter of the GLD WG.  Unfortunately, people have omitted the entire thread
>> when responding but I'll forward some responses FYR.
>> 
>> Great example of where relevant guidance is required in the context of what
>> people are using today in open data initiatives and when describing gov't
>> data sets.
>> 
>> Phil, Deirdre, Martin  -- This would be great material for a talk at the
>> European Data Forum and/or Open Data on the Web workshop, both in April.
>> Just saying ...
>> 
>> 
>> Cheers,
>> 
>> Bernadette Hyland, co-chair
>> W3C Government Linked Data Working Group
>> Charter: http://www.w3.org/2011/gld/
>> 
>> Begin forwarded message:
>> 
>>> Resent-From: public-egov-ig@w3.org
>>> From: "koumenides c.l. (clk1v07)" <clk1v07@ecs.soton.ac.uk>
>>> Subject: RE: Classification of open datasets...
>>> Date: March 1, 2013 5:25:18 AM EST
>>> To: Peter Krantz <peter@peterkrantz.se>, "euopendata@lists.okfn.org" 
>>> <euopendata@lists.okfn.org>, public-egov-ig <public-egov-ig@w3.org>
>>> 
>>> Hi
>>> 
>>> I suppose W3C's DCAT would be a candidate in this case. 
>>> http://www.w3.org/TR/vocab-dcat/
>>> 
>>> Regards,
>>> 
>>> Christos
>>> ________________________________________
>>> From: Peter Krantz [peter@peterkrantz.se]
>>> Sent: 01 March 2013 09:32
>>> To: euopendata@lists.okfn.org; public-egov-ig
>>> Subject: Classification of open datasets...
>>> 
>>> Hi!
>>> 
>>> Many countries are developing national portals with metadata about 
>>> open datasets from the public sector. To make datasets easier to 
>>> find and to lower the threshold for pan-European (or global) re-use 
>>> it would be great if classification of datasets followed a shared 
>>> taxonomy.
>>> 
>>> There are many candidates that could be used, e.g. Eurovoc [1], NACE 
>>> [2]. I would be grateful for any pointers if there is work going on 
>>> to harmonize classification of datasets on a global or European level.
>>> 
>>> Regards,
>>> 
>>> Peter Krantz
>>> http://www.peterkrantz.com
>>> @peterkz_swe
>>> 
>>> [1]: http://eurovoc.europa.eu/ - available as LOD
>>> [2]: http://ec.europa.eu/competition/mergers/cases/index/nace_all.html
>>> 
>> 
>> 
> 
> --
> 
> Phil Archer
> W3C eGovernment
> 
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
> 
> 

Received on Monday, 4 March 2013 20:35:56 UTC