RE: Comparing versions of SKOS terminologies

Hi Rob,

Thanks for sharing your approach. I like the distinction between delta and history. Are there screenshots available showing how these are presented to Lexaurus users?

Cheers, Joachim

-----Original Message-----
From: Rob Tice [mailto:rob.tice@k-int.com]
Sent: Friday, 30 August 2013 09:12
To: Neubert Joachim; public-esw-thes@w3.org
Subject: RE: Comparing versions of SKOS terminologies

Hi Joachim.

Our Lexaurus terminology management solution allows you to obtain the differences between any two versions of a SKOS vocabulary (e.g. with 4 versions, differences are available between 1 and 2, 1 and 3, 1 and 4, 2 and 3, etc.) and also the individual lifecycle of any concept in it.

We define two types of 'difference output':

Delta	 - differences
History	 - lifecycle changes

The main difference is that if a concept is added and then deleted between two versions (e.g. 1 and 3), it will not appear in the 'delta' but will appear as an 'add' followed by a 'delete' in the 'history' output for those two versions.
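
To illustrate the distinction, here is a minimal Python sketch. It is not how Lexaurus works internally. It assumes each version is reduced to a plain set of concept identifiers, and the names `delta`, `history`, and the `ex:` identifiers are made up for the example.

```python
def delta(versions, start, end):
    """Net difference between two versions, ignoring anything in between."""
    old, new = versions[start], versions[end]
    return {"added": sorted(new - old), "deleted": sorted(old - new)}


def history(versions, start, end):
    """Every add/delete event across consecutive versions, in order."""
    events = []
    for n in range(start, end):
        old, new = versions[n], versions[n + 1]
        events += [(n + 1, "add", c) for c in sorted(new - old)]
        events += [(n + 1, "delete", c) for c in sorted(old - new)]
    return events


# Hypothetical data: ex:c is added in version 2 and deleted again in version 3.
versions = {
    1: {"ex:a", "ex:b"},
    2: {"ex:a", "ex:b", "ex:c"},
    3: {"ex:a", "ex:b"},
}

print(delta(versions, 1, 3))    # {'added': [], 'deleted': []}
print(history(versions, 1, 3))  # [(2, 'add', 'ex:c'), (3, 'delete', 'ex:c')]
```

The delta between 1 and 3 is empty because only the endpoints are compared, while the history keeps both events.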

Cheers

Rob


-----Original Message-----
From: Neubert Joachim [mailto:J.Neubert@zbw.eu]
Sent: 27 August 2013 18:34
To: 'public-esw-thes@w3.org'
Subject: Comparing versions of SKOS terminologies

When a new version of, say, a thesaurus is published, users are interested in "What's new?" and "What has changed?". I'm currently racking my brain about this. Has anyone solved the seemingly simple problem of comparing two versions of a SKOS file, and the obviously not-so-simple one of formatting the output in a way that is intelligible?

When it comes to diffing RDF files, some solutions are listed at http://www.w3.org/2001/sw/wiki/How_to_diff_RDF. The simplest way I found was using rdf.sh (https://github.com/seebi/rdf.sh), which simply runs a system diff on sorted .nt files produced by rapper. (You need to filter out blank nodes here, but this shouldn't be much of a problem with SKOS files.) Using git diff as the diff tool, this gives me a stat of something like "7443 insertions(+), 6937 deletions(-)" (on the two most recent versions of the STW Thesaurus for Economics).
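
The same idea can be sketched in a few lines of Python. This is a set-based analogue, not what rdf.sh actually does. It assumes both versions are already serialized as well-formed N-Triples with one triple per line, and the function names and example.org URIs are invented for illustration.

```python
def load_triples(nt_text):
    """Return the set of N-Triples lines, skipping triples with blank nodes."""
    triples = set()
    for line in nt_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Subject and predicate contain no whitespace; the rest starts with the object.
        subject, _predicate, rest = line.split(None, 2)
        if subject.startswith("_:") or rest.startswith("_:"):
            continue  # blank node labels are not stable across files
        triples.add(line)
    return triples


def triple_diff(old_text, new_text):
    """Return (inserted, deleted) triples as sorted lists of lines."""
    old, new = load_triples(old_text), load_triples(new_text)
    return sorted(new - old), sorted(old - new)


# Hypothetical sample data.
SKOS = "http://www.w3.org/2004/02/skos/core#"
OLD_NT = f"""
<http://example.org/c1> <{SKOS}prefLabel> "Labor market"@en .
<http://example.org/c1> <http://example.org/source> _:b0 .
"""
NEW_NT = f"""
<http://example.org/c1> <{SKOS}prefLabel> "Labour market"@en .
<http://example.org/c1> <{SKOS}altLabel> "Job market"@en .
<http://example.org/c1> <http://example.org/source> _:b7 .
"""

inserted, deleted = triple_diff(OLD_NT, NEW_NT)
print(f"{len(inserted)} insertions(+), {len(deleted)} deletions(-)")
```

Comparing sets of lines only works if both files use the same serialization conventions (escaping, literal forms), which is one reason to produce both with the same tool.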

Obviously, this triple-level diff doesn't help users much. A possible course of action could be:

1) Group changes for each concept.
2) Recognize insertion and deletion of concepts as a whole (presumably the most important changes).
3) Recognize certain types of changes (e.g., altered prefLabel, added altLabel, changed relations).
4) Enrich the concept URIs with the preferred label (in a given language).
5) Arrange everything nicely on an RDFa overview page (additions/deletions of concepts, perhaps some of the more important types of changes, statistics such as the number of changed/unchanged concepts, etc.)
6) Provide change record (RDFa) pages per concept, which can be linked from a concept page.
7) Optionally, if the terminology includes meta-structures such as a term classification, add aggregated information about the most intensively changed subject areas to the overview page.
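
Steps 1) to 3) could be sketched roughly as follows in Python. This is only a rough sketch under simplifying assumptions: the inserted and deleted triples are already available as (subject, predicate, object) tuples, prefixed names stand in for full URIs, and a concept counts as added or deleted when its rdf:type skos:Concept triple is inserted or deleted. The function name `summarize` and the sample data are hypothetical.

```python
from collections import defaultdict

TYPE_CONCEPT = ("rdf:type", "skos:Concept")


def summarize(inserted, deleted):
    """Group triple-level changes per concept and name the kind of change."""
    # Step 1: group changes for each concept (the triple's subject).
    by_concept = defaultdict(lambda: {"+": [], "-": []})
    for s, p, o in inserted:
        by_concept[s]["+"].append((p, o))
    for s, p, o in deleted:
        by_concept[s]["-"].append((p, o))

    report = {}
    for concept, change in by_concept.items():
        # Step 2: recognize insertion and deletion of whole concepts.
        if TYPE_CONCEPT in change["+"]:
            report[concept] = ["concept added"]
        elif TYPE_CONCEPT in change["-"]:
            report[concept] = ["concept deleted"]
        else:
            # Step 3: classify the remaining changes per property.
            added = {p for p, _ in change["+"]}
            removed = {p for p, _ in change["-"]}
            kinds = []
            for p in sorted(added | removed):
                if p in added and p in removed:
                    kinds.append(f"{p} changed")
                elif p in added:
                    kinds.append(f"{p} added")
                else:
                    kinds.append(f"{p} removed")
            report[concept] = kinds
    return report


# Hypothetical sample data.
inserted = [
    ("ex:c1", "skos:prefLabel", '"Labour market"@en'),
    ("ex:c1", "skos:altLabel", '"Job market"@en'),
    ("ex:c2", "rdf:type", "skos:Concept"),
    ("ex:c2", "skos:prefLabel", '"Minimum wage"@en'),
]
deleted = [
    ("ex:c1", "skos:prefLabel", '"Labor market"@en'),
    ("ex:c3", "rdf:type", "skos:Concept"),
]

for concept, kinds in sorted(summarize(inserted, deleted).items()):
    print(concept, kinds)
```

A property with several values (e.g. multiple altLabels) would need a finer comparison than "changed"; the sketch treats any simultaneous insert and delete on the same property as a change.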

Thoughts? Has somebody done something similar already?

Cheers, Joachim

--
Joachim Neubert

ZBW - German National Library of Economics
Leibniz Information Centre for Economics
Neuer Jungfernstieg 21
20354 Hamburg

Received on Wednesday, 4 September 2013 09:45:33 UTC