RE: Comparing versions of SKOS terminologies

Hi Neubert

On the server, the number and types of changes at vocab/concept scheme
level are presented as part of an RSS feed (updates.JPG). Direct access to
these changes is available via the REST API.

For changes made to vocabs/concept schemes in the desktop editor (which
connects to the server for collaborative development), users are presented
with a direct description of every change and can also roll back (restore)
to any point in the lifetime of the vocab/concept scheme (desktop.JPG).
Behind the scenes, this uses the delta feature in reverse.
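
One way to picture "using the delta in reverse" is a minimal sketch that treats each revision of a vocab as a set of triples and each delta as the net sets of added and removed triples. This is an assumed model for illustration only, not how Lexaurus actually stores revisions; `Delta`, `apply_delta`, `revert_delta`, `rollback` and the `ex:` URIs are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical model: a triple is a plain (subject, predicate, object) tuple.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Delta:
    """Net change from one revision to the next (assumed representation)."""
    added: frozenset[Triple] = frozenset()
    removed: frozenset[Triple] = frozenset()


def apply_delta(state: frozenset[Triple], delta: Delta) -> frozenset[Triple]:
    """Move one revision forward."""
    return (state - delta.removed) | delta.added


def revert_delta(state: frozenset[Triple], delta: Delta) -> frozenset[Triple]:
    """Move one revision back by applying the same delta in reverse."""
    return (state - delta.added) | delta.removed


def rollback(current: frozenset[Triple], deltas: list[Delta], target: int) -> frozenset[Triple]:
    """Restore revision `target`.

    Assumes deltas[i] leads from revision i to revision i + 1 and that
    `current` is revision len(deltas).
    """
    state = current
    for delta in reversed(deltas[target:]):
        state = revert_delta(state, delta)
    return state


if __name__ == "__main__":
    v0 = frozenset({("ex:c1", "skos:prefLabel", '"Money"@en')})
    d0 = Delta(added=frozenset({("ex:c2", "skos:prefLabel", '"Credit"@en')}))
    d1 = Delta(
        added=frozenset({("ex:c1", "skos:prefLabel", '"Currency"@en')}),
        removed=frozenset({("ex:c1", "skos:prefLabel", '"Money"@en')}),
    )
    v2 = apply_delta(apply_delta(v0, d0), d1)
    print(rollback(v2, [d0, d1], 0) == v0)  # True
```

Reversal is exact only if each delta is a true net change, i.e. its added triples were absent from, and its removed triples present in, the earlier revision.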

Changes to a vocab/concept scheme can also be accessed on the server via the
REST API (api.jpeg). The revision numbers in this vocab/concept scheme
revision list can then be used to drive history/delta delivery in
machine-to-machine transactions. This allows consuming applications to
quickly find out whether anything has changed since they last checked, and
then to request only the updates they need, which is important if the
concept scheme in question contains hundreds of thousands of terms (or even
millions!).

Pseudo workflow as follows:

As a consuming application, I am interested in whether vocabs x, y and z
have changed since I last checked.
--------------------------------------------------------------------------
What is the latest revision of the repository as a whole (compared to my
stored value)?

Changed? yes/no

if yes:
    get latest revision number for vocab x and compare it to my stored
    value; if different, request the delta between the current revision
    and my stored revision
    get latest revision number for vocab y and compare ... ditto
    get latest revision number for vocab z and compare ... ditto

if no:
    none of the vocabs can have changed
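
The pseudo workflow maps fairly directly onto a polling function. This is a minimal sketch that assumes a hypothetical client interface (`repository_revision`, `vocab_revision`, `delta`) standing in for whatever the REST API actually exposes; those method names, the in-memory `FakeServer`, the string form of a delta and the convention that revision 0 means "never seen" are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


class RevisionServer(Protocol):
    """Hypothetical client interface; not the real Lexaurus REST API."""
    def repository_revision(self) -> int: ...
    def vocab_revision(self, vocab: str) -> int: ...
    def delta(self, vocab: str, from_rev: int, to_rev: int) -> list[str]: ...


@dataclass
class ClientState:
    """Revision numbers remembered from the previous check."""
    repo_rev: Optional[int] = None
    vocab_revs: dict[str, int] = field(default_factory=dict)


def poll(server: RevisionServer, state: ClientState, vocabs: list[str]) -> dict[str, list[str]]:
    """Return deltas for the vocabs that changed since the last poll."""
    updates: dict[str, list[str]] = {}
    repo_rev = server.repository_revision()
    if repo_rev == state.repo_rev:
        return updates  # repository unchanged, so none of the vocabs can have changed
    for vocab in vocabs:
        latest = server.vocab_revision(vocab)
        stored = state.vocab_revs.get(vocab, 0)  # assumption: 0 = never seen, fetch everything
        if latest != stored:
            # Request only the changes between the stored and the latest revision.
            updates[vocab] = server.delta(vocab, stored, latest)
            state.vocab_revs[vocab] = latest
    state.repo_rev = repo_rev
    return updates


class FakeServer:
    """In-memory stand-in used only to exercise poll()."""
    def __init__(self) -> None:
        self.revs = {"x": 3, "y": 5, "z": 2}
        self.repo = 10

    def repository_revision(self) -> int:
        return self.repo

    def vocab_revision(self, vocab: str) -> int:
        return self.revs[vocab]

    def delta(self, vocab: str, from_rev: int, to_rev: int) -> list[str]:
        return [f"{vocab}: r{from_rev} -> r{to_rev}"]


state = ClientState(repo_rev=9, vocab_revs={"x": 3, "y": 4, "z": 2})
server = FakeServer()
print(poll(server, state, ["x", "y", "z"]))  # {'y': ['y: r4 -> r5']}
print(poll(server, state, ["x", "y", "z"]))  # {}
```

The cheap repository-wide check comes first, so an unchanged repository costs a single request regardless of how many vocabs are being watched.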


I should point out that we don’t use SKOS as our format for delivering this
management information, although it is one of the formats that we support
for ingest, editing and export.

Hope this provides food for thought :)

Cheers

Rob






-----Original Message-----
From: Neubert Joachim [mailto:J.Neubert@zbw.eu] 
Sent: 04 September 2013 10:45
To: public-esw-thes@w3.org
Subject: RE: Comparing versions of SKOS terminologies

Hi Rob,

Thanks for sharing your approach. I like the differentiation between delta
and history. Are there screenshots available on how these are presented to
Lexaurus users?

Cheers, Joachim

-----Original Message-----
From: Rob Tice [mailto:rob.tice@k-int.com] 
Sent: Friday, 30 August 2013 09:12
To: Neubert Joachim; public-esw-thes@w3.org
Subject: RE: Comparing versions of SKOS terminologies

Hi Joachim.

Our Lexaurus terminology management solution allows you to obtain the
differences between any 2 versions of a SKOS vocabulary (e.g. with 4
versions, differences are available between 1 and 2, 1 and 3, 1 and 4, 2 and
3, etc.) and also the individual lifecycle of any concept in it.

We define 2 types of 'difference output':

Delta   - differences
History - lifecycle changes

The main difference is that if a concept is added and then deleted between 2
versions (e.g. 1 and 3), it will not appear in the 'delta' but will appear
as an 'add' followed by a 'delete' in the 'history' output for those 2
versions.
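
To make the delta/history distinction concrete, here is a minimal sketch assuming a simplified model in which the only lifecycle events are concept-level "add" and "delete", each tagged with the version that introduced it. It is not Lexaurus code; `history`, `delta` and the `ex:` concept URIs are illustrative.

```python
# Hypothetical event record: (version that introduced it, operation, concept URI).
Event = tuple[int, str, str]


def history(events: list[Event], start: int, end: int) -> list[tuple[str, str]]:
    """Every lifecycle event after `start` up to and including `end`, in order."""
    return [(op, concept) for version, op, concept in events if start < version <= end]


def delta(events: list[Event], start: int, end: int) -> dict[str, str]:
    """Net difference between versions `start` and `end`.

    An add followed by a delete (or a delete followed by an add) of the
    same concept cancels out, so it does not appear in the delta.
    """
    net: dict[str, str] = {}
    for op, concept in history(events, start, end):
        if net.get(concept, op) != op:
            del net[concept]  # the opposite operation cancels the earlier one
        else:
            net[concept] = op
    return net


events = [(2, "add", "ex:c7"), (3, "delete", "ex:c7"), (3, "add", "ex:c8")]
print(history(events, 1, 3))  # [('add', 'ex:c7'), ('delete', 'ex:c7'), ('add', 'ex:c8')]
print(delta(events, 1, 3))    # {'ex:c8': 'add'}
```

A real delta would also carry property-level changes; in this simplified model, a concept deleted and then re-added is treated as unchanged.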

Cheers

Rob


-----Original Message-----
From: Neubert Joachim [mailto:J.Neubert@zbw.eu]
Sent: 27 August 2013 18:34
To: 'public-esw-thes@w3.org'
Subject: Comparing versions of SKOS terminologies

When a new version of, say, a thesaurus is published, users are interested in
"What's new?" and "What has changed?". I'm currently racking my brain over
this. Has anyone solved the seemingly simple problem of comparing two
versions of a SKOS file, and the obviously not-so-simple one of formatting
the output in a way that is intelligible?

When it comes to diffing RDF files, there are some solutions listed in
http://www.w3.org/2001/sw/wiki/How_to_diff_RDF. The simplest way I found
was using rdf.sh (https://github.com/seebi/rdf.sh), which simply runs a
system diff on sorted .nt files produced by rapper. (You need to filter out
blank nodes here, but this shouldn't be much of a problem with SKOS files.)
Using git diff as the diff tool, this gives me a stat of something like "7443
insertions(+), 6937 deletions(-)" (on the two most recent versions of the STW
Thesaurus for Economics).
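
The sort-and-diff idea can be approximated in a few lines by treating each N-Triples line as a set element. This is a sketch, not rdf.sh itself: it assumes both files were serialised by the same tool (so identical triples produce identical lines), and it uses a simple regular-expression heuristic, not a real parser, to drop triples whose subject or object is a blank node. The `example.org` URIs are made up.

```python
import re

# Heuristic: a blank node is either the subject (line starts with "_:")
# or the object (line ends with "_:label .").
BNODE = re.compile(r"^_:|\s_:\S+\s*\.\s*$")


def load_ntriples(text: str) -> set[str]:
    """Return the N-Triples lines of `text`, skipping comments, empty lines and blank-node triples."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if BNODE.search(line):
            continue  # blank node labels differ between files, so they cannot be compared
        triples.add(line)
    return triples


def diff_ntriples(old_text: str, new_text: str) -> tuple[list[str], list[str]]:
    """Return (inserted, deleted) triple lines, each sorted as `sort` would order them."""
    old, new = load_ntriples(old_text), load_ntriples(new_text)
    return sorted(new - old), sorted(old - new)


PREF_LABEL = "<http://www.w3.org/2004/02/skos/core#prefLabel>"
old = f'<http://example.org/c1> {PREF_LABEL} "Money"@en .\n'
new = (f'<http://example.org/c1> {PREF_LABEL} "Currency"@en .\n'
       '_:b0 <http://example.org/p> "ignored" .\n')
inserted, deleted = diff_ntriples(old, new)
print(f"{len(inserted)} insertions(+), {len(deleted)} deletions(-)")  # 1 insertions(+), 1 deletions(-)
```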

Obviously, this triple-level diff doesn't help users much. A possible course
of action could be:

1) Group changes for each concept.
2) Recognize insertion and deletion of concepts as a whole (presumably the
most important changes).
3) Recognize certain types of changes (e.g., altered prefLabel, added
altLabel, changed relations).
4) Enrich the concept URIs with the preferred label (in a given language).
5) Arrange everything nicely on an RDFa overview page (additions/deletions of
concepts, perhaps some of the more important types of changes, statistics
such as the number of changed/unchanged concepts, etc.)
6) Provide change record (RDFa) pages per concept, which can be linked from
a concept page.
7) Optionally, if the terminology includes meta-structures such as a term
classification, add aggregated information about the most intensively
changed subject areas to the overview page.
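
Steps 1 to 4 could be prototyped on top of such a triple-level diff. This sketch assumes the triples are already parsed into (subject, predicate, object) tuples using prefixed names such as `skos:prefLabel` for readability, that a new or deleted concept shows up as an added or removed `rdf:type skos:Concept` triple, and that preferred labels in the chosen language have been looked up beforehand. `summarize` and the `ex:` data are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]
CONCEPT_TYPE = ("rdf:type", "skos:Concept")  # (predicate, object) that marks a concept


def summarize(added: set[Triple], removed: set[Triple],
              labels: dict[str, str]) -> dict[str, tuple[str, list[str]]]:
    """Group triple-level changes by concept (step 1), detect whole-concept
    insertions/deletions (step 2), name per-property changes (step 3) and
    attach a preferred label (step 4)."""
    by_concept: dict[str, dict[str, set[tuple[str, str]]]] = defaultdict(
        lambda: {"added": set(), "removed": set()})
    for s, p, o in added:
        by_concept[s]["added"].add((p, o))
    for s, p, o in removed:
        by_concept[s]["removed"].add((p, o))

    report = {}
    for concept, changes in sorted(by_concept.items()):
        if CONCEPT_TYPE in changes["added"]:
            kinds = ["new concept"]
        elif CONCEPT_TYPE in changes["removed"]:
            kinds = ["deleted concept"]
        else:
            plus = {p for p, _ in changes["added"]}
            minus = {p for p, _ in changes["removed"]}
            kinds = []
            for p in sorted(plus | minus):
                if p in plus and p in minus:
                    kinds.append(f"changed {p}")  # e.g. altered prefLabel
                elif p in plus:
                    kinds.append(f"added {p}")    # e.g. added altLabel
                else:
                    kinds.append(f"removed {p}")
        # Fall back to the URI when no label is known (e.g. deleted concepts).
        report[concept] = (labels.get(concept, concept), kinds)
    return report


added = {("ex:c1", "skos:prefLabel", '"Currency"@en'),
         ("ex:c1", "skos:altLabel", '"Money"@en'),
         ("ex:c9", "rdf:type", "skos:Concept"),
         ("ex:c9", "skos:prefLabel", '"Crypto-assets"@en')}
removed = {("ex:c1", "skos:prefLabel", '"Money"@en')}
labels = {"ex:c1": "Currency", "ex:c9": "Crypto-assets"}
for uri, (label, kinds) in summarize(added, removed, labels).items():
    print(uri, label, kinds)
# ex:c1 Currency ['added skos:altLabel', 'changed skos:prefLabel']
# ex:c9 Crypto-assets ['new concept']
```

The resulting per-concept report is the kind of structure that the overview page (step 5) and per-concept change records (step 6) could be rendered from.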

Thoughts? Has somebody done something similar already?

Cheers, Joachim

--
Joachim Neubert

ZBW - German National Library of Economics
Leibniz Information Centre for Economics
Neuer Jungfernstieg 21
20354 Hamburg

Received on Thursday, 5 September 2013 07:08:07 UTC