
RE: Comparing versions of SKOS terminologies

From: Neubert Joachim <J.Neubert@zbw.eu>
Date: Tue, 10 Sep 2013 15:11:08 +0000
To: "'rob.tice@k-int.com'" <rob.tice@k-int.com>, "public-esw-thes@w3.org" <public-esw-thes@w3.org>
Message-ID: <4E3518D35BC4E14B802A8CAE0E8819CF1AB8C2@lhun.zbw-nett.zbw-kiel.de>
Hi Rob,

Thank you for the screenshots and the additional considerations. These indeed provide food for thought...

What comes to my mind are two basic distinctions:

Perspective: From "inside" a vocabulary management system (or interlinked vocabulary management systems of the same type) vs. from "outside", as a mere consumer of downloaded SKOS files

Clearly, the inside perspective allows much more control over who changed what and when, and perhaps can even enforce or encourage the use of history notes explaining why. On the other hand, existing and probably future systems will be extremely diverse in how they track changes (especially regarding the granularity of what they consider a change), and often this information is not available to outside data consumers.

Rhythm: Changes between distinct published versions vs. a continuous stream of relatively low-level changes (as provided, e.g., by LoC's Atom feeds).

The latter, especially for large vocabularies, will be quite challenging. Perhaps the ResourceSync (http://www.niso.org/workrooms/resourcesync/) guys will come up with something, but as far as I understand, they are not there yet.

For now, I feel that concentrating on the lowest-hanging fruit (published files with discrete versions) still involves enough work.

Cheers, Joachim

-----Original Message-----
From: Rob Tice [mailto:rob.tice@k-int.com] 
Sent: Thursday, 5 September 2013 08:08
To: Neubert Joachim; public-esw-thes@w3.org
Subject: RE: Comparing versions of SKOS terminologies

Hi Neubert

For the server, the numbers and types of changes at a vocab/concept scheme level are presented as part of an RSS feed (updates.JPG). Direct access to these changes is available via the REST API.

For changes to vocabs/concept schemes in the desktop editor (which links to the server for collaborative development), users are presented with a direct description of every change and also the ability to roll back (restore) to any point in the lifetime of the vocab/concept scheme (desktop.JPG). This uses the delta feature behind the scenes (in reverse).
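To illustrate what running a delta "in reverse" can mean, here is a minimal Python sketch. It assumes, hypothetically, that each revision is a set of triples and each delta a pair of added and removed triples; the function names are illustrative and are not Lexaurus's API. Rolling back inverts each delta (swapping additions and removals) and applies the inverted deltas newest first.

```python
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
# Hypothetical model: a delta is (added, removed) sets of triples. This is an
# illustration, not how Lexaurus stores changes internally.
Delta = Tuple[FrozenSet[Triple], FrozenSet[Triple]]


def apply_delta(state: FrozenSet[Triple], delta: Delta) -> FrozenSet[Triple]:
    """Move forward one revision: drop the removed triples, then add the new ones."""
    added, removed = delta
    return (state - removed) | added


def invert_delta(delta: Delta) -> Delta:
    """Running a delta in reverse swaps its additions and removals.

    This only restores the earlier state exactly if the delta is minimal,
    i.e. it lists only triples that really were added or removed.
    """
    added, removed = delta
    return (removed, added)


def rollback(current: FrozenSet[Triple], deltas: List[Delta], target: int) -> FrozenSet[Triple]:
    """Restore revision `target`.

    deltas[i] leads from revision i to revision i + 1, and `current` is the
    state after the last delta; the inverted deltas are applied newest first.
    """
    state = current
    for delta in reversed(deltas[target:]):
        state = apply_delta(state, invert_delta(delta))
    return state
```

Restoring "any point in the lifetime" then just means choosing a different `target`; no full snapshot of older revisions is needed as long as the deltas are kept.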

Changes to a vocab/concept scheme can also be accessed on the server via the REST API (api.jpeg). The revision numbers in this vocab/concept scheme revision list can then be used to drive the history/delta delivery in machine-to-machine transactions. This allows consuming applications to quickly find out whether anything has changed since they last checked, and then to request only the updates they need, which is important if the concept scheme in question contains hundreds of thousands of terms (or even millions!).

Pseudo workflow as follows:

As a consuming application, I am interested in whether vocabs x, y and z have changed since I last checked.

What is the latest revision of the repository as a whole (compared to my stored value)?

changed? yes/no

if yes:
    for each of vocab x, vocab y and vocab z:
        get the latest revision number for the vocab
        compare it to my stored value
        if different, request the delta between the current revision and my stored revision

if no:
    none of the vocabs can have changed
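The pseudo workflow above could be written in Python roughly as follows. `FakeServer` is a hypothetical in-memory stand-in for the REST API and its method names are invented for illustration; a real client would make HTTP requests and receive actual change sets rather than a `(vocab, since, until)` tag.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeServer:
    """Hypothetical in-memory stand-in for the server's REST API."""
    repo_revision: int
    vocab_revisions: Dict[str, int]

    def latest_repository_revision(self) -> int:
        return self.repo_revision

    def latest_vocab_revision(self, vocab: str) -> int:
        return self.vocab_revisions[vocab]

    def get_delta(self, vocab: str, since: int, until: int) -> Tuple[str, int, int]:
        # A real server would return the actual changes; a tag is enough here.
        return (vocab, since, until)


@dataclass
class Consumer:
    repo_revision: int                 # repository revision seen at the last check
    vocab_revisions: Dict[str, int]    # last revision seen for each followed vocab
    received: List[Tuple[str, int, int]] = field(default_factory=list)

    def poll(self, server: FakeServer) -> None:
        latest_repo = server.latest_repository_revision()
        if latest_repo == self.repo_revision:
            return  # repository unchanged: none of the vocabs can have changed
        for vocab, stored in list(self.vocab_revisions.items()):
            latest = server.latest_vocab_revision(vocab)
            if latest != stored:
                # Request only the changes between my stored and the current revision
                self.received.append(server.get_delta(vocab, stored, latest))
                self.vocab_revisions[vocab] = latest
        self.repo_revision = latest_repo
```

Because the repository-wide revision is checked first, an unchanged repository costs a single request, however many vocabularies the client follows.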


I should point out that we don't use SKOS as our format for delivering this management information although it is one of the formats that we support for ingest, editing and export.

Hope this provides food for thought :)

Cheers

Rob






-----Original Message-----
From: Neubert Joachim [mailto:J.Neubert@zbw.eu]
Sent: 04 September 2013 10:45
To: public-esw-thes@w3.org
Subject: RE: Comparing versions of SKOS terminologies

Hi Rob,

Thanks for sharing your approach. I like the differentiation between delta and history. Are there screenshots available showing how these are presented to Lexaurus users?

Cheers, Joachim

-----Original Message-----
From: Rob Tice [mailto:rob.tice@k-int.com]
Sent: Friday, 30 August 2013 09:12
To: Neubert Joachim; public-esw-thes@w3.org
Subject: RE: Comparing versions of SKOS terminologies

Hi Joachim.

Our Lexaurus terminology management solution allows you to obtain the differences between any two versions of a SKOS vocabulary (e.g., given 4 versions, differences are available between 1 and 2, 1 and 3, 1 and 4, 2 and 3, etc.) as well as the individual lifecycle of any concept in it.

We define two types of 'difference output':

Delta   - differences
History - lifecycle changes

The main difference is that if a concept is added and then deleted between two versions (e.g., 1 and 3), this will not appear in the 'delta', but will appear as an 'add' followed by a 'delete' in the 'history' output for those two versions.
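The delta/history distinction can be sketched in Python at the level of whole concepts. This assumes, for illustration only, that a version is a set of concept URIs and that every lifecycle event is logged with the revision number that introduced it; the function names are hypothetical and say nothing about how Lexaurus implements this.

```python
from typing import Iterable, List, Set, Tuple

Event = Tuple[int, str, str]  # (revision, "add" or "delete", concept URI)


def delta(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Net difference between two versions: (added concepts, deleted concepts)."""
    return new - old, old - new


def history(log: List[Event], start: int, end: int) -> List[Event]:
    """Every lifecycle event after revision `start`, up to and including `end`."""
    return [event for event in log if start < event[0] <= end]


def replay(base: Iterable[str], log: List[Event], start: int, end: int) -> Set[str]:
    """Rebuild version `end` by applying the history to version `start`."""
    state = set(base)
    for _, op, concept in history(log, start, end):
        if op == "add":
            state.add(concept)
        else:
            state.discard(concept)
    return state
```

With `v1 = {"c1"}` and a log in which c2 is added in revision 2 and deleted in revision 3 while c3 is added in revision 3, `delta(v1, replay(v1, log, 1, 3))` reports only c3 as added, whereas `history(log, 1, 3)` still lists both events for c2.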

Cheers

Rob


-----Original Message-----
From: Neubert Joachim [mailto:J.Neubert@zbw.eu]
Sent: 27 August 2013 18:34
To: 'public-esw-thes@w3.org'
Subject: Comparing versions of SKOS terminologies

When a new version of, say, a thesaurus is published, users are interested in "What's new?" and "What has changed?". I'm currently racking my brain about this. Has anyone solved the seemingly simple problem of comparing two versions of a SKOS file, and the obviously not-so-simple one of formatting the output in a way that is intelligible?

When it comes down to diffing RDF files, there are some solutions listed at http://www.w3.org/2001/sw/wiki/How_to_diff_RDF. The simplest way I found was using rdf.sh (https://github.com/seebi/rdf.sh), which simply runs the system diff on sorted .nt (N-Triples) files produced by rapper. (You need to filter out blank nodes here, but this shouldn't be much of a problem with SKOS files.) Using git diff as the diff tool, this gives me a stat of something like "7443 insertions(+), 6937 deletions(-)" (on the two most recent versions of the STW Thesaurus for Economics).
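The sorted-diff approach can be approximated in a few lines of Python. This is a sketch, not rdf.sh's code: it assumes both files were serialized to N-Triples by the same tool, so that identical triples produce identical lines, and it detects blank nodes with a simple token heuristic (subject or object starting with `_:`), which an unusual literal could fool.

```python
from typing import Iterable, List, Set, Tuple


def _has_blank_node(line: str) -> bool:
    """Heuristic: the subject (first token) or object (token before the final '.')
    is a blank node label such as `_:b0`."""
    tokens = line.split()
    return tokens[0].startswith("_:") or tokens[-2].startswith("_:")


def triples(ntriples: Iterable[str]) -> Set[str]:
    """N-Triples lines, minus empty lines, comments and triples with blank nodes."""
    lines = (line.strip() for line in ntriples)
    return {line for line in lines
            if line and not line.startswith("#") and not _has_blank_node(line)}


def diff_ntriples(old: Iterable[str], new: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (inserted, deleted) triples, each sorted as in a diff of sorted files."""
    before, after = triples(old), triples(new)
    return sorted(after - before), sorted(before - after)
```

For files of unique, sorted lines, the lengths of the two returned lists should match the insertion and deletion counts that `git diff --stat` reports on the same input.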

Obviously, this triple-level diff doesn't help users much. A possible course of action could be:

1) Group changes for each concept.
2) Recognize insertion and deletion of concepts as a whole (presumably the most important changes).
3) Recognize certain types of changes (e.g., altered prefLabel, added altLabel, changed relations).
4) Enrich the concept URIs with the preferred label (in a given language).
5) Arrange everything nicely on an RDFa overview page (additions/deletions of concepts, perhaps some of the more important types of changes, statistics such as the number of changed/unchanged concepts, etc.)
6) Provide change record (RDFa) pages per concept, which can be linked from a concept page.
7) Optionally, if the terminology includes meta-structures such as a term classification, add aggregated information about the most intensively changed subject areas to the overview page.
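Steps 1 to 4 could be prototyped along the following lines. This Python sketch assumes that the inserted and deleted triples have already been parsed into (subject, predicate, object) tuples and that `labels` maps concept URIs to preferred labels in the chosen language. The `summarize` function, its change categories, and the use of the `rdf:type skos:Concept` triple to spot whole-concept insertions and deletions are illustrative choices, not an established tool.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RELATIONS = {SKOS + "broader", SKOS + "narrower", SKOS + "related"}
Triple = Tuple[str, str, str]


def summarize(inserted: Set[Triple], deleted: Set[Triple],
              labels: Dict[str, str]) -> Dict[str, Tuple[str, List[str]]]:
    # 1) Group the changed (predicate, object) pairs by concept (subject URI)
    changes = defaultdict(lambda: {"+": set(), "-": set()})
    for s, p, o in inserted:
        changes[s]["+"].add((p, o))
    for s, p, o in deleted:
        changes[s]["-"].add((p, o))

    report = {}
    is_concept = (RDF_TYPE, SKOS + "Concept")
    for concept, change in changes.items():
        added = {p for p, _ in change["+"]}
        removed = {p for p, _ in change["-"]}
        kinds = []
        # 2) Whole-concept insertion/deletion, detected via the rdf:type triple
        if is_concept in change["+"]:
            kinds.append("new concept")
        elif is_concept in change["-"]:
            kinds.append("deleted concept")
        else:
            # 3) Classify the remaining changes by predicate
            if (SKOS + "prefLabel") in (added & removed):
                kinds.append("altered prefLabel")
            if (SKOS + "altLabel") in added:
                kinds.append("added altLabel")
            if (added | removed) & RELATIONS:
                kinds.append("changed relations")
            if not kinds:
                kinds.append("other changes")
        # 4) Enrich the URI with its preferred label, falling back to the URI
        report[concept] = (labels.get(concept, concept), kinds)
    return report
```

Labels for deleted concepts would have to come from the old version, so in practice `labels` might be built from both files.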

Thoughts? Has somebody done something similar already?

Cheers, Joachim

--
Joachim Neubert

ZBW - German National Library of Economics
Leibniz Information Centre for Economics
Neuer Jungfernstieg 21
20354 Hamburg
Received on Tuesday, 10 September 2013 15:11:39 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 19:46:26 UTC