Re: Versioning system for ontologies from Markus Pilzecker on 2013-05-03 (semantic-web@w3.org from May 2013)

From: Markus Pilzecker <mp.lists@free.fr>
Date: Fri, 3 May 2013 22:59:27 +0000
To: Prateek <prateek@knoesis.org>
Cc: Alan Ruttenberg <alanruttenberg@gmail.com>, Ali SH <asaegyn+out@gmail.com>, "Stephen D. Williams" <sdw@lig.net>, "semantic-web\@w3.org" <semantic-web@w3.org>
Message-ID: <20868.16719.924674.99147@eow.low-entropy.linux.dk>

Hello Prateek and community,

Prateek writes:
 > Thanks everyone.
 > 
 > For my requirement, its more like, 3 people are collaborating on the same
 > ontology. They use different tools, platforms to make the changes.Question
 > might arise why they are not using the same tool? Because, they are in
 > different organizations and happen to be using the same ontology.
 > 
 > There might be syntactic differences or lines where they made the changes,
 > but semantically their changes are similar or built on top of each other. I
 > am not sure if Git or SVN can help with that.  Also the facility to query
 > based on a specific version at a point of time would be wonderful!
 > 
In my opinion, there are two major use cases for version management:
  - internal, i.e. inter-release versioning
  - external, i.e. release versioning

Inter-release versioning captures the fine-grained changes a
development team produces.  This is best done with one of the
general-purpose version control systems already mentioned, with all
the features we know well from software engineering {distribution,
branching and merging, collaboration, ...}.

In practice, for inter-release versioning it is best to use a
line-break-friendly format like N3 and to normalise it before
committing, in order to avoid permutation noise.  Normalisation should
also suppress tool-specific syntactic sugar in the serialisations.  At
least for my taste, diffs on N3 are quite readable.
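
To illustrate the normalisation step, here is a minimal sketch.  It
assumes the ontology has been exported as N-Triples (one statement per
line) instead of N3, because that can be handled with the Python
standard library alone.  The function name normalise_ntriples is
hypothetical, and blank-node relabelling is deliberately not handled.

```python
def normalise_ntriples(text: str) -> str:
    """Return a canonical, diff-friendly form of a line-based serialisation.

    Assumption: one statement per line (N-Triples style).  Blank-node
    labels are left untouched, so tools that rename them still cause noise.
    """
    statements = set()
    for line in text.splitlines():
        line = line.strip()
        # Drop empty lines and comments, which carry no statements.
        if not line or line.startswith("#"):
            continue
        statements.add(line)
    # Sorting removes permutation noise; the set already removed duplicates.
    return "".join(statement + "\n" for statement in sorted(statements))
```

Run as a pre-commit step, something like this should make two exports
of the same graph from different tools produce the same file, as long
as the tools agree on how each individual statement is written.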

Release versioning is best done by a means in which the different
versions of semantic entities are available simultaneously.
The important point is that an entity, once deployed or published,
must never be changed.  If you change it, it is not the same entity
anymore, and you risk breaking any [usually unknown] dependent
system.  {A node is not the same anymore if any piece of the graph
reachable in navigability direction changes.  An edge is not the same
anymore if you change its type or usage policy.}
You may have a similar entity which the developers intend to be the
successor of a preceding one, but it has to get a new ID.
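
The "never change a published entity" rule can be sketched as an
append-only store.  The class PublishedEntities and the IDs used with
it are hypothetical, and the sketch only guards the entity's own
statements, not the whole subgraph reachable from it.

```python
class PublishedEntities:
    """Append-only store: an ID, once published, keeps its definition."""

    def __init__(self):
        self._definitions = {}   # ID -> frozenset of (predicate, object)
        self._successor_of = {}  # new ID -> ID of the entity it succeeds

    def publish(self, entity_id, statements, successor_of=None):
        # A published ID is never reused; a changed entity needs a new one.
        if entity_id in self._definitions:
            raise ValueError(f"{entity_id} is already published; use a new ID")
        if successor_of is not None and successor_of not in self._definitions:
            raise KeyError(f"unknown predecessor {successor_of}")
        self._definitions[entity_id] = frozenset(statements)
        if successor_of is not None:
            self._successor_of[entity_id] = successor_of

    def definition(self, entity_id):
        return self._definitions[entity_id]

    def predecessor(self, entity_id):
        # None if the entity was not declared as anybody's successor.
        return self._successor_of.get(entity_id)
```

A changed "ex:Person_1" would then be published as, say,
"ex:Person_2" with successor_of="ex:Person_1", while the old
definition stays retrievable for the systems that depend on it.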

One may implement this coarse-grained versioning of released content
as a sequence {or a tree, in the branch case} of whole ontologies, like
a film is a sequence of pictures.  Or, if a very high proportion of
the content is unchanged from one release to the next, one may
amalgamate all versions into one large ontology which contains the
whole world, like Minkowski space contains the whole of history.  In
the first case, it is simplest to compose the IDs from an ontology
part and an intra-ontology part.  In the amalgamated case, you simply
have the usual uniqueness constraint, since there is nothing else [than
the whole "world" {i.e. ontology scope with eternal history}].  You are
free to use some kind of version substring in your IDs [but this is
of course meaningless, as IDs are always atoms].  Ontological
repositories support both approaches out of the box.
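
For the sequence-of-ontologies layout, composing an ID from an
ontology part and an intra-ontology part might look like the sketch
below.  The function compose_id, the "#" separator and the
example.org release IRIs are assumptions made for illustration only.

```python
def compose_id(ontology_part: str, local_part: str) -> str:
    """Join the ID of one released ontology with an ID local to it."""
    return ontology_part.rstrip("#/") + "#" + local_part


# Hypothetical release IRIs; each release is a whole ontology of its own.
RELEASES = [
    "http://example.org/onto/2013-04",
    "http://example.org/onto/2013-05",
]

# The same local name yields a distinct global ID in every release.
person_ids = [compose_id(release, "Person") for release in RELEASES]
```

Consumers should still compare the resulting IDs as opaque atoms; the
release substring is only a convenience for whoever mints them.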

Also, querying is not a problem in any of the mentioned cases:
  - in the inter-release case, you apply your query to the desired
    checkout
  - in the release case, global and eternal ID uniqueness does all you
    need
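
A toy sketch of the release case, with triples held as plain Python
tuples: the helper match and the "ex:" IDs are hypothetical, and a real
setup would use a triple store and a query language instead.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Every released entity has its own eternal ID, so all releases can live
# side by side and the ID alone selects the version being asked about.
released = {
    ("ex:v1#Person", "rdfs:label", "Person"),
    ("ex:v2#Person", "rdfs:label", "Human being"),
}

first_release_person = match(released, s="ex:v1#Person")
```

In the inter-release case the same function would simply be applied to
the triples parsed from whichever checkout is of interest.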

I hope this was understandable and helps,

   Markus 

PS: personally, I'd avoid working with the amalgamated eternal
ontology "layout", since the reusable part of every snapshot is
smaller than you might think {navigability subgraph} and it is hard to
test.

Received on Saturday, 4 May 2013 14:00:00 UTC