Re: Best practices for versioning and documenting ontologies for Sem Web

From: Chris Mungall <cjmungall@lbl.gov>
Date: Mon, 30 Jan 2017 09:10:09 -0800
To: "Simon Spero" <sesuncedu@gmail.com>
Cc: "Martin Hepp" <mfhepp@gmail.com>, "Thuermer G." <gefion.thuermer@soton.ac.uk>, semantic-web@w3.org, "Munson J.E." <J.Munson@soton.ac.uk>
Message-ID: <CA464409-63A3-4136-932B-D786DEDEF88E@lbl.gov>
Yes, as Simon points out, the situation for managing an ontology in 
github is *much* better now than it was a few years ago.

Diffs can still be hard to interpret (which is why many groups still use 
obo format). For command line git diffs we've found this to be very 

Some notes on setting up an ontology project in github/gitlab etc (these 
assume a bio-ontology federated with OBO, but many aspects are likely 
still useful):

On 30 Jan 2017, at 6:08, Simon Spero wrote:

> Protégé uses the OWLAPI, and for precisely the problems you mention, the
> major document output formats now generate generally stable output
> ordering. This makes it much easier to use common version control
> systems like git to handle revision control.
>
> There is discussion of this in issue #273 on the github repository.
> https://github.com/owlcs/owlapi/issues/273
>
> There is also a brief summary in one section of my OWLED paper from 2015.
> Using the Gene Ontology as data source, and taking a typical day as an
> example:
> For the versions committed on 14 Jun 2014, there were 28 insertions and
> 5 deletions made to the OBO file. This resulted in 164,305 insertions
> and 164,256 deletions in the unordered RDF/XML output. With ordered
> output there were 54 insertions and 5 deletions.
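
The effect of output ordering on diff size can be reproduced on a toy scale. This is only a minimal sketch, not how the OWLAPI orders its output: the example.org statements, the one-statement-per-line layout and the `diff_counts` helper are all made up for illustration, and a lexical sort stands in for a stable serializer.

```python
import difflib
import random


def diff_counts(old_lines, new_lines):
    """Count (insertions, deletions) between two lists of lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    insertions = deletions = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deletions += i2 - i1
        if tag in ("replace", "insert"):
            insertions += j2 - j1
    return insertions, deletions


LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

# Hypothetical ontology: one statement per line, then one statement added.
statements = [
    f'<http://example.org/c{i}> {LABEL} "class {i}" .' for i in range(50)
]
edited = statements + [f'<http://example.org/c50> {LABEL} "class 50" .']

rng = random.Random(0)


def unordered(lines):
    """Simulate a serializer that writes statements in arbitrary order."""
    shuffled = list(lines)
    rng.shuffle(shuffled)
    return shuffled


unordered_counts = diff_counts(unordered(statements), unordered(edited))
ordered_counts = diff_counts(sorted(statements), sorted(edited))

print("unordered:", unordered_counts)
print("ordered:", ordered_counts)  # ordered: (1, 0)
```

With sorted output the single added statement shows up as a single inserted line; with arbitrary ordering the same edit is buried among spurious insertions and deletions.
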
> I should note here that some social VCS platforms are overly cautious
> about displaying diffs of very large files, even when the diffs are tiny.
> At the time of study, github refused to try; bitbucket (git mode) would
> issue an are-you-sure warning, then surprise itself :) I'm not sure
> about gitlab.
>
> I should also note that frame-like formats like Manchester Syntax are
> much easier to merge if you have several people working on the same
> source.
>
> Modularity is your friend :)
>
> Ontology documentation should be about more than just the individual
> vocabulary terms. You should be aware of what metaphysical choices you
> are making, and keep track of them as they occur (metaphysics should be
> avoided as much as possible, but it's important to be able to recognize
> it so you know what to run away from). This should be part of the meta
> documentation for the ontology team, and for future maintainers. You
> should have guidelines for descriptions, scope notes and labels on your
> vocabulary terms. The Cyc guidelines for comments may be a helpful
> starting point.
>
> Documentation should not be a substitute for axioms, but it may be
> helpful for the reader to restate what the axioms say. Just like with
> code and comments, it is critical that any such restatements stay in
> sync.
>
> Documentation should be designed to meet the needs of the people who
> will be using it. Documentation aimed at the end user should be written
> for that audience, and just like any other system component, should be
> properly tested. If some documentation is written for users who will be
> applying a vocabulary, then you should run tests to see if the
> vocabulary is being correctly applied.
>
> If documentation is written for those who will be consuming uses of the
> vocabulary, you should test to see if the uses of the vocabulary are
> being properly understood.
>
> If there are problems, you may need to fix the ontology, not the
> documentation.
>
> Don't try to change the way the SME thinks about their area of
> expertise.
>
> Simon
>
> On Jan 30, 2017 3:40 AM, "Martin Hepp" <mfhepp@gmail.com> wrote:
>
> I would recommend using
>
> 1. a syntax for the ontology like N3/Turtle where changes in the
> conceptual model are more or less directly equivalent to changes in the
> serialization.
> A bad example would be RDF/XML auto-generated from a tool like Protégé.
> At least in earlier times, the serialization in RDF/XML could vary
> greatly despite only minor changes in the conceptual model, in
> particular if you used different versions of the tool to generate the
> code.
> The underlying reason is that RDF has no defined ordering of
> statements, so there are many different ways to represent the same RDF
> graph.
>
> 2. a standard version-control system like Git or Mercurial for hosting
> the code.
> This allows a very good documentation of the entire evolution of your
> model, and this is how we do it at schema.org.
>
> There are a few problems with this approach, though:
>
> 1. You will have to encode the ontology using a source-code editor - no
> neat GUI etc. While this is straightforward for basic RDFS/OWL
> ontologies, it is a bit complicated for advanced OWL language elements.
>
> 2. If you reorganize the code or make minor syntactical changes (like
> replacing spaces by tabs or vice versa), you will still see changes in a
> diff that do not reflect changes in the conceptual model, so you need to
> be very disciplined when coding.
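
One way to check whether a noisy textual diff hides any real change is to compare the two files as sets of statements instead of as sequences of lines. The following is a rough sketch under strong assumptions: one complete statement per line (N-Triples-like), no blank nodes, and no significant whitespace runs inside literals, since the normalization collapses them. The `normalize` and `statement_diff` helpers are hypothetical names; a real project would more likely use an RDF library for this.

```python
def normalize(text):
    """Return the set of statements in a one-statement-per-line file.

    Tabs and repeated spaces are collapsed to single spaces, and blank
    lines and '#' comment lines are ignored.
    """
    statements = set()
    for line in text.splitlines():
        collapsed = " ".join(line.split())
        if collapsed and not collapsed.startswith("#"):
            statements.add(collapsed)
    return statements


def statement_diff(old_text, new_text):
    """Return (added, removed) statements, ignoring order and spacing."""
    old, new = normalize(old_text), normalize(new_text)
    return sorted(new - old), sorted(old - new)


# Hypothetical before/after files: reordered, re-indented, one real addition.
old_text = "<a> <b> <c> .\n<d> <e> <f> .\n"
new_text = "<d>\t<e>\t<f> .\n\n<a>  <b>  <c> .\n<g> <h> <i> .\n"

added, removed = statement_diff(old_text, new_text)
print(added)    # ['<g> <h> <i> .']
print(removed)  # []
```

A line-based diff of these two files would flag every line, while the set comparison reports only the one added statement.
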
> But other than that, I think this is the best way to solve this.
>
> For publishing versions of the ontology, you could use the same
> mechanism as the W3C for versioning technical documents, i.e.
>
> - one URI for the current version, like
>   http://foo.org/onto or http://foo.org/onto#
> and
> - one URI for each released version, including the date of the release,
>   like http://foo.org/onto/20170130 or http://foo.org/onto/20170130#
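
The URI scheme described above is easy to generate at release time. A minimal sketch, assuming the release date is simply appended as YYYYMMDD to the base URI; the `version_uris` helper is a made-up name and http://foo.org/onto is the placeholder from the message, not a real ontology.

```python
from datetime import date


def version_uris(base_uri, release_date):
    """Return (current_uri, dated_uri) for an ontology release.

    A trailing '/' or '#' on the base URI is dropped before the date is
    appended, so both slash and hash namespaces give the same pair.
    """
    base = base_uri.rstrip("/#")
    return base, f"{base}/{release_date:%Y%m%d}"


current, dated = version_uris("http://foo.org/onto#", date(2017, 1, 30))
print(current)  # http://foo.org/onto
print(dated)    # http://foo.org/onto/20170130
```

The current URI keeps pointing at the latest release, while each dated URI stays fixed once published.
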
> There are of course many proposals to handle ontology versioning with
> additional meta-data and tooling; for an overview, see
>
>     https://scholar.google.com/scholar?hl=en&q=ontology+versioning&btnG=&as_sdt=1%2C5&as_sdtp=
>
> From my top-level understanding, however, the current state of the art
> is limited to maintaining meta-data about the state and evolution of the
> ontology, while automatic translation between different versions of the
> same ontology is still very hard. For the pure documentation of the
> changes, a version-control system does mainly the same job.
>
> For an introduction to the problems of ontology versioning and
> evolution, read e.g.
>
>     http://link.springer.com/article/10.1007%2Fs10115-003-0137-2?LI=true
>
> Also keep in mind that ontologies are by their very nature approximate
> specifications of a domain model, so there can be changes in the
> intended meaning of ontology elements that are not reflected in the
> axiomatic specification of the ontology.
>
> Best wishes
> Martin
> -----------------------------------
> martin hepp  http://www.heppnetz.de
> mhepp@computer.org          @mfhepp
>> On 30 Jan 2017, at 00:19, Munson J.E. <J.Munson@soton.ac.uk> wrote:
>>
>> Dear team
>>
>> My name is Jo Munson and I am currently a PhD candidate at the
>> University of Southampton.
>>
>> We are currently working with an external organisation looking to put a
>> 'real life' ontology together, and I am writing to ask whether there
>> are any tools / best practices for versioning and documenting from your
>> perspective (for commercial/public use, not just in a research
>> context).
>>
>> Many thanks for your time
>>
>> Jo
>>
>> Web Science PhD Candidate
>> University of Southampton
Received on Monday, 30 January 2017 17:11:05 UTC
