Re: Best practices for versioning and documenting ontologies for Sem Web

Protégé uses the OWLAPI, and for precisely the problems you mention, the
major document output formats now generate generally stable output ordering.

This makes it much easier to use common version control systems like git to
handle revision control.

There is discussion of this in issue #273 on the GitHub repository.
https://github.com/owlcs/owlapi/issues/273

There is also a brief summary in one section of my OWLED paper from 2015.
Using the Gene Ontology as the data source, and taking a typical day as an
example:

For the versions committed on 14 Jun 2014, there were 28 insertions and 5
deletions made to the OBO file. This resulted in 164,305 insertions and
164,256 deletions in the unordered RDF/XML output. With ordered RDF/XML,
there were 54 insertions and 5 deletions.
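
To make the effect concrete, here is a minimal Python sketch. It is not the
OWLAPI and not the method from the paper; the toy statements, the
example.org names and the two "serializer" functions are all made up for
illustration. It only shows why a one-statement edit produces a huge diff
when the writer has no stable ordering, and a tiny one when it does.

```python
import difflib
import random


def diff_stats(old_lines, new_lines):
    """Count (insertions, deletions) between two lists of lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    insertions = deletions = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deletions += i2 - i1
        if tag in ("replace", "insert"):
            insertions += j2 - j1
    return insertions, deletions


# Hypothetical toy ontology: 200 subclass statements, one per line.
old_graph = [
    f"<http://example.org/C{i}> rdfs:subClassOf <http://example.org/C{i // 2}> ."
    for i in range(1, 201)
]
# The "real" edit: a single new statement.
new_graph = old_graph + [
    "<http://example.org/C201> rdfs:subClassOf <http://example.org/C100> ."
]


def serialize_unordered(statements, seed):
    """Stand-in for a writer with no stable ordering (assumption, not OWLAPI)."""
    lines = list(statements)
    random.Random(seed).shuffle(lines)
    return lines


def serialize_ordered(statements):
    """Stand-in for a writer with stable ordering: emit statements sorted."""
    return sorted(statements)


noisy = diff_stats(serialize_unordered(old_graph, 1), serialize_unordered(new_graph, 2))
clean = diff_stats(serialize_ordered(old_graph), serialize_ordered(new_graph))
print("unordered (insertions, deletions):", noisy)
print("ordered   (insertions, deletions):", clean)
```

With the ordered writer the diff is exactly the one added line; with the
unordered one, most of the file shows up as changed.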

I should note here that some social VCS platforms are overly cautious
about displaying diffs of very large files, even when the diffs are tiny.
At the time of the study, GitHub refused to try; Bitbucket (git mode) would
issue an are-you-sure warning, then surprise itself :) I'm not sure about
GitLab.

I should also note that frame-like formats like Manchester Syntax are much
easier to merge if you have several people working on the same source.

Modularity is your friend :)

Ontology documentation should be about more than just the individual
vocabulary terms.  You should be aware of what metaphysical choices you are
making, and keep track of them as they occur (metaphysics should be avoided
as much as possible, but it's important to be able to recognize it so you
know what to run away from). This should be part of the meta documentation
for the ontology team, and for future maintainers. You should have
guidelines for descriptions, scope notes and labels on your vocabulary
terms. The Cyc guidelines for comments may be a helpful starting point.

Documentation should not be a substitute for axioms, but it may be helpful
for the reader to restate what the axioms say. Just like with code and
comments, it is critical that any such restatements stay in sync.

Documentation should be designed to meet the needs of the people who will
be using it. Documentation aimed at the end user should be written for that
audience, and just like any other system component, should be properly
tested.  If some documentation is written for users who will be applying a
vocabulary, then you should run tests to see if the vocabulary is being
correctly applied.
If documentation is written for those who will be consuming uses of the
vocabulary, you should test to see if the uses of the vocabulary are being
properly understood.
If there are problems, you may need to fix the ontology, not the
documentation.
Don't try to change the way the subject-matter expert (SME) thinks about
their area of expertise.

Simon

On Jan 30, 2017 3:40 AM, "Martin Hepp" <mfhepp@gmail.com> wrote:

I would recommend using

1. a syntax for the ontology like N3/Turtle where changes in the conceptual
model are more or less directly equivalent to changes in the serialization.
A bad example would be RDF/XML auto-generated from a tool like Protégé. At
least in earlier times, the serialization in RDF/XML could vary greatly
despite only minor changes in the conceptual model, in particular if you
used different versions of the tool to generate the code.

The underlying reason is that RDF has no defined ordering of statements, so
there are many different ways to represent the same RDF graph.

2. a standard version-control system like Git or Mercurial for hosting the
code.

This gives you very good documentation of the entire evolution of your
model, and this is how we do it at schema.org.

There are a few problems with this approach, though:

1. You will have to encode the ontology using a source-code editor - no
neat GUI etc. While this is straightforward for basic RDFS/OWL ontologies,
it is a bit complicated for advanced OWL language elements.

2. If you reorganize the code or make minor syntactical changes (like
replacing spaces by tabs or vice versa), you will still see changes in a
diff that do not reflect changes in the conceptual model, so you need to be
very disciplined when coding.
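
As a rough aid for the second problem, a check like the following Python
sketch can flag edits that only touch whitespace. The function name is
hypothetical, and the comparison is deliberately crude: it splits on
whitespace, so it would also ignore whitespace changes inside string
literals, and it does not recognise reordered statements as equivalent.

```python
def is_cosmetic_change(old_text: str, new_text: str) -> bool:
    """True if the two texts differ only in whitespace (spaces, tabs, newlines).

    Crude on purpose: whitespace inside string literals is ignored too,
    so treat a True result as a hint, not a proof.
    """
    return old_text != new_text and old_text.split() == new_text.split()


# Hypothetical Turtle snippets: same statement, spaces replaced by tabs.
with_spaces = "ex:Car  rdfs:subClassOf  ex:Vehicle ."
with_tabs = "ex:Car\trdfs:subClassOf\tex:Vehicle ."
changed = "ex:Car rdfs:subClassOf ex:Product ."

print(is_cosmetic_change(with_spaces, with_tabs))
print(is_cosmetic_change(with_spaces, changed))
```

Something like this could run as a pre-commit check to warn about commits
that change the file without changing its tokens.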

But other than that, I think this is the best way to solve this.

For publishing versions of the ontology, you could use the same mechanism
as the W3C for versioning technical documents, i.e.

- one URI for the current version, like

http://foo.org/onto or http://foo.org/onto#

and
- one URI for each released version, including the date of the release, like

http://foo.org/onto/20170130 or http://foo.org/onto/20170130#
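
A small Python sketch of this naming scheme, assuming the YYYYMMDD date
segment shown in the examples above; the function name is made up for
illustration and foo.org is the placeholder domain from the examples.

```python
from datetime import date


def versioned_iri(current_iri: str, release: date) -> str:
    """Derive the dated release IRI from the 'current version' IRI.

    Keeps a trailing '#' if the current IRI uses hash namespaces.
    """
    suffix = "#" if current_iri.endswith("#") else ""
    base = current_iri.rstrip("#").rstrip("/")
    return f"{base}/{release:%Y%m%d}{suffix}"


print(versioned_iri("http://foo.org/onto", date(2017, 1, 30)))
print(versioned_iri("http://foo.org/onto#", date(2017, 1, 30)))
```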

There are of course many proposals to handle ontology versioning with
additional meta-data and tooling; for an overview, see

    https://scholar.google.com/scholar?hl=en&q=ontology+versioning&btnG=&as_sdt=1%2C5&as_sdtp=

From my top-level understanding, however, the current state of the art is
limited to maintaining meta-data about the state and evolution of the
ontology, while automatic translation between different versions of the
same ontology is still very hard. For the pure documentation of the
changes, a version-control system does largely the same job.

For an introduction to the problems of ontology versioning and
evolution, see e.g.

    http://link.springer.com/article/10.1007%2Fs10115-003-0137-2?LI=true

Also keep in mind that ontologies are by their very nature approximate
specifications of a domain model, so there can be changes in the intended
meaning of ontology elements that are not reflected in the axiomatic
specification of the ontology.

Best wishes

Martin



-----------------------------------
martin hepp  http://www.heppnetz.de
mhepp@computer.org          @mfhepp




> On 30 Jan 2017, at 00:19, Munson J.E. <J.Munson@soton.ac.uk> wrote:
>
> Dear team
>
> My name is Jo Munson and I am currently a PhD candidate at the University
> of Southampton.
> We are currently working with an external organisation looking to put a
> 'real life' ontology together and am writing to ask whether there are any
> tools / best practices for versioning and documenting from your
> perspective (for commercial/public use, not just in a research context).
>
> Many thanks for your time
>
> Jo
>
> Web Science PhD Candidate
> University of Southampton
>

Received on Monday, 30 January 2017 14:09:14 UTC