Re: Best practices for versioning and documenting ontologies for Sem Web from Lavdim Halilaj on 2017-01-30 (semantic-web@w3.org from January 2017)

From: Lavdim Halilaj <halilaj@cs.uni-bonn.de>
Date: Mon, 30 Jan 2017 19:00:20 +0100
Cc: semantic-web@w3.org, "Munson J.E." <J.Munson@soton.ac.uk>
Message-ID: <42466a2b-7773-ac48-b9b4-cf23b6487f1b@cs.uni-bonn.de>
VoCol is one more option among the tools/frameworks for ontology 
development on top of version control systems such as Git:

- Works on top of Git and with repository hosting services like GitHub, 
Gitlab and BitBucket;
- It needs to be installed only once on a hosting server, and the whole 
team benefits from its services;
- It is independent of the actual Git repository, i.e., it can be 
activated and configured without affecting the repository.

The following services are currently integrated into VoCol (and are 
automatically executed with every push event):
- Syntax validation with Rapper and Jena Riot (also executed with every 
pre-commit event, which rejects any commit that contains syntactic errors);
- An integrated Turtle Editor with basic functionalities for "real-time" 
syntax checking and auto-completion;
- Documentation generation with Schema.org and Widoco;
- Visualization with WebVOWL;
- SPARQL endpoint querying with Jena Fuseki;
- Some basic analytics/statistics functionalities;
- Evolution report with OWL2VCS.
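
To illustrate the pre-commit syntax check mentioned in the list above, 
here is a minimal sketch of such a hook in Python. It is not VoCol's own 
code: the function names and the .ttl suffix filter are assumptions, and 
it assumes that rapper is on the PATH and signals a parse failure with a 
non-zero exit status.

```python
import subprocess
from typing import Callable, Iterable, List

# File suffixes treated as Turtle; adjust for your repository (assumption).
TURTLE_SUFFIXES = (".ttl",)


def staged_files() -> List[str]:
    """Return the paths staged for commit (added, copied or modified)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def turtle_files(paths: Iterable[str]) -> List[str]:
    """Keep only the paths that look like Turtle files."""
    return [p for p in paths if p.lower().endswith(TURTLE_SUFFIXES)]


def rapper_command(path: str) -> List[str]:
    """Build a rapper call that parses Turtle and only counts the triples."""
    return ["rapper", "-i", "turtle", "-c", path]


def check(paths: Iterable[str],
          run: Callable[[List[str]], int] = subprocess.call) -> List[str]:
    """Return the Turtle files whose validation command exited non-zero."""
    return [p for p in turtle_files(paths) if run(rapper_command(p)) != 0]


def main() -> int:
    """Hook entry point: a non-zero return value should block the commit."""
    failed = check(staged_files())
    for path in failed:
        print(f"syntax error in {path}; commit rejected")
    return 1 if failed else 0
```

Saved as .git/hooks/pre-commit with a final `raise SystemExit(main())` 
line, this would reject commits that contain unparsable Turtle. Note that 
it validates the working-tree copy of each staged file, not the staged 
blob itself.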

All services can be (de-)activated in the VoCol settings, and other 
tools and services can be added as extensions.

Additional features of VoCol include dereferenceable URIs, content 
negotiation, support for branches, etc.
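
For readers unfamiliar with content negotiation, the idea can be sketched 
as follows: the server inspects the Accept header and returns the 
serialization the client prefers. This is a generic illustration, not 
VoCol's implementation; the AVAILABLE table, the file names and the 
function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Representations a hypothetical ontology server can return, in the
# server's own order of preference (media type -> file to serve).
AVAILABLE: Dict[str, str] = {
    "text/turtle": "onto.ttl",
    "application/rdf+xml": "onto.rdf",
    "text/html": "onto.html",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client does not state it
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header: str) -> Optional[str]:
    """Return the file to serve for this Accept header, or None if no match."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return next(iter(AVAILABLE.values()))
        if media_type in AVAILABLE:
            return AVAILABLE[media_type]
    return None
```

With this table, a browser sending "text/html" would get the HTML 
documentation, while a client sending "text/turtle" would get the Turtle 
file from the same URI.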

An online demo is available at: http://butterbur06.iai.uni-bonn.de
An example of the configuration page can be accessed at: 
http://butterbur06.iai.uni-bonn.de/docs/configuration_page.html

We provide a Vagrant box, which you can download and run. With the 
Vagrant Share mechanism, every generated artifact is publicly accessible 
(while hosted on your machine). VoCol is also provided as a Docker image.

More information about VoCol, and screencasts for installing and using 
it, can be found on GitHub at: https://github.com/vocol/vocol

We are almost ready to integrate a service that lets ontology 
engineers use any editor, such as Protégé or TopBraid, and that 
generates a unique serialization at the end. The objective is to prevent 
the false-positive conflicts that a VCS such as Git reports when the 
same ontology is serialized differently.
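
To show why a unique serialization removes such false positives, here is 
a small sketch (not the service itself). It assumes the ontology is 
exported as N-Triples with one triple per line and contains no blank 
nodes, whose labels may differ between exports; the example.org IRIs and 
the function names are made up for illustration.

```python
import difflib
from typing import List


def canonical_ntriples(text: str) -> List[str]:
    """Return the statements of an N-Triples document, de-duplicated and sorted.

    Assumes one triple per line and no blank nodes.
    """
    statements = {line.strip() for line in text.splitlines()}
    return sorted(s for s in statements if s and not s.startswith("#"))


def diff(old: str, new: str) -> List[str]:
    """Unified diff between the canonical forms of two serializations."""
    return list(difflib.unified_diff(
        canonical_ntriples(old), canonical_ntriples(new), lineterm=""))


SUBCLASS = "<http://www.w3.org/2000/01/rdf-schema#subClassOf>"
ANIMAL = "<http://example.org/onto#Animal>"

# Two hypothetical exports of the same graph, written in a different order.
EDITOR_A = (
    f"<http://example.org/onto#Cat> {SUBCLASS} {ANIMAL} .\n"
    f"<http://example.org/onto#Dog> {SUBCLASS} {ANIMAL} .\n"
)
EDITOR_B = (
    f"<http://example.org/onto#Dog> {SUBCLASS} {ANIMAL} .\n"
    f"<http://example.org/onto#Cat> {SUBCLASS} {ANIMAL} .\n"
)

# A line-based diff of the raw files reports changes ...
raw = list(difflib.unified_diff(
    EDITOR_A.splitlines(), EDITOR_B.splitlines(), lineterm=""))
print(len(raw) > 0)

# ... while the canonical forms are identical, so the diff is empty.
print(diff(EDITOR_A, EDITOR_B))
```

Sorting is only the simplest possible canonical form; graphs with blank 
nodes or a Turtle/RDF-XML output need a serializer that fixes the 
ordering itself.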

Best,
Lavdim


On 30.01.2017 18:10, Chris Mungall wrote:
>
> Yes, as Simon points out the situation for managing an ontology in 
> github is /much/ better now than it was a few years ago.
>
> Diffs can still be hard to interpret (which is why many groups still 
> use obo format). For command line git diffs we've found this to be 
> very useful:
> https://github.com/ShahimEssaid/git-owl-tools/wiki
>
> Some notes on setting up an ontology project in github/gitlab etc (has 
> assumptions that this will be a bio-ontology federated with OBO but 
> many aspects still likely useful):
> https://douroucouli.wordpress.com/2015/12/16/creating-an-ontology-project-an-update/
> https://github.com/cmungall/ontology-starter-kit/
>
> On 30 Jan 2017, at 6:08, Simon Spero wrote:
>
>     Protégé uses the OWLAPI, and for precisely the problems you
>     mention, the major document output formats now generate generally
>     stable output ordering.
>
>     This makes it much easier to use common version control systems
>     like git to handle revision control.
>
>     There is discussion of this in issue #273  on the github
>     repository. https://github.com/owlcs/owlapi/issues/273
>
>     There is also a brief summary in one section of my OWLED paper
>     from 2015, using the Gene Ontology as the data source and taking
>     a typical day as an example:
>
>     For the versions committed on 14 Jun 2014, there were 28
>     insertions and 5 deletions made to the OBO file. This resulted in
>     164,305 insertions and 164,256 deletions in the unordered RDF/XML
>     output. With ordered RDF/XML, there were 54 insertions and 5
>     deletions.
>
>     I should note here that some social VCS platforms are overly
>     cautious about displaying diffs of very large files, even when the
>     diffs are tiny. At the time of the study, github refused to try;
>     bitbucket (git mode) would issue an are-you-sure warning, then
>     surprise itself :) I'm not sure about gitlab.
>
>     I should also note that frame-like formats like Manchester Syntax
>     are much easier to merge if you have several people working on the
>     same source.
>
>     Modularity is your friend :)
>
>     Ontology documentation should be about more than just the
>     individual vocabulary terms.  You should be aware of what
>     metaphysical choices you are making, and keep track of them as
>     they occur (metaphysics should be avoided as much as possible, but
>     it's important to be able to recognize it so you know what to run
>     away from). This should be part of the meta documentation for the
>     ontology team, and for future maintainers. You should have
>     guidelines for descriptions,  scope notes and labels on your
>     vocabulary terms.  The Cyc guidelines for comments may be a
>     helpful starting point.
>
>     Documentation should not be a substitute for axioms, but it may be
>     helpful for the reader to restate what the axioms say. Just like
>     with code and comments, it is critical that any such restatements
>     stay in sync.
>
>     Documentation should be designed to meet the needs of the people
>     who will be using it. Documentation aimed at the end user should
>     be written for that audience, and just like any other system
>     component,  should be properly tested.  If some documentation is
>     written for users who will be applying a vocabulary, then you
>     should run tests to see if the vocabulary is being correctly applied.
>     If documentation is written for those who will be consuming uses
>     of the vocabulary, you should test to see if the uses of the
>     vocabulary are being properly understood.
>     If there are problems, you may need to fix the ontology, not the
>     documentation.
>     Don't try to change the way the SME thinks about their area of
>     expertise.
>
>     Simon
>
>     On Jan 30, 2017 3:40 AM, "Martin Hepp" <mfhepp@gmail.com> wrote:
>
>         I would recommend using
>
>         1. a syntax for the ontology like N3/Turtle where changes in
>         the conceptual model are more or less directly equivalent to
>         changes in the serialization. A bad example would be RDF/XML
>         auto-generated from a tool like Protégé. At least in earlier
>         times, the serialization in RDF/XML could vary greatly despite
>         only minor changes in the conceptual model, in particular if
>         you  used different versions of the tool to generate the code.
>
>         The underlying reason is that RDF has no defined ordering of
>         statements, so there are many different ways to represent the
>         same RDF graph.
>
>         2. a standard version-control system like Git or Mercurial for
>         hosting the code.
>
>         This allows a very good documentation of the entire evolution
>         of your model, and this is how we do it at schema.org.
>
>         There are a few problems with this approach, though:
>
>         1. You will have to encode the ontology using a source-code
>         editor - no neat GUI etc. While this is straightforward for
>         basic RDFS/OWL ontologies, it is a bit complicated for
>         advanced OWL language elements.
>
>         2. If you reorganize the code or make minor syntactical
>         changes (like replacing spaces by tabs or vice versa), you
>         will still see changes in a diff that do not reflect changes
>         in the conceptual model, so you need to be very disciplined
>         when coding.
>
>         But other than that, I think this is the best way to solve this.
>
>         For publishing versions of the ontology, you could use the
>         same mechanism as the W3C for versioning technical documents, i.e.
>
>         - one URI for the current version, like
>
>         http://foo.org/onto or http://foo.org/onto#
>
>         and
>         - one URI for each released version, including the date of the
>         release, like
>
>         http://foo.org/onto/20170130 or http://foo.org/onto/20170130#
>
>         There are of course many proposals to handle ontology
>         versioning with additional meta-data and tooling; for an
>         overview, see
>
>         https://scholar.google.com/scholar?hl=en&q=ontology+versioning&btnG=&as_sdt=1%2C5&as_sdtp=
>
>         From my top-level understanding, however, the current state
>         of the art is limited to maintaining meta-data about the state
>         and evolution of the ontology, while automatic translation
>         between different versions of the same ontology is still very
>         hard. For the pure documentation of the changes, a
>         version-control system does mainly the same job.
>
>         For an introduction to the problems of ontology
>         versioning and evolution, see e.g.
>
>         http://link.springer.com/article/10.1007%2Fs10115-003-0137-2?LI=true
>
>         Also keep in mind that ontologies are by their very nature
>         approximate specifications of a domain model, so there can be
>         changes in the intended meaning of ontology elements that are
>         not reflected in the axiomatic specification of the ontology.
>
>         Best wishes
>
>         Martin
>
>
>
>         -----------------------------------
>         martin hepp http://www.heppnetz.de
>         mhepp@computer.org         @mfhepp
>
>
>
>
>         > On 30 Jan 2017, at 00:19, Munson J.E. <J.Munson@soton.ac.uk> wrote:
>         >
>         > Dear team
>         >
>         > My name is Jo Munson and I am currently a PhD candidate at
>         > the University of Southampton.
>         > We are currently working with an external organisation
>         > looking to put a 'real life' ontology together, and I am
>         > writing to ask whether there are any tools / best practices
>         > for versioning and documenting from your perspective (for
>         > commercial/public use, not just in a research context).
>         >
>         > Many thanks for your time
>         >
>         > Jo
>         >
>         > Web Science PhD Candidate
>         > University of Southampton
>         >
Received on Monday, 30 January 2017 18:51:47 UTC