Sharing ontologies in the semantic web world from Chris Catton on 2004-01-15 (www-rdf-interest@w3.org from January 2004)

From: Chris Catton <chris.catton@zoology.oxford.ac.uk>
Date: Thu, 15 Jan 2004 12:38:27 -0000
To: <www-rdf-interest@w3.org>
Message-ID: <002801c3db64$79781030$97184381@chios>
Hi,
Several threads over the past weeks have addressed the issue of sharing
ontologies in one way or another - most recently the subject-less thread
started by Jeroen Budts.

This is related to some stuff I've been thinking about for a few weeks -
I'd be interested in feedback. 

An ontology is supposed to be a 'shared conceptualisation'.
Traditionally though, ontologies have been built by a few domain experts
and programmers, or through a long-drawn-out process of collaboration on
mailing lists.  This has usually been adequate for 'in-house' ontologies
and 'proof-of-concept' ontologies like FOAF, but I think we are starting
to see issues that arise from building ontologies in this way for the
wider community. One obvious problem is that if the only way to change
an ontology is to email the owner, the change process does not scale
well :-)

I'm wondering whether it is possible and/or better to build an ontology
by collaboration rather than 'survival of the fittest'.  My reason for
suggesting it is that I think there is a difference between tools (where
natural selection works well) and ontologies - a difference that is
something to do with granularity. Two tools may do basically the same
thing, and I choose which to use based on its feature set, reliability
etc.  I may even use both if I have to, but cost and effort mean I will
probably try to avoid it. It's much easier with ontologies: I can take a
subgraph from here, a subgraph from there, etc.  However I contend that
this is not necessarily a good thing, in that it a) is more work and b)
does not allow us to discriminate between the useful and the useless.  A
huge ontology might only contain a small subgraph that is actively
referenced by the community - how 'fit' is it in the competitive
landscape?  It seems to me that this is important at this stage of
development of the semantic web, where a lot of the ontologies out there
are a rather nasty mix of information about people, projects,
institutions, publication-related stuff, units and a little bit of
domain-specific information.
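
One rough way to put a number on that kind of 'fitness' is to count how many of an ontology's terms actually turn up in data published by other people. The sketch below is only an illustration: the ex: terms, the sample triples and the function name referenced_fraction are all made up, and a real measure would have to crawl real documents.

```python
def referenced_fraction(ontology_terms, community_triples):
    """Return (fraction of ontology terms in use, the set of terms in use).

    ontology_terms: set of term names defined by the ontology.
    community_triples: iterable of (subject, predicate, object) tuples
    taken from other people's data (hypothetical sample data here).
    """
    # Collect every name appearing in any position of any triple.
    used = {part for triple in community_triples for part in triple}
    hits = ontology_terms & used
    if not ontology_terms:
        return 0.0, hits
    return len(hits) / len(ontology_terms), hits


# Hypothetical ontology with four terms, only two of which anyone uses.
terms = {"ex:Tree", "ex:Shrub", "ex:Institution", "ex:Unit"}
data = [
    ("doc:oak1", "rdf:type", "ex:Tree"),
    ("doc:hazel7", "rdf:type", "ex:Shrub"),
]
fraction, hits = referenced_fraction(terms, data)
print(fraction, sorted(hits))
```

A low fraction would suggest that only a small subgraph is earning its keep, which is the sort of signal a collaborative process could use.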

So what would be required to co-operate on ontology building?  First,
any tools should be intuitive enough to accept input from domain
experts, which probably means a GUI. It's possible to imagine a user
interface where people could select the properties they wanted to
display, and then generate an SVG file to display them.  The user could
then create a new class, select from a list of properties, click on the
parent.  I guess what I'm describing is a very simplified online version
of OilEd or Protege ...
 
The more difficult questions follow from this.  How do we maintain a
co-operatively developed ontology in the face of such dynamic change?
Most ontologies at the moment are centrally controlled - which sits
uncomfortably with the idea of a shared conceptualisation.  But allowing
anyone to change an ontology any way they like is not an appropriate
solution either since there is no guarantee that my view of the world is
shared by anyone else (in fact most of the evidence says it isn't :-)  

What if we divide the world into users (domain experts) and developers
(ontology experts)? You would have users suggesting changes and ontology
developers voting for concepts and relations.  Users may acquire voting
rights on a model similar to the Apache development model
http://httpd.apache.org/dev/guidelines.html 
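
To make the users/developers split concrete, here is a minimal sketch of a change proposal that only people with voting rights can vote on. It is not the Apache rule set: the acceptance rule (at least three votes in favour and more in favour than against), the Proposal class and the names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """A suggested change to the ontology, open to +1/-1 votes."""
    description: str
    votes: dict = field(default_factory=dict)  # voter name -> +1 or -1

    def cast(self, voter, value, developers):
        # Only people holding voting rights (the 'developers') may vote.
        if voter not in developers:
            raise PermissionError(f"{voter} has no voting rights")
        if value not in (1, -1):
            raise ValueError("vote must be +1 or -1")
        self.votes[voter] = value  # a later vote replaces an earlier one

    def accepted(self, min_in_favour=3):
        # Assumed rule: enough support, and more support than opposition.
        in_favour = sum(1 for v in self.votes.values() if v == 1)
        against = sum(1 for v in self.votes.values() if v == -1)
        return in_favour >= min_in_favour and in_favour > against


developers = {"alice", "bob", "carol"}  # hypothetical voters
proposal = Proposal("Move ex:Shrub to a different parent class")
for name in ("alice", "bob", "carol"):
    proposal.cast(name, 1, developers)
print(proposal.accepted())
```

A user who has not yet earned voting rights can still create a Proposal; they just cannot call cast on it until someone adds them to the developers set.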

How might this work in practice? Imagine a GUI where users can make
statements about classes and properties.  These might include 
  1. "this is not a valid concept and should be deleted."
  2. "this is not a subclass of that and should be moved to another
parent."
  and a rather different sort of comment  
  3. "To be a tree, a plant must reach a height at maturity of at least
3 meters"

In cases 1 and 2, the decision is a simple yes or no.  The advantage of
shared development is that if the 'owner' is away on sabbatical for 6
months, evolution of the ontology can continue.

Case 3 throws up some interesting issues.  This is an annotation to the
ontology class.   We might decide that if annotations are not voted down
they should become part of the class definition - something like this seems
a sensible way of refining the meaning of terms in the ontology.  But
what if someone has already used the class as the object of a triple
before this restriction was placed on the tree class?  Perhaps their
tree is only 2 meters tall.  The annotated class is actually a subclass
of the original class.  But which class is a Tree?  Do we deal with this
by maintaining every version of the ontology and referencing specific
versions, or by allowing the ontology to branch and grow into something
big and rather ugly? Do we need a 'view' mechanism for ontologies -
where the full, ugly ontology is filtered to something more
human-friendly? 
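
The 'reference a specific version' option can be sketched as follows, purely as an illustration. The version labels v1 and v2, the mature_height_m field and the function failed_constraints are hypothetical; the point is only that the same 2 meter plant is a valid Tree under the original definition and not under the annotated one.

```python
# Each version of the Tree class is a list of (label, test) constraints.
# v1 is the original, unrestricted class; v2 adds the height annotation.
TREE_VERSIONS = {
    "v1": [],
    "v2": [("mature height >= 3 m", lambda p: p["mature_height_m"] >= 3)],
}


def failed_constraints(plant, version):
    """Return the labels of the constraints this plant does not satisfy."""
    return [label for label, test in TREE_VERSIONS[version] if not test(plant)]


small_tree = {"name": "doc:tree42", "mature_height_m": 2.0}
print(failed_constraints(small_tree, "v1"))  # no constraints to fail
print(failed_constraints(small_tree, "v2"))  # fails the height constraint
```

Data that says which version it was written against stays valid after the annotation is adopted, at the cost of having to keep every version around.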

I'm left with the feeling that there is a whole lot of non-trivial stuff
here that isn't being done yet. These thoughts were originally prompted
by a discussion about how to track changes to a single, small ontology -
currently itself a tricky problem since the Jena parser does not
preserve the ordering of nodes in text output, and so running a diff on
two ontologies in CVS is rarely useful.
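
One possible workaround for the ordering problem is to compare the two versions as sets of triples rather than as text, and to write one sorted triple per line before checking in. The sketch below assumes the triples have already been parsed into plain tuples of strings and that there are no blank nodes (which would need extra care); the ex: names and both function names are made up.

```python
def canonical_lines(triples):
    """One line per distinct triple, sorted, so file order is stable."""
    return sorted({" ".join(t) + " ." for t in triples})


def diff_ontologies(old, new):
    """Return (added, removed) triples, ignoring serialisation order."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)


old = [
    ("ex:Tree", "rdfs:subClassOf", "ex:Plant"),
    ("ex:Shrub", "rdfs:subClassOf", "ex:Plant"),
]
new = [
    ("ex:Shrub", "rdfs:subClassOf", "ex:Plant"),
    ("ex:Tree", "rdfs:subClassOf", "ex:WoodyPlant"),
    ("ex:WoodyPlant", "rdfs:subClassOf", "ex:Plant"),
]
added, removed = diff_ontologies(old, new)
for triple in added:
    print("+", *triple)
for triple in removed:
    print("-", *triple)
```

If the output of canonical_lines is what gets checked in, an ordinary line-based diff should also become readable, since a reordering alone no longer changes the file.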

If anyone knows of papers that would feed into this discussion I'd
appreciate references - I would guess that there must be work done in
the knowledge management area on this sort of thing.


Chris Catton
BioImage Database Development Manager
Department of Zoology
University of Oxford
OX1 3PS
 
Tel: +44 (0) 1865 281993
email: chris.catton@zoology.oxford.ac.uk
web site: www.bioimage.org
Received on Thursday, 15 January 2004 07:38:42 UTC