RE: Publishing data/ontologies: release early & often, or get them right first?

Really interesting and important question!

I believe that, with the way ontologies are used in today's applications [1],
changing or removing elements from an ontology will break applications; e.g.
rename foaf:mbox to foaf:mailbox and hundreds of applications will need
changing. For this reason you cannot use agile development processes for
ontologies that are actually used in a distributed setting.
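
To make that concrete, here is a rough consumer-side sketch (Python with
rdflib; the renamed property is made up for the example, none of this comes
from a real application):

    from rdflib import Graph, Namespace

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")

    # Data as published after the (hypothetical) rename of foaf:mbox.
    data_after_rename = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    <http://example.org/alice> foaf:mailbox <mailto:alice@example.org> .
    """

    g = Graph()
    g.parse(data=data_after_rename, format="turtle")

    # The application still asks for the old term and simply gets an empty
    # result - no error is raised, the breakage is silent.
    print(list(g.objects(None, FOAF.mbox)))   # -> []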

Designing ontologies for this setting is very similar to designing public
APIs, and I recommend that everyone watch Joshua Bloch's excellent
presentation "How to Design a Good API" [2] - a lot of what he says applies
to ontologies as well. In particular he talks about the development process
of the API [ontology] before it becomes public - and here you can and should
use agile processes. He also proposes coding against the API [ontology]
before it is implemented, and creating three programs that use the API
[ontology] before it becomes public.

Having said that, ontologies do make it simple to allow a certain degree of
flexibility - such as adding additional attributes. If the people using the
ontology are aware that a certain part of it is subject to frequent changes,
it is also relatively easy to write programs that tolerate this. For example,
a few years back we created an Event Ontology [3] that had a stable core with
things such as time, location etc. and an event category intended to be
extended in a decentralized way (need a SpeedcoreDanceParty category with an
average-beats-per-minute attribute? Just add it!). People writing software
that exchanges information using this ontology would know to accommodate
unknown subclasses of the event category (and could fall back to treating an
event of an unknown category as a generic event with a full-text description
and all the attributes and relations from the reliable core). There are many
things wrong with this Event Ontology from back then, but I believe this
pattern of (domain-dependent) known and controlled flexibility is a good one.
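
Roughly, consumer code following this pattern could look like the sketch
below (Python with rdflib; the namespace, class and property names are
invented for the example - this is not the actual ontology from [3]):

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF, RDFS

    EV = Namespace("http://example.org/event#")   # invented namespace

    KNOWN_CATEGORIES = {EV.Concert, EV.Conference, EV.Workshop}

    def handle_event(g: Graph, event):
        category = g.value(event, RDF.type)
        if category in KNOWN_CATEGORIES:
            ...  # category-specific handling
        else:
            # Unknown subclass of the event category (a SpeedcoreDanceParty,
            # say): fall back to the reliable core - time, location and a
            # full-text description.
            start = g.value(event, EV.startTime)
            place = g.value(event, EV.location)
            text = g.value(event, RDFS.comment)
            print(f"Generic event at {place}, starting {start}: {text}")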

As for ontology evolution, one solution could be to make it possible to
retrieve an executable mapping from any version of an ontology to any other
version - but I'm not aware of any ontology evolution or versioning solution
that does this (and it's not even possible in general).
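
In the simplest case - properties that are merely renamed between versions -
such an executable mapping could be as small as the sketch below (Python
with rdflib, URIs again invented for the example; real evolution steps like
splitting or merging concepts obviously need more than a rename table):

    from rdflib import Graph, Namespace

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")

    # Mapping from version n to version n+1: old property -> new property.
    RENAMED = {FOAF.mbox: FOAF.mailbox}

    def migrate(old: Graph) -> Graph:
        new = Graph()
        for s, p, o in old:
            new.add((s, RENAMED.get(p, p), o))
        return new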

cu

[1]: If you like, there is an article discussing whether this way of using
ontologies is the right way in the Jan/Feb issue of IEEE Internet Computing:
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4428341
[2]: http://www.infoq.com/presentations/effective-api-design
[3]: http://www.valentinzacharias.de/papers/ZachariasSibler2004.pdf

--
email: zacharias@fzi.de
phone: +49-721-9654-806
fax  : +49-721-9654-807
http://www.vzach.de/blog

-----Original Message-----
From: semantic-web-request@w3.org on behalf of Danny Ayers
Sent: Fri 3/14/2008 11:27 PM
To: semantic-web at W3C
Subject: Publishing data/ontologies: release early & often, or get them
right first?
 

The other day, in conversation with Richard Cyganiak (recorded at
[1]), I paraphrased something timbl had mentioned (recorded at [2]):
when expressing data as RDF on the Web, it's possible to make a rough
guess at how the information should appear, and over time
incrementally/iteratively improve its alignment with the rest of the
world.

I was upbeat on this (and my paraphrasing probably lost a lot of
timbl's intent) because personally when doing things with RDF I find
it hugely advantageous over a traditional SQL RDBMS approach simply
because you can be more agile - not getting your schema right first
time isn't an obstacle to development. But the stuff I play with
(which I do usually put on the web somewhere) isn't likely to develop
a forward chain of dependencies.

But Richard pointed out (no doubt badly paraphrasing again) that the
making of statements when published on the Web brought with it a level
of commitment. I don't think he used these words, but perhaps a kind
of responsibility. The example he gave of where problems can arise was
DBpedia - the modelling, i.e. the use of terms, is revised every couple of
months or so. Anyone who built an app based on last month's vocab might
find the app broken on next month's revision. I think Richard had
properties particularly in mind - though even when Cool URIs are
maintained, might not changes around connections to individuals still
be problematic?

So I was wondering if anyone had any thoughts on how to accommodate
rapid development (or at least being flexible over time) without
repeatedly breaking consuming applications. How deep does our
modelling have to go to avoid this kind of problem? Can the versioning
bits of OWL make a significant difference?

Or to turn it around: as a consumer of Semantic Web data, how do you
avoid breakage due to changes upstream? Should we be prepared to
retract/replace whole named graphs containing ontologies? Do we need
to keep provenance for *everything*?

Related, I suspect: if we have a locally closed world, where do we put
the boundaries?

Cheers,
Danny.

[1]
http://blogs.talis.com/nodalities/2008/03/a_chat_with_richard_cyganiak.php
[2] http://blogs.zdnet.com/semantic-web/?p=105

-- 
http://dannyayers.com
~
http://blogs.talis.com/nodalities/this_weeks_semantic_web/

Received on Saturday, 15 March 2008 10:18:52 UTC