Re: Publishing data/ontologies: release early & often, or get them right first? from Martin Hepp (UIBK) on 2008-03-15 (semantic-web@w3.org from March 2008)

From: Martin Hepp (UIBK) <martin.hepp@uibk.ac.at>
Date: Sat, 15 Mar 2008 16:57:22 +0100
To: Giovanni Tummarello <giovanni.tummarello@deri.org>
CC: Valentin Zacharias <Zacharias@fzi.de>, Danny Ayers <danny.ayers@gmail.com>, semantic-web at W3C <semantic-web@w3c.org>
Message-ID: <47DBF1E2.9050502@uibk.ac.at>
Hi Giovanni:
 > I really see no other solution than coming up with a boring but needed
 > methodology for practically versioning ontologies, and automatically
 > migrating the data.

I personally think that providing correct versioning for ontologies is 
infeasible on a Web scale, and I am much more in favor of (1) accepting 
that applications break from time to time, plus (2) providing means 
that allow the community to fix broken links once spotted. A simple 
solution is to encourage new identifiers for modified ontology elements 
with no formal links between predecessor and successor, and wait for 
the "crowds" to create statements of equivalence (owl:sameAs, 
owl:equivalentClass, ...). In some cases it may even be okay to have 
the intended meaning of an already defined element evolve, even if that 
means old data will be invalidated.
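
To illustrate how an application could make use of such community-made
equivalence statements, here is a minimal sketch in Python. It is only an
illustration under stated assumptions: the class name EquivalenceIndex, its
methods, and the example.org URIs are all hypothetical, and the statements
are treated as plain symmetric, transitive links (union-find) with no OWL
reasoning involved.

```python
from typing import Dict


class EquivalenceIndex:
    """Groups identifiers that the community has declared equivalent.

    Hypothetical helper: each add_equivalence() call stands for one
    equivalence statement (e.g. owl:sameAs or owl:equivalentClass).
    """

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, uri: str) -> str:
        # Unknown identifiers start out as their own representative.
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[uri] != root:
            next_uri = self._parent[uri]
            self._parent[uri] = root
            uri = next_uri
        return root

    def add_equivalence(self, a: str, b: str) -> None:
        """Record one equivalence statement between two identifiers."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a: str, b: str) -> bool:
        """True if a chain of recorded statements connects a and b."""
        return self._find(a) == self._find(b)


# Example URIs are made up for illustration.
OLD = "http://example.org/vocab/2007#mbox"
NEW = "http://example.org/vocab/2008#mailbox"

index = EquivalenceIndex()
index.add_equivalence(OLD, NEW)  # a statement contributed after the change
```

An application written against OLD can then call index.same(OLD, predicate)
instead of comparing URIs directly, and starts working again as soon as
somebody publishes the missing statement.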

I guess many people will say such an approach can never work, but there 
is always a trade-off between the consistency and integrity of a 
vocabulary on one hand and its ability to evolve on the other. Keep in 
mind that the Web was possible only because it gave up on consistent 
linking and tolerates broken links...

Also, while there has been a lot of research on ontology versioning in 
the past eight years or so, the problem is still not solved for 
practical applications (at least in my humble opinion; I would be glad 
to learn otherwise).

Martin
-----------------------------------------
martin hepp, http://www.heppnetz.de

Giovanni Tummarello wrote:
> I really see no other solution than coming up with a boring but needed
> methodology for practically versioning ontologies, and automatically
> migrating the data.
> 
> When a new version is created, it should be reflected in the URIs of
> probably every property. Mappings to the previous versions should be
> provided in the form of SPARQL CONSTRUCT queries, or by pointing to
> something like a "semantic web pipe" (which has SPARQL CONSTRUCT as
> operators) [1], or by whatever other method.
> 
> Implementing software should then simply perform such steps when
> fetching the data.
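
A rough sketch of what such a migration step at fetch time could look like,
written in plain Python rather than SPARQL CONSTRUCT. Everything here is an
assumption made for illustration: the versioned example.org URIs, the
mapping tables, and the function names migrate and migrate_chain. It only
covers simple property renames; a real CONSTRUCT-based mapping could also
restructure the data.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical mappings between versions; the version is part of the URI.
V1_TO_V2: Dict[str, str] = {
    "http://example.org/vocab/v1#mbox": "http://example.org/vocab/v2#mailbox",
}
V2_TO_V3: Dict[str, str] = {
    "http://example.org/vocab/v2#mailbox": "http://example.org/vocab/v3#email",
}


def migrate(triples: Iterable[Triple], mapping: Dict[str, str]) -> List[Triple]:
    """Rewrite predicates listed in the mapping; keep all others unchanged."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


def migrate_chain(
    triples: Iterable[Triple], mappings: Sequence[Dict[str, str]]
) -> List[Triple]:
    """Apply version-to-version mappings one after another, oldest first."""
    result = list(triples)
    for mapping in mappings:
        result = migrate(result, mapping)
    return result


fetched = [
    ("http://example.org/people/alice",
     "http://example.org/vocab/v1#mbox",
     "mailto:alice@example.org"),
]
current = migrate_chain(fetched, [V1_TO_V2, V2_TO_V3])
```

Chaining per-version mappings keeps each mapping small, at the cost of
having to apply every intermediate step to old data.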
> 
> Is anybody aware of one such existing dictionary specifically for this
> purpose? If not, we should create it right away, accept the extra - but
> inevitable imo - complication and move on.
> 
> Giovanni
> 
> On Sat, Mar 15, 2008 at 10:18 AM, Valentin Zacharias <Zacharias@fzi.de> wrote:
>>
>>  really interesting and important question!
>>
>>  I believe that with the way ontologies are used in today's applications [1],
>>  changing or removing elements from an ontology will break applications; e.g.
>>  rename foaf:mbox to foaf:mailbox and hundreds of applications will need
>>  changing. It is for this reason that you cannot use agile development
>>  processes for ontologies that are really used in a distributed setting.
>>
>>  Designing ontologies for this setting is very similar to designing public
>>  APIs, and I recommend everyone to watch Joshua Bloch's excellent presentation
>>  on How to Design a Good API [2] - a lot of what he says applies to
>>  ontologies as well. In particular he talks about the development process of
>>  the API [ontology] before it becomes public - and here you can and should
>>  use agile processes. He also proposes to code against the API [ontology]
>>  before the API [ontology] is implemented; to create three programs that use
>>  the API [ontology] before it becomes public.
>>
>>  Having said that, ontologies do make it simple to allow a certain degree of
>>  flexibility - such as adding additional attributes. If the people using the
>>  ontology are aware that a certain part of the ontology is subject to many
>>  changes, it is also relatively easy to create programs that can tolerate
>>  this. For example a few years back we created an Event Ontology [3] that had
>>  a stable core with things such as time, location etc. and an event category
>>  intended to be extended decentrally (e.g. need a SpeedcoreDanceParty
>>  category with an attribute average beats per minute? Just add it!). People
>>  writing software exchanging information with this ontology would know to be
>>  careful to accommodate unknown subclasses of event category (and could
>>  fall back to treating an event of an unknown category as a generic event with
>>  a full text description and all the attributes and relations from the
>>  reliable core). There are many things wrong with this Event Ontology from
>>  back then, but I believe this pattern of (domain dependent) known and
>>  controlled flexibility is a good one.
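
The fallback described here could be sketched roughly as follows. This is
not the Event Ontology from [3]; the field names in CORE_FIELDS, the set
KNOWN_CATEGORIES, and the Event and parse_event names are assumptions made
up for the example, and incoming data is modelled as a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Assumed stable core of the ontology (hypothetical field names).
CORE_FIELDS = ("time", "location", "description")
# Categories this particular consumer was written for (hypothetical).
KNOWN_CATEGORIES = {"Concert", "Lecture"}
GENERIC_CATEGORY = "Event"


@dataclass
class Event:
    category: str           # category the consumer will act on
    declared_category: str  # category as stated in the incoming data
    core: Dict[str, Any]
    extras: Dict[str, Any] = field(default_factory=dict)


def parse_event(record: Dict[str, Any]) -> Event:
    """Parse a record, treating unknown categories as generic events."""
    declared = record.get("category", GENERIC_CATEGORY)
    core = {k: record[k] for k in CORE_FIELDS if k in record}
    # Attributes outside the core are kept but never required.
    extras = {
        k: v for k, v in record.items()
        if k not in CORE_FIELDS and k != "category"
    }
    category = declared if declared in KNOWN_CATEGORIES else GENERIC_CATEGORY
    return Event(category, declared, core, extras)


party = parse_event({
    "category": "SpeedcoreDanceParty",
    "time": "2008-03-15T22:00",
    "location": "Karlsruhe",
    "description": "All night.",
    "average_beats_per_minute": 300,
})
```

The consumer never rejects a record because of an unfamiliar category or
attribute; it only relies on the core fields and carries the rest along.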
>>
>>  As for ontology evolution, a solution could be to make it possible to
>>  retrieve an executable mapping from any version of an ontology to any other
>>  version - but I'm not aware of any ontology evolution or versioning solution
>>  that does this (and it's not even possible in general).
>>
>>  cu
>>
>>  [1]: If you like, there is an article discussing whether this way of using
>>  ontologies is the right way in the Jan/Feb issue of IEEE Internet Computing:
>>  http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4428341
>>  [2]: http://www.infoq.com/presentations/effective-api-design
>>  [3]: http://www.valentinzacharias.de/papers/ZachariasSibler2004.pdf
>>
>>  --
>>  email: zacharias@fzi.de
>>  phone: +49-721-9654-806
>>  fax  : +49-721-9654-807
>>  http://www.vzach.de/blog
>>
>>  =======================================================================
>>  FZI  Forschungszentrum Informatik an der Universität Karlsruhe (TH)
>>  Haid-und-Neu-Str. 10-14, 76131 Deutschland, http://www.fzi.de
>>  SdbR, Az: 14-0563.1 Regierungspräsidium Karlsruhe
>>  Vorstand: Rüdiger Dillmann, Michael Flor, Jivka Ovtcharova, Rudi Studer
>>  Vorsitzender des Kuratoriums: Ministerialdirigent Günther Leßnerkraus
>>  =======================================================================
>>
>>
>> -----Original Message-----
>>  From: semantic-web-request@w3.org on behalf of Danny Ayers
>>  Sent: Fri 3/14/2008 11:27 PM
>>  To: semantic-web at W3C
>>  Subject: Publishing data/ontologies: release early & often, or get them
>>  right first?
>>
>>
>>  The other day, in conversation with Richard Cyganiak (recorded at
>>  [1]), I paraphrased something timbl had mentioned (recorded at [2]) :
>>  when expressing data as RDF on the Web, it's possible to make a rough
>>  guess at how the information should appear, and over time
>>  incrementally/iteratively improve its alignment with the rest of the
>>  world.
>>
>>  I was upbeat on this (and my paraphrasing probably lost a lot of
>>  timbl's intent) because personally when doing things with RDF I find
>>  it hugely advantageous over a traditional SQL RDBMS approach simply
>>  because you can be more agile - not getting your schema right first
>>  time isn't an obstacle to development. But the stuff I play with
>>  (which I do usually put on the web somewhere) isn't likely to develop
>>  a forward chain of dependencies.
>>
>>  But Richard pointed out (no doubt badly paraphrasing again) that the
>>  making of statements when published on the Web brought with it a level
>>  of commitment. I don't think he used these words, but perhaps a kind
>>  of responsibility. The example he gave of where problems can arise was
>>  DBpedia - the modelling, use of terms, is revised every couple of
>>  months or so. Anyone who built an app based on last month's vocab might
>>  find the app broken on next month's revision. I think Richard had
>>  properties particularly in mind - though even when Cool URIs are
>>  maintained, might not changes around connections to individuals still
>>  be problematic?
>>
>>  So I was wondering if anyone had any thoughts on how to accommodate
>>  rapid development (or at least being flexible over time) without
>>  repeatedly breaking consuming applications. How deep does our
>>  modelling have to go to avoid this kind of problem? Can the versioning
>>  bits of OWL make a significant difference?
>>
>>  Or to turn it around, as a consumer of Semantic Web data, how do you
>>  avoid breakage due to changes upstream? Should we be prepared to
>>  retract/replace whole named graphs containing ontologies, do we need
>>  to keep provenance for *everything*?
>>  I suspect related - if we have a locally closed world, where do we put
>>  the boundaries?
>>
>>  Cheers,
>>  Danny.
>>
>>  [1]
>>  http://blogs.talis.com/nodalities/2008/03/a_chat_with_richard_cyganiak.php
>>  [2] http://blogs.zdnet.com/semantic-web/?p=105
>>
>>  --
>>  http://dannyayers.com
>>  ~
>>  http://blogs.talis.com/nodalities/this_weeks_semantic_web/
>>
Received on Saturday, 15 March 2008 15:58:41 UTC