Re: Publishing data/ontologies: release early & often, or get them right first?

Dear Giovanni:
Thanks - comments inline

Giovanni Tummarello wrote:
>>  ontologies is unfeasible on a Web scale and am much more in favor of (1)
>>  accepting that applications break from time to time plus (2) providing...
>>     
> ...
> Good luck explaining this to any enterprise IT manager evaluating
> Semantic Web technologies..
>   
I am well aware that many practitioners dislike uncontrolled change that 
may make systems break. However, they rarely see that striving for 
consistency hinders the timely evolution of vocabularies, which can be an 
even worse evil. I have done quite a lot of quantitative analysis on how 
insufficient the domain coverage of centrally administered vocabularies 
is; see, e.g., [1].
If a lightweight versioning system can be established, fine - but I am 
deeply convinced that heavyweight control of the evolution of 
vocabularies will never work, since domains are always in a state of flux.
And many computer scientists assume that additional control for 
consistency is a free lunch. But it isn't; it has its own poison in it.

>>  possible only because one sacrificed consistent linking...
>>
>>     
>
> this example does not relate. On the web you can't be sure about links
> to external resources, but you had better be sure you're linking to the
> right pages inside your own site, and make sure you're consistent.
>
> Here we're talking about simply linking to your older version of the
> ontology - there is no reason why this should not be possible or why it
> would be inconvenient.
>
>   
In many cases, it will be anything from time-consuming and difficult to 
outright impossible to specify the exact semantic relationship between two 
versions of the "same" ontology element, unless something very generic 
like "isSuccessorOf" is sufficient.
Take the class "TV set": the term meant something different in the 
1950s than it does today. Something that qualified as an instance of 
the concept as understood at that time may no longer fall under our 
current notion of "TV set".
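To make the point concrete, here is a minimal sketch in Turtle of what such a generic version link might look like (the namespaces and the isSuccessorOf property are made up for illustration, not taken from any existing vocabulary):

```turtle
@prefix v1: <http://example.org/electronics/1955#> .
@prefix v2: <http://example.org/electronics/2008#> .
@prefix ex: <http://example.org/versioning#> .

# The current concept is declared a successor of the historical one,
# but the exact semantic relationship is deliberately left open.
v2:TVSet  ex:isSuccessorOf  v1:TVSet .
```

Note that neither rdfs:subClassOf nor owl:equivalentClass would be correct here: an instance of the 1955 concept need not fall under the 2008 one, and vice versa.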

>   
>>  Also, while there has been a lot of research on ontology versioning in
>>  the past eight years or so, the problem is still not solved for
>>  practical applications (at least, in my humble opinion - would be glad
>>  to learn of the contrary).
>>     
>
> Thus my hint at something as pragmatic as Pipes.. [1] (forgot to put
> the link). If the operator for your mapping is not there yet, just add
> it yourself.
>
> Giovanni
>
>   
Then we are partly in agreement. My main suggestions were:
1. Allow for fast evolution of vocabularies, and rather accept that the 
Semantic Web breaks every now and then.
2. Accept that it breaks in two ways: lack of recall (objects missing 
from the result set) and lack of precision (wrong objects in the result 
set).
3. If in doubt, rather allow for lack of recall by encouraging new 
identifiers for modified conceptual elements, and hope for the community 
to fix the missing link soon.
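Point 3 might be sketched in Turtle roughly as follows (again with a hypothetical namespace and a hypothetical isSuccessorOf property; owl:DeprecatedClass is standard OWL):

```turtle
@prefix ex:  <http://example.org/vocab#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# The meaning of ex:TVSet has changed, so a new identifier is minted
# instead of silently redefining the old one. Old data keeps pointing
# at ex:TVSet and simply drops out of queries against ex:TVSet2
# (lack of recall) rather than being returned wrongly (lack of precision).
ex:TVSet   a owl:DeprecatedClass .
ex:TVSet2  a owl:Class ;
           ex:isSuccessorOf ex:TVSet .
```

Once the precise relationship between the two notions is understood, the community can replace the generic link with, e.g., rdfs:subClassOf or owl:equivalentClass.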

Of course, I am not saying one should willingly put data integrity at 
risk if this can be avoided by simple best practices. But, surprisingly, 
even the most basic best practices for Web consistency, such as Cool 
URIs, are disregarded by many "enterprise IT managers" ;-)

Martin

[1] http://www.heppnetz.de/files/ConceptualDynamics-SEBIS-TR2007-06-26.pdf
> [1] http://pipes.deri.org
>
>
>   
>>  Martin
>>  -----------------------------------------
>>  martin hepp, http://www.heppnetz.de
>>
>>
>>
>>  Giovanni Tummarello wrote:
>>  > I really see no other solution than coming up with a boring but needed
>>  > methodology for practically versioning ontologies, and automatically
>>  > migrating the data.
>>  >
>>  > when a new version is created, it should be reflected in the URIs of
>>  > probably every property. Mappings to the previous versions should be
>>  > provided in the form of sparql construct queries or pointing at something
>>  > like a "semantic web pipe" (with sparql construct as operators) [1]
>>  > or whatever method.
>>  >
>>  > Implementing software should then simply perform such steps when
>>  > fetching the data.
>>  >
>>  > is anybody aware of one such existing dictionary specifically for this
>>  > purpose? if not we should create it right away, accept the extra - but
>>  > inevitable imo - complication and move on.
>>  >
>>  > Giovanni
>>  >
>>  > On Sat, Mar 15, 2008 at 10:18 AM, Valentin Zacharias <Zacharias@fzi.de> wrote:
>>  >>
>>  >>  really interesting and important question!
>>  >>
>>  >>  I believe that with the way ontologies are used in today's applications[1],
>>  >>  changing or removing elements from an ontology will break applications; e.g.
>>  >>  rename foaf:mbox to foaf:mailbox and hundreds of applications will need
>>  >>  changing. It is for this reason that you cannot use agile development
>>  >>  processes for ontologies that are really used in a distributed setting.
>>  >>
>>  >>  Designing ontologies for these settings is very similar to designing public
>>  >>  APIs, and I recommend everyone to watch Joshua Bloch's excellent presentation
>>  >>  on How to Design a good API [2] - a lot of what he says applies to
>>  >>  ontologies as well. In particular he talks about the development process of
>>  >>  the API [ontology] before it becomes public - and here you can and should
>>  >>  use agile processes. He also proposes to code against the API [ontology]
>>  >>  before the API [ontology] is implemented; to create three programs that use
>>  >>  the API [ontology] before it becomes public.
>>  >>
>>  >>  Having said that, ontologies do make it simple to allow a certain degree of
>>  >>  flexibility - such as adding additional attributes. If the people using the
>>  >>  ontology are aware that a certain part of the ontology is subject to many
>>  >>  changes, it is also relatively easy to create programs that can tolerate
>>  >>  this. For example a few years back we created an Event Ontology [3] that had
>>  >>  a stable core with things such as time, location, etc. and an event category
>>  >>  intended to be extended decentrally (e.g. need a SpeedcoreDanceParty
>>  >>  category with an attribute average beats per minute? Just add it!). People
>>  >>  writing software exchanging information with this ontology would know to be
>>  >>  careful to accommodate unknown subclasses of event category (and could
>>  >>  fall back to treating an event of an unknown category as a generic event with
>>  >>  a full text description and all the attributes and relations from the
>>  >>  reliable core). There are many things wrong with this Event Ontology from
>>  >>  back then, but I believe this pattern of (domain dependent) known and
>>  >>  controlled flexibility is a good one.
>>  >>
>>  >>  As for ontology evolution, a solution could be to make it possible to
>>  >>  retrieve an executable mapping from any version of an ontology to any other
>>  >>  version - but I'm not aware of any ontology evolution or versioning solution
>>  >>  that does this (and it's not even possible in general).
>>  >>
>>  >>  cu
>>  >>
>>  >>  [1]: If you like, there is an article discussing whether this way of using
>>  >>  ontologies is the right way in the Jan/Feb issue of IEEE Internet Computing:
>>  >>  http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4428341
>>  >>  [2]: http://www.infoq.com/presentations/effective-api-design
>>  >>  [3]: http://www.valentinzacharias.de/papers/ZachariasSibler2004.pdf
>>  >>
>>  >>  --
>>  >>  email: zacharias@fzi.de
>>  >>  phone: +49-721-9654-806
>>  >>  fax  : +49-721-9654-807
>>  >>  http://www.vzach.de/blog
>>  >>
>>  >>  =======================================================================
>>  >>  FZI  Forschungszentrum Informatik an der Universität Karlsruhe (TH)
>>  >>  Haid-und-Neu-Str. 10-14, 76131 Deutschland, http://www.fzi.de
>>  >>  SdbR, Az: 14-0563.1 Regierungspräsidium Karlsruhe
>>  >>  Vorstand: Rüdiger Dillmann, Michael Flor, Jivka Ovtcharova, Rudi Studer
>>  >>  Vorsitzender des Kuratoriums: Ministerialdirigent Günther Leßnerkraus
>>  >>  =======================================================================
>>  >>
>>  >>
>>  >> -----Original Message-----
>>  >>  From: semantic-web-request@w3.org on behalf of Danny Ayers
>>  >>  Sent: Fri 3/14/2008 11:27 PM
>>  >>  To: semantic-web at W3C
>>  >>  Subject: Publishing data/ontologies: release early & often, or get them
>>  >>  right first?
>>  >>
>>  >>
>>  >>  The other day, in conversation with Richard Cyganiak (recorded at
>>  >>  [1]), I paraphrased something timbl had mentioned (recorded at [2]) :
>>  >>  when expressing data as RDF on the Web, it's possible to make a rough
>>  >>  guess at how the information should appear, and over time
>>  >>  incrementally/iteratively improve its alignment with the rest of the
>>  >>  world.
>>  >>
>>  >>  I was upbeat on this (and my paraphrasing probably lost a lot of
>>  >>  timbl's intent) because personally when doing things with RDF I find
>>  >>  it hugely advantageous over a traditional SQL RDBMS approach simply
>>  >>  because you can be more agile - not getting your schema right first
>>  >>  time isn't an obstacle to development. But the stuff I play with
>>  >>  (which I do usually put on the web somewhere) isn't likely to develop
>>  >>  a forward chain of dependencies.
>>  >>
>>  >>  But Richard pointed out (no doubt badly paraphrasing again) that the
>>  >>  making of statements when published on the Web brought with it a level
>>  >>  of commitment. I don't think he used these words, but perhaps a kind
>>  >>  of responsibility. The example he gave of where problems can arise was
>>  >>  DBpedia - the modelling, i.e. the use of terms, is revised every couple of
>>  >>  months or so. Anyone who built an app based on last month's vocab might
>>  >>  find the app broken on next month's revision. I think Richard had
>>  >>  properties particularly in mind - though even when Cool URIs are
>>  >>  maintained, might not changes around connections to individuals still
>>  >>  be problematic?
>>  >>
>>  >>  So I was wondering if anyone had any thoughts on how to accommodate
>>  >>  rapid development (or at least being flexible over time) without
>>  >>  repeatedly breaking consuming applications. How deep does our
>>  >>  modelling have to go to avoid this kind of problem? Can the versioning
>>  >>  bits of OWL make a significant difference?
>>  >>
>>  >>  Or to turn it around, as a consumer of Semantic Web data, how do you
>>  >>  avoid breakage due to changes upstream? Should we be prepared to
>>  >>  retract/replace whole named graphs containing ontologies, do we need
>>  >>  to keep provenance for *everything*?
>>  >>  I suspect related - if we have a locally closed world, where do we put
>>  >>  the boundaries?
>>  >>
>>  >>  Cheers,
>>  >>  Danny.
>>  >>
>>  >>  [1]
>>  >>  http://blogs.talis.com/nodalities/2008/03/a_chat_with_richard_cyganiak.php
>>  >>  [2] http://blogs.zdnet.com/semantic-web/?p=105
>>  >>
>>  >>  --
>>  >>  http://dannyayers.com
>>  >>  ~
>>  >>  http://blogs.talis.com/nodalities/this_weeks_semantic_web/
>>  >>
>>  >>
>>  >>
>>  >>
>>  >
>>  >
>>
>>     
>
>   

Received on Saturday, 15 March 2008 16:31:01 UTC