Re: Publishing data/ontologies: release early & often, or get them right first?

On 15/03/2008, Danny Ayers <danny.ayers@gmail.com> wrote:
>  I was upbeat on this (and my paraphrasing probably lost a lot of
>  timbl's intent) because personally when doing things with RDF I find
>  it hugely advantageous over a traditional SQL RDBMS approach simply
>  because you can be more agile - not getting your schema right first
>  time isn't an obstacle to development. But the stuff I play with
>  (which I do usually put on the web somewhere) isn't likely to develop
>  a forward chain of dependencies.
>

Good questions - something I wish I had more time to respond to.  I
really like the approach of looking at the data first and working out
how best to structure it, rather than the other way around.

>  But Richard pointed out (no doubt badly paraphrasing again) that the
>  making of statements when published on the Web brought with it a level
>  of commitment. I don't think he used these words, but perhaps a kind
>  of responsibility. The example he gave of where problems can arise was
>  DBpedia - the modelling, use of terms, is revised every couple of
>  months or so. Anyone who built an app based on last month's vocab might
>  find the app broken on next month's revision. I think Richard had
>  properties particularly in mind - though even when Cool URIs are
>  maintained, might not changes around connections to individuals still
>  be problematic?
>

So anyone who has been on an agile project has probably seen the
database schema change hourly, if not more often.  If the Semantic
Web gets anywhere close to that then migration is obviously a
necessity - and something like Ruby on Rails (ActiveRecord
migrations) is a good example of how that might be achieved: you
don't change your ontology unless you also provide a migration from
one version to the next.
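
Just to make that concrete, here's a rough sketch of what one
migration step might look like in Python with rdflib (the vocabulary
URIs and the renamed property are made up for illustration):

    from rdflib import Graph, Namespace

    # Hypothetical namespaces standing in for "last month's" and
    # "this month's" revisions of the same vocabulary.
    V1 = Namespace("http://example.org/vocab/2008/02/")
    V2 = Namespace("http://example.org/vocab/2008/03/")

    def migrate_v1_to_v2(data):
        """One migration step: v1:homeTown was renamed to v2:hometown."""
        for s, _, o in list(data.triples((None, V1.homeTown, None))):
            data.remove((s, V1.homeTown, o))
            data.add((s, V2.hometown, o))
        return data

    g = Graph()
    g.parse("people.ttl", format="turtle")   # data written against v1
    migrate_v1_to_v2(g)
    g.serialize(destination="people-v2.ttl", format="turtle")

The point is the same as ActiveRecord's: the change to the vocabulary
ships together with the code that moves old data forward.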

It's not really a different problem from what people are already
trying to solve - integrating several different ontologies at a single
point in time is essentially the same problem as using successive
versions of the same ontology over time.

>  So I was wondering if anyone had any thoughts on how to accommodate
>  rapid development (or at least being flexible over time) without
>  repeatedly breaking consuming applications. How deep does our
>  modelling have to go to avoid this kind of problem? Can the versioning
>  bits of OWL make a significant difference?
>

Rapid development relies on knowing when something is broken and
having a migration path (which includes versioning).  Ontologies seem
to offer the possibility of diffing two versions and predicting where
your application would break - that would be cool, and much better
than current technologies.
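
A crude version of that diff is easy enough to sketch - here in Python
with rdflib, assuming two local snapshots of the ontology and looking
only at the classes and properties each version declares:

    from rdflib import Graph
    from rdflib.namespace import RDF, OWL

    def declared_terms(ontology):
        """The classes and properties an ontology version declares."""
        kinds = (OWL.Class, OWL.ObjectProperty, OWL.DatatypeProperty)
        return {term for kind in kinds
                     for term in ontology.subjects(RDF.type, kind)}

    old, new = Graph(), Graph()
    old.parse("vocab-2008-02.owl", format="xml")   # hypothetical snapshots
    new.parse("vocab-2008-03.owl", format="xml")

    removed = declared_terms(old) - declared_terms(new)
    added = declared_terms(new) - declared_terms(old)

    # Any term in `removed` that your data or queries mention is a
    # likely point of breakage under the new revision.
    for term in sorted(removed):
        print("gone in new version:", term)
    for term in sorted(added):
        print("new in new version:", term)

A real tool would also want to notice domain/range changes,
deprecations and so on, but even this much tells a consumer where to
look.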

The modelling difference I've seen is to keep ontologies small, reuse
as much existing good work as possible, and only make changes where
you have to.
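
For example (again a made-up sketch in Python with rdflib), a small
ontology might declare just the one property it actually needs and tie
it back to FOAF rather than re-inventing anything:

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF, RDFS, OWL, FOAF

    MY = Namespace("http://example.org/vocab/")   # hypothetical namespace

    onto = Graph()
    onto.bind("foaf", FOAF)
    onto.bind("my", MY)

    # Declare only what existing vocabularies don't give you, and anchor
    # it to a well-known term so consumers have something to fall back on.
    onto.add((MY.preferredHandle, RDF.type, OWL.DatatypeProperty))
    onto.add((MY.preferredHandle, RDFS.subPropertyOf, FOAF.nick))
    onto.add((MY.preferredHandle, RDFS.comment,
              Literal("The nickname a person asks to be called by.")))

    print(onto.serialize(format="turtle"))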

>  Or to turn it around, as a consumer of Semantic Web data, how do you
>  avoid breakage due to changes upstream? Should we be prepared to
>  retract/replace whole named graphs containing ontologies, do we need
>  to keep provenance for *everything*?
>  I suspect related - if we have a locally closed world, where do we put
>  the boundaries?
>

So you can prevent breakage by setting things in stone based on time.
There's absolutely nothing wrong with that - a lot of analysis just
has to take a point in time and start working from there.
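
In practice that can be as simple as pinning your application to a
dated snapshot you keep under your own control, rather than
dereferencing the live ontology URI each time - a rough sketch (the
snapshot path is made up):

    from rdflib import Graph

    # Hypothetical dated snapshot of an upstream ontology; the live URI
    # may keep changing, but everything here runs against this copy.
    PINNED_VOCAB = "file:///var/vocabs/upstream-ontology-2008-03-01.owl"

    vocab = Graph()
    vocab.parse(PINNED_VOCAB, format="xml")
    # ... queries and reasoning use `vocab` until you deliberately
    # re-pin to a newer snapshot (and migrate your data to match).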

I think the right level of granularity is to have provenance tracked
at a level much smaller than a graph but bigger than just one
statement.
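
One way to read that is a named graph per fetched resource or record:
coarser than per-statement reification, finer than one graph for the
whole store. A sketch, again with rdflib and made-up URIs:

    from rdflib import Dataset, URIRef

    ds = Dataset()

    # Hypothetical: each upstream record we fetch goes into its own
    # named graph, so provenance sits at the level of the record.
    source = URIRef("http://example.org/data/Bologna")
    record = ds.graph(URIRef("http://example.org/provenance/Bologna"))
    record.parse(source, format="xml")   # fetched over HTTP in practice

    # If that record changes upstream or turns out to be wrong, retract
    # just its graph and re-fetch, leaving everything else untouched.
    ds.remove_graph(record)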

Received on Tuesday, 18 March 2008 06:04:46 UTC