Re: Publishing data/ontologies: release early & often, or get them right first?


like natural languages, it's essential that boundaries are highly  

It seems that the possibility of getting things wrong is a necessity.
fortunately much can be considered rubbish and built upon.
left for future archaeologists to unearth.
or like the pyramids, buried with great care.
lest they bury us.

recently I've been developing yet another pre-alpha visual search  
engine, for SVG icons,
i.e. search for images by picture or by text description:

The idea is that anyone can copy and paste images into a document,  
which then acts as a category.
They then upload the URI for their category to the database.
Anyone searching for the category, or for one of its images, may  
then find this resource....
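a category document of this kind might be pictured as a small RDF  
resource listing its member images. this is only a sketch: the ex:  
vocabulary and all the URIs below are made up for illustration, not  
taken from any published schema.

```turtle
@prefix ex: <http://example.org/vocab#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# the document itself is the category; its pasted images are members
<http://example.org/categories/weather-icons>
    a ex:Category ;
    dc:title "weather icons" ;
    ex:member <http://example.org/icons/sun.svg> ,
              <http://example.org/icons/cloud.svg> ,
              <http://example.org/icons/rain.svg> .
```

a search for the category title, or for any one of the image URIs,  
could then lead back to the whole resource.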

If the vision of distributed description is to guide the development  
of SVG graphics, specifications need to be designed to ensure  
resources are published in a form which enables them to be easily  
repurposed.
Specifications that include naive end users in a RAD process are  
likely to have reach and influence.
We may reasonably hope that they will help us meet the somewhat  
demanding requirement to repurpose.

There is no right first!


Jonathan Chetwynd

+44 (0) 20 7978 1764

On 14 Mar 2008, at 22:27, Danny Ayers wrote:

The other day, in conversation with Richard Cyganiak (recorded at
[1]), I paraphrased something timbl had mentioned (recorded at [2]) :
when expressing data as RDF on the Web, it's possible to make a rough
guess at how the information should appear, and over time
incrementally/iteratively improve its alignment with the rest of the Web.

I was upbeat on this (and my paraphrasing probably lost a lot of
timbl's intent) because personally when doing things with RDF I find
it hugely advantageous over a traditional SQL RDBMS approach simply
because you can be more agile - not getting your schema right first
time isn't an obstacle to development. But the stuff I play with
(which I do usually put on the web somewhere) isn't likely to develop
a forward chain of dependencies.

But Richard pointed out (no doubt I'm badly paraphrasing again) that
making statements and publishing them on the Web brings with it a level
of commitment. I don't think he used these words, but perhaps a kind
of responsibility. The example he gave of where problems can arise was
DBpedia - the modelling, use of terms, is revised every couple of
months or so. Anyone who built an app based on last month's vocab might
find the app broken on next month's revision. I think Richard had
properties particularly in mind - though even when Cool URIs are
maintained, might not changes around connections to individuals still
be problematic?

So I was wondering if anyone had any thoughts on how to accommodate
rapid development (or at least being flexible over time) without
repeatedly breaking consuming applications. How deep does our
modelling have to go to avoid this kind of problem? Can the versioning
bits of OWL make a significant difference?
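For reference, OWL does provide version annotations that a revised
vocabulary could declare — a sketch only, with illustrative URIs; whether
consumers actually honour these annotations is exactly the open question:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/vocab/2008-03#> .

# the new revision points back at, and declares compatibility with,
# the previous one
<http://example.org/vocab/2008-03>
    a owl:Ontology ;
    owl:versionInfo "2008-03 revision" ;
    owl:priorVersion <http://example.org/vocab/2008-01> ;
    owl:backwardCompatibleWith <http://example.org/vocab/2008-01> .

# a term kept alive for consumers of last month's vocab, but flagged,
# and bridged to its replacement
ex:lastMonthsProperty
    a owl:DeprecatedProperty ;
    owl:equivalentProperty ex:thisMonthsProperty .
```

With an equivalentProperty bridge like this, an app querying under the
old name can still (given reasoning) see data published under the new one.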

Or to turn it around, as a consumer of Semantic Web data, how do you
avoid breakage due to changes upstream? Should we be prepared to
retract/replace whole named graphs containing ontologies, do we need
to keep provenance for *everything*?
I suspect related - if we have a locally closed world, where do we put
the boundaries?
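On the retract/replace question: if each upstream ontology is kept in its
own named graph, swapping in a new revision can be a single atomic
operation, leaving local data untouched. A sketch in SPARQL Update
(the graph URIs are illustrative):

```sparql
# drop the stale copy of the upstream vocabulary, then re-fetch it
# into the same named graph; local graphs are unaffected
DROP SILENT GRAPH <http://example.org/graphs/dbpedia-ontology> ;
LOAD <http://dbpedia.org/ontology/>
  INTO GRAPH <http://example.org/graphs/dbpedia-ontology>
```

The named graph URI then also serves as a coarse provenance record:
every triple in it is known to have come from that one upstream source.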




Received on Saturday, 15 March 2008 07:41:58 UTC