Re: [BioRDF] Meeting Notes Feb 27, 2006 from Tom Stambaugh on 2006-03-01 (public-semweb-lifesci@w3.org from March 2006)

From: Tom Stambaugh <tms@stambaugh-inc.com>
Date: Wed, 1 Mar 2006 08:55:12 -0500
To: <public-semweb-lifesci@w3.org>
Message-ID: <002501c63d37$c2c05cc0$0200a8c0@TMSMAIN>

I think Eric correctly observes:
> ... most (bioinformatics) databases have data
> models that are such that nothing but a full blown programming language
> will do. (And a lot of manual clean up work may be required in addition.)

I fear that the discussion about XSLT specifically and conversion in general 
misses a fundamental aspect of how a "full blown programming language" is 
used in this context -- namely, that the RDF/OWL representations are used to 
*generate* models in various languages. I suspect that most of us won't 
write tools that read and write RDF/OWL to manipulate this data. Instead, I 
suspect that programmers will build "harnesses" that accept an RDF/OWL 
representation and emit a dynamic model in a specific language -- Java, 
Python, JavaScript, whatever -- and scientists will then use the resulting 
program to, for example, access various databases.
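
To make the "harness" idea concrete, here is a minimal Python sketch, not a
definitive implementation. It assumes the RDF/OWL description has already
been parsed into plain (subject, predicate, object) triples. The TRIPLES
data, the build_models function, and the Protein and Pathway classes are
all hypothetical names invented for illustration.

```python
# Hypothetical stand-in for a parsed RDF/OWL description: plain
# (subject, predicate, object) triples. A real harness would obtain
# these from an RDF parser instead of a hard-coded list.
TRIPLES = [
    ("Protein", "rdf:type", "owl:Class"),
    ("Pathway", "rdf:type", "owl:Class"),
    ("name", "rdfs:domain", "Protein"),
    ("accession", "rdfs:domain", "Protein"),
    ("title", "rdfs:domain", "Pathway"),
]


def build_models(triples):
    """Return a dict mapping each declared class name to a generated class."""
    class_names = [s for s, p, o in triples
                   if p == "rdf:type" and o == "owl:Class"]
    models = {}
    for class_name in class_names:
        # Properties whose rdfs:domain is this class become its fields.
        fields = tuple(s for s, p, o in triples
                       if p == "rdfs:domain" and o == class_name)

        # Bind the current field tuple as a default argument so each
        # generated class keeps its own list of fields.
        def __init__(self, _fields=fields, **values):
            unknown = set(values) - set(_fields)
            if unknown:
                raise TypeError(f"unknown properties: {sorted(unknown)}")
            for field in _fields:
                setattr(self, field, values.get(field))

        models[class_name] = type(
            class_name, (object,), {"__init__": __init__, "fields": fields}
        )
    return models


models = build_models(TRIPLES)
protein = models["Protein"](name="p53", accession="P04637")
print(protein.name, models["Protein"].fields)
```

The point of the sketch is only that the classes are emitted from the
description rather than written by hand, so a change to the description
changes the model the scientist's program runs against.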

In my view, the Achilles' heel of XSLT and any similar *query* tool is that 
these tools are not designed to handle the dynamics of the information being 
modeled -- they have at best a very limited execution or process model. This 
is not to say that such tools are useless for us; rather, they belong in the 
quiver of arrows that we use to analyze the *results* of running a model 
generated in a "full blown programming language" from an RDF/OWL 
representation.

Thus, while I agree that XQuery might "abolish the need to commit to a 
certain programming language", in my view it does so NOT because we'll all 
start writing our models in XQuery, but because we'll be able to write our 
models in whatever language we choose, relying on semantic web technology --  
including XQuery -- to ensure their completeness, reliability, and accuracy. 
Sure, some of us will choose to read and write XQuery; I just don't see this 
as being particularly widespread. After all, some of us choose to read and 
write assembler.

> I agree with that, too. This might be a major problem for the transition 
> from non-RDF- to RDF- based
> bioinformatics. People will not switch to RDF from one day to the other, 
> so you need a transitional
> period where you offer your data both in non-RDF and in RDF format (like 
> Reactome does, for example).
> The problem is that most databases are growing steadily, and you have to 
> keep both versions updated.
> This is severely complicated because of the inevitable need for manual 
> clean-up work that has to be done
> prior to the conversion to RDF.

I don't see RDF as an *alternative* to a database. It might be an 
alternative serialization of a database, but I'm not sure about even that. 
Won't we use RDF/OWL to emit SQL that we'll then use to query our databases? 
I'm thus not sure that we ever "switch to RDF" -- don't we instead begin 
using RDF to qualify, validate, and optimize our database representations?

It seems to me that RDF helps us describe and model the structure of our 
data. In my view, we'll then *use* this RDF-derived description and model to 
build relational databases that hold said data. In this worldview, the 
existence of the RDF description then helps us keep the dynamic models --  
written in Java, Python or whatever -- in sync with the underlying 
relational descriptions, kept in relational databases such as MySQL and Oracle.
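
As a rough illustration of that worldview, the Python sketch below emits
SQL table definitions from a description and then checks a live database
against it. It rests on several assumptions: the SCHEMA dict is a
hypothetical stand-in for whatever would be derived from the RDF
description, the emit_ddl and check_in_sync functions are invented names,
every column is typed TEXT for simplicity, and an in-memory SQLite
database stands in for MySQL or Oracle.

```python
import sqlite3

# Hypothetical stand-in for a schema derived from an RDF description:
# table name -> ordered list of column names.
SCHEMA = {
    "protein": ["accession", "name"],
    "pathway": ["title"],
}


def emit_ddl(schema):
    """Emit one CREATE TABLE statement per described table."""
    statements = []
    for table, columns in schema.items():
        column_sql = ", ".join(f'"{column}" TEXT' for column in columns)
        statements.append(
            f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY, {column_sql});'
        )
    return statements


def check_in_sync(conn, schema):
    """Return {table: live_columns} for tables that differ from the schema."""
    drift = {}
    for table, columns in schema.items():
        # Table names are assumed to be trusted identifiers here.
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        live = [row[1] for row in rows if row[1] != "id"]
        if live != columns:
            drift[table] = live
    return drift


conn = sqlite3.connect(":memory:")
for statement in emit_ddl(SCHEMA):
    conn.execute(statement)
print(check_in_sync(conn, SCHEMA))
```

If someone later alters a table by hand, check_in_sync reports the
difference, which is the kind of "keeping in sync" role I have in mind
for the description.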

Perhaps I'm the one who's fundamentally mistaken about all this, though.

Thanks,
Tom
Received on Wednesday, 1 March 2006 13:55:31 UTC