Re: NeuronDB RDF and OWL from Alan Ruttenberg on 2007-03-15 (public-semweb-lifesci@w3.org from March 2007)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Wed, 14 Mar 2007 21:37:58 -0400
To: "Kashyap, Vipul" <VKASHYAP1@PARTNERS.ORG>
Cc: "public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>
Message-Id: <11F5FCC7-9D65-4BA4-8249-E397449DA762@gmail.com>
On Mar 14, 2007, at 4:44 PM, Kashyap, Vipul wrote:

> Alan,
>
> You have proposed some modeling suggestions and of course alignment  
> with the OBO
> relations ontology.
>
> Other than expressing the semantics of these classes precisely, it  
> will be great
> if you and someone in this group could identify the potential impact
> of these modeling choices on:
> - Enabling different types of integration that were not feasible  
> before

I think at the moment I am more concerned about data integration than  
novel inferences, although I do expect a number of inference  
demonstrations. I view the comments I'm providing as a way to deal  
with some integration problems before they arise, but I think it will  
be better shown once we start looking at specific queries.

The semantics, however, are somewhat more important, particularly  
such things as clearly defining classes, distinguishing part of, is  
a, and derives from, etc.  Whenever they are mixed up we will get  
some wrong answers when we ask questions using these relations.

Put another way, the goal might be stated as wanting to get both  
*all* available answers to our questions, and *only* correct answers  
to our questions, and both the above contribute to achieving that goal.
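
To make the point about mixed-up relations concrete, here is a minimal
sketch in Python. The facts and the names (IS_A, PART_OF, kinds_of) are
hypothetical illustrations, not taken from NeuronDB or the OBO relations
ontology, and the traversal is a toy stand-in for a real reasoner. It shows
that keeping is a and part of distinct gives only correct answers, while
recording a part of fact as is a produces wrong ones.

```python
# Hypothetical toy facts; the terms are illustrative only.
IS_A = {
    ("purkinje_cell", "neuron"),
    ("neuron", "cell"),
}
PART_OF = {
    ("dendrite", "purkinje_cell"),
}


def closure(start, edges):
    """Return every node reachable from start by following edges transitively."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for child, parent in edges:
            if child == node and parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def kinds_of(term, is_a_edges):
    """Answer 'what kinds of thing is term?' using only the edges given."""
    return closure(term, is_a_edges)


# Relations kept distinct: a dendrite is not a kind of cell.
correct = kinds_of("dendrite", IS_A)

# part_of mistakenly treated as is_a: a dendrite now looks like a neuron.
mixed = kinds_of("dendrite", IS_A | PART_OF)

print(sorted(correct))  # []
print(sorted(mixed))    # ['cell', 'neuron', 'purkinje_cell']
```

With the relations kept apart, asking for the kinds of a Purkinje cell still
returns both neuron and cell, so nothing is lost by being careful.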

Regarding this sort of integration not being feasible before, I'd  
stay away from that argument. I do hope to show that, as a matter of  
fact, this sort of integration is rarely done, that it is possible to  
do better with an acceptable level of effort, and that both the  
semantic web tools and ethos help make it easier and more fruitful.

A small example of this was illustrated yesterday in the discussion  
about dart grid. We were looking at mapping a column that recorded  
gender as a text field with either the character "M" or "F". Now  
typically, this is a distinction we wish to make in our ontologies,  
and we would generally have a class (ideally the same class across  
ontologies) to capture this distinction. In a standard
object-relational model, one could instead make M and F "objects" by
having a second table, and a foreign key to that table to record the
gender. But no one does that because it seems like "overkill" - the
queries are more painful, the computational overhead is higher, etc.
But in RDF or OWL this kind of thing is (or should be) common practice,
we incur no penalty, and having it in this form makes it more
straightforward to integrate across independently constructed
ontologies - sameAs, subClassOf, and equivalentClass all provide
standard ways of making the connection. Compare this to the effort to
merge two relational
schemas, where gender columns are used in various tables, named  
differently, and where one database uses "M" and "F" and the other  
uses "Male" and "Female".
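
As a rough sketch of what I mean, assuming two hypothetical sources and
made-up class names (a:MalePerson, b:Female, and so on), the following
Python models triples as plain tuples. It is not a real RDF store or OWL
reasoner; instances_of only follows equivalentClass one hop. The point is
that each source keeps its own coding, and the integration is a couple of
added statements rather than a schema merge.

```python
# Hypothetical rows from two independently built sources.
source_a = [("patient1", "M"), ("patient2", "F")]
source_b = [("subject9", "Female"), ("subject7", "Male")]

# Each source maps its own text codes to its own (made-up) classes.
a_classes = {"M": "a:MalePerson", "F": "a:FemalePerson"}
b_classes = {"Male": "b:Male", "Female": "b:Female"}


def to_triples(rows, code_to_class):
    """Turn (subject, code) rows into (subject, rdf:type, class) triples."""
    return {(subject, "rdf:type", code_to_class[code]) for subject, code in rows}


triples = to_triples(source_a, a_classes) | to_triples(source_b, b_classes)

# The integration step: two statements connecting the vocabularies.
triples |= {
    ("a:MalePerson", "owl:equivalentClass", "b:Male"),
    ("a:FemalePerson", "owl:equivalentClass", "b:Female"),
}


def instances_of(cls, triples):
    """Instances of cls, following owl:equivalentClass in either direction (one hop)."""
    same = {cls}
    for s, p, o in triples:
        if p == "owl:equivalentClass":
            if s == cls:
                same.add(o)
            if o == cls:
                same.add(s)
    return {s for s, p, o in triples if p == "rdf:type" and o in same}


females = instances_of("b:Female", triples)
print(sorted(females))  # ['patient2', 'subject9']
```

A query phrased against either source's class finds individuals from both,
without touching the original data.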

> - Enabling different types of inferences which would enable further  
> integration
> not possible before.

I don't think I have said, or want to say, that integration before  
was not possible. However, I note that in fact it has not been  
done in a usable way for many of the resources we realistically would  
want to use to ask questions about our scientific use case. There are  
a number of reasons for this, some of which our use of semantic web  
technologies speaks to. For example, that there is a shared standard  
and working tools based on it means that efforts to integrate can be  
built on by others, which offers more bang for your buck, so to  
speak, an important consideration when deciding to devote the not  
insubstantial effort necessary to put resources in a form that makes  
it possible to effectively integrate them. Technically, the fact that  
there is less pain involved with schema extension and evolution when  
using OWL/RDF than when using traditional RDBMS table-oriented schemas  
reduces the effort to integrate a large number of sources.

> Alternatively, for the purpose of the demo, one could just do a  
> shallow alignment so that different data sets can be integrated.

We will do what's necessary. But at this point, since people have  
volunteered to own the translation of certain data sources, and since  
one of our goals is to explore and learn, I've been trying to get us  
further than we would be with this approach. There have been previous  
demonstrations of this sort of shallow alignment, and from the point  
of view of showing something novel, it would be nice to go beyond  
that. Given what's been done so far, and the responses I've seen to  
the analysis and suggestions people have been offering, I'm feeling  
optimistic.

Best,
Alan
Received on Thursday, 15 March 2007 01:37:52 UTC