- From: Alan Ruttenberg <alanruttenberg@gmail.com>
- Date: Wed, 14 Mar 2007 21:37:58 -0400
- To: "Kashyap, Vipul" <VKASHYAP1@PARTNERS.ORG>
- Cc: "public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>
On Mar 14, 2007, at 4:44 PM, Kashyap, Vipul wrote:

> Alan,
>
> You have proposed some modeling suggestions and of course alignment
> with the OBO relations ontology.
>
> Other than expressing the semantics of these classes precisely, it
> will be great if you and someone in this group could identify the
> potential impact of these modeling choices on:
> - Enabling different types of integration that were not feasible
>   before

I think at the moment I am more concerned about data integration than novel inferences, although I do expect a number of inference demonstrations. I view the comments I'm providing as a way to deal with some integration problems before they arise, but I think it will be better shown once we start looking at specific queries.

The semantics, however, are somewhat more important, particularly such things as clearly defining classes, distinguishing part of, is a, and derives from, etc. Whenever they are mixed up we will get some wrong answers when we ask questions using these relations.

Put another way, the goal might be stated as wanting to get both *all* available answers to our questions, and *only* correct answers to our questions, and both of the above contribute to achieving that goal.

Regarding this sort of integration not being feasible before, I'd stay away from that argument. I do hope to show that, as a matter of fact, this sort of integration is rarely done, that it is possible to do better with an acceptable level of effort, and that both the semantic web tools and ethos help make it easier and more fruitful.

A small example of this was illustrated yesterday in the discussion about dart grid. We were looking at mapping a column that recorded gender as a text field with either the character "M" or "F". Now typically, this is a distinction we wish to make in our ontologies, and we would generally have a class (ideally the same class across ontologies) to capture this distinction.
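As a rough sketch of the column mapping just described, the text field could be turned into class membership statements along these lines. The namespace, the class names `Male` and `Female`, the patient IRIs, and the sample rows are all invented for illustration; none of them come from dart grid or from any ontology we have agreed on, and typing the patient directly with the class is only one possible modeling choice.

```python
# Hypothetical namespace and class IRIs, invented for this illustration.
EX = "http://example.org/hcls#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Map the single-character codes in the text column to ontology classes.
GENDER_CLASS = {"M": EX + "Male", "F": EX + "Female"}


def rows_to_triples(rows):
    """Turn (patient_id, gender_code) rows into (subject, predicate, object) triples."""
    triples = []
    for patient_id, code in rows:
        subject = f"{EX}patient/{patient_id}"
        # An unknown code raises KeyError rather than silently guessing a class.
        triples.append((subject, RDF_TYPE, GENDER_CLASS[code]))
    return triples


if __name__ == "__main__":
    sample_rows = [(1, "M"), (2, "F")]  # made-up sample data
    for triple in rows_to_triples(sample_rows):
        print(triple)
```

The point of the sketch is only that the "M"/"F" strings disappear at translation time and what remains is a reference to a class that other ontologies can also point at.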
In a standard object-relational model, one could make M and F "objects" instead by having a second table, and a foreign key to that table to record the gender. But no one does that because it seems like "overkill": the queries are more painful, the computational overhead is higher, etc. But in RDF or OWL this kind of thing is (or should be) common practice, we incur no penalty, and having it in this form makes it more straightforward to integrate across independently constructed ontologies: sameAs, subClassOf, and equivalentClass all provide standard ways of making the connection. Compare this to the effort to merge two relational schemas, where gender columns are used in various tables, named differently, and where one database uses "M" and "F" and the other uses "Male" and "Female".

> - Enabling different types of inferences which would enable further
>   integration not possible before.

I don't think I have said, or want to say, that integration before was not possible. However, I note that in fact it has not been done in a usable way for many of the resources we realistically would want to use to ask questions about our scientific use case. There are a number of reasons for this, some of which our use of semantic web technologies speaks to.

For example, the fact that there is a shared standard, and working tools based on it, means that efforts to integrate can be built on by others. That offers more bang for your buck, so to speak, which is an important consideration when deciding to devote the not insubstantial effort necessary to put resources in a form that makes it possible to integrate them effectively. Technically, the fact that there is less pain involved with schema extension and evolution when using OWL/RDF than when using a traditional RDBMS table-oriented schema reduces the effort to integrate a large number of sources.

> Alternatively, for the purpose of the demo, one could just do a
> shallow alignment so that different data sets can be integrated.

We will do what's necessary.
But at this point, since people have volunteered to own the translation of certain data sources, and since one of our goals is to explore and learn, I've been trying to get us further than we would be with this approach. There have been previous demonstrations of this sort of shallow alignment, and from the point of view of showing something novel, it would be nice to go beyond that.

Given what's been done so far, and the responses I've seen to the analysis and suggestions people have been offering, I'm feeling optimistic.

Best,
Alan
Received on Thursday, 15 March 2007 01:37:52 UTC