Re: I read a challenge. was, Re: [gofriends] GO ontology in OWL format from Alan Ruttenberg on 2007-09-18 (public-semweb-lifesci@w3.org from September 2007)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Tue, 18 Sep 2007 04:40:10 -0400
To: Chris Mungall <cjm@fruitfly.org>
Cc: public-semweb-lifesci@w3.org, markov@mpiz-koeln.mpg.de, groscurt@mpiz-koeln.mpg.de, schoof@mpiz-koeln.mpg.de
Message-Id: <745E20D1-C715-463F-A28F-7180670E429C@gmail.com>

Here's my way of looking at this.

The information from which the classrelations graph was derived was  
the original OWL. We'd like a world in which we can query the OWL,  
with reasoner support, but we don't have that yet. We will at some  
point.

The most common way of doing inference in a triple store currently is  
to add all inferred triples. More generally, to query such a system  
efficiently one will need some combination of a precomputed data  
structure of some sort and a procedure that uses it, plus what's in  
the triple store, to get the answer to a query.
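
To illustrate "add all inferred triples", here is a minimal sketch that materializes one kind of inference, the transitivity of rdfs:subClassOf, in a toy in-memory store. The store layout, the function name and the ex: class names are assumptions made up for this example, not anything from our actual setup.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# All names here are hypothetical and only serve the illustration.
SUBCLASS = "rdfs:subClassOf"


def materialize_subclass_closure(triples):
    """Return the triples plus every subClassOf triple implied by transitivity."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current subclass edges so we can add to `closed` safely.
        edges = [(s, o) for (s, p, o) in closed if p == SUBCLASS]
        for a, b in edges:
            for c, d in edges:
                if b == c and (a, SUBCLASS, d) not in closed:
                    closed.add((a, SUBCLASS, d))
                    changed = True
    return closed


store = {
    ("ex:A", SUBCLASS, "ex:B"),
    ("ex:B", SUBCLASS, "ex:C"),
    ("ex:C", SUBCLASS, "ex:D"),
}
closed = materialize_subclass_closure(store)
```

After materialization, a question such as "is ex:A a subclass of ex:D?" becomes a plain membership test on the store, at the cost of storing the extra triples up front.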

For the tiny bit of inference that we needed for these queries, we've  
effectively built our own version of such a data structure, these  
class/class relations, and then used abox queries on them to answer a  
tbox query. You can see the effective tbox query in some of the  
references I pointed to in the previous mail, the ones that we put to  
Pellet to get the answers that we saved in the classrelations graph.
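
A rough sketch of the idea, under stated assumptions: the function entailed_restrictions below is only a stand-in for the reasoner (it just inherits existential restrictions along named superclasses, far less than Pellet does), and the toy TBox, the ex: names and the helper names are invented for illustration. The point is the shape of the approach: each entailed "C subClassOf (p some X)" is cached as a plain triple (C, p, X), and the query is then answered by simple triple matching.

```python
# Toy TBox: each class maps to its asserted superclass expressions.
# A string is a named class; a tuple ("some", p, x) stands for the
# existential restriction "p some x". All names are illustrative.
TBOX = {
    "ex:Nucleolus": ["ex:Organelle", ("some", "ex:part_of", "ex:Nucleus")],
    "ex:Nucleus": ["ex:Organelle", ("some", "ex:part_of", "ex:Cell")],
    "ex:FibrillarCenter": ["ex:Nucleolus"],
    "ex:Organelle": [],
    "ex:Cell": [],
}


def entailed_restrictions(cls, tbox, seen=None):
    """Stand-in for the reasoner: restrictions inherited via named superclasses."""
    seen = set() if seen is None else seen
    if cls in seen:
        return set()
    seen.add(cls)
    found = set()
    for expr in tbox.get(cls, []):
        if isinstance(expr, tuple):
            found.add((expr[1], expr[2]))
        else:
            found |= entailed_restrictions(expr, tbox, seen)
    return found


def build_classrelations(tbox):
    """Cache each entailed 'C subClassOf (p some X)' as a plain triple (C, p, X)."""
    return {(c, p, x) for c in tbox for (p, x) in entailed_restrictions(c, tbox)}


def classes_related_to(graph, prop, target):
    """The abox-style lookup: simple triple matching, no reasoner involved."""
    return sorted(s for (s, p, o) in graph if p == prop and o == target)


classrelations = build_classrelations(TBOX)
result = classes_related_to(classrelations, "ex:part_of", "ex:Nucleus")
```

Here ex:FibrillarCenter shows up in the result only because the precomputation step inherited the restriction from ex:Nucleolus; the lookup itself knows nothing about OWL.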

So it is a trick, but in much the same way that other query engines  
use tricks to effectively answer queries. We could have hidden the  
trick, in the sense that we could have had a layer in front of the  
tbox query that rewrote it to take advantage of these cached results,  
and then we would be behaving even more like other systems. We should  
probably do that, if only because then client code will be  
insensitive to the specifics of how we've made the query efficient.
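
Such a layer might look roughly like the sketch below. Everything in it is hypothetical: the TBoxQuery shape, the "subclasses_of_some" query kind, the CannotAnswer exception and the sample cached triples are assumptions for illustration, not a description of any existing system. The client states the tbox query; the layer decides whether a cached structure covers it and says so when it does not.

```python
from typing import NamedTuple


class TBoxQuery(NamedTuple):
    """Hypothetical client-facing query: 'which classes are subClassOf (prop some filler)?'"""
    kind: str
    prop: str
    filler: str


class CannotAnswer(Exception):
    """Raised when no cached structure covers the requested kind of query."""


# Hypothetical cached class/class relations, stored as plain triples.
CLASSRELATIONS = {
    ("ex:B", "ex:part_of", "ex:A"),
    ("ex:C", "ex:part_of", "ex:A"),
    ("ex:D", "ex:part_of", "ex:B"),
}


def answer(query, graph=CLASSRELATIONS):
    """Rewrite a supported tbox query into a lookup on the cached triples."""
    if query.kind == "subclasses_of_some":
        return sorted(
            s for (s, p, o) in graph if p == query.prop and o == query.filler
        )
    # Anything else is outside what the cache was built for.
    raise CannotAnswer(query.kind)


hits = answer(TBoxQuery("subclasses_of_some", "ex:part_of", "ex:A"))
```

Because the client only ever sees TBoxQuery and answer, the cached graph could later be swapped for a real reasoner-backed implementation without touching client code.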

The particular method used to make these queries fast is a short term  
strategy. The long term strategy is to keep the main source in OWL, so  
that the semantics are as clear as they can be and so that it is a  
test and a target for OWL system implementors, and to test and use  
new query implementations as they come up. In the meantime we may  
get smarter about how many queries we can answer (not many yet), and  
should put in that layer so that you are asking the tbox queries  
(though it may respond that it can't answer some of them).

Aside from the fact that we can only answer a limited set of queries  
with our current optimization, what other limitations do you see?

-Alan

On Sep 14, 2007, at 11:11 AM, Chris Mungall wrote:

> Ah yes, but let me just join the dots for everyone here - you are  
> treating the classes as instances here, and relations at the class  
> level become simple triples rather than restrictions. This means we  
> can ignore OWL altogether and use RDFS semantics, which suits  
> existing triplestores nicely. Mathias recommended as much on a  
> separate thread.
>
> Is this a trick or a long term strategy? I don't know. I can  
> certainly see some problems with the approach.

Received on Tuesday, 18 September 2007 08:40:20 UTC