Re: Demo SPARQL notes from Alan Ruttenberg on 2007-04-17 (public-semweb-lifesci@w3.org from April 2007)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Tue, 17 Apr 2007 10:53:40 -0400
To: samwald@gmx.at
Cc: public-semweb-lifesci@w3.org
Message-Id: <4820B895-21AB-416E-A90C-61B65196FD9C@gmail.com>

On Apr 17, 2007, at 10:08 AM, samwald@gmx.at wrote:

> Hi Alan,
>
> I am quite interested in your approach of making OWL property  
> restrictions accessible to SPARQL queries.
> In general, our demo contains several ontologies that are mainly  
> based on classes and OWL property restrictions. To query the  
> information held in these ontologies directly with SPARQL would be
> quite complicated, as the OWL property restrictions produce RDF  
> graphs that are quite convoluted and hard to query.

They are not as hard to query as asserted - the page is a proof by
demonstration - they are just verbose and have detail that is better  
managed by a computer program than a person. A reasonable macro  
system within SPARQL could encapsulate these patterns and make them  
easier to write. What is hard is doing queries that return sound and  
complete answers. That, however, is not a function of how the
restrictions are expressed, but of finding the implications of what
the restrictions and class definitions say.
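
As a rough sketch of what such a macro layer might look like (Python
here, purely for illustration; the helper names and the example.org
property IRI are made up and are not taken from the demo page), a small
function can expand a short call into the verbose pattern that an
owl:someValuesFrom restriction takes in RDF:

```python
# Hypothetical "macro" helpers: expand a short call into the verbose
# triple pattern that an OWL someValuesFrom restriction has in RDF.
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
)


def some_values_from(cls_var, prop_iri, filler_var):
    """Pattern for: ?cls_var rdfs:subClassOf (prop_iri some ?filler_var)."""
    return (
        f"?{cls_var} rdfs:subClassOf [ a owl:Restriction ;\n"
        f"    owl:onProperty <{prop_iri}> ;\n"
        f"    owl:someValuesFrom ?{filler_var} ] ."
    )


def build_query(select_vars, patterns):
    """Assemble a SELECT query from already-expanded patterns."""
    head = "SELECT " + " ".join(f"?{v}" for v in select_vars)
    body = "\n".join(
        "  " + line for p in patterns for line in p.splitlines()
    )
    return f"{PREFIXES}{head}\nWHERE {{\n{body}\n}}"


# The property IRI below is an assumption used only for the example.
query = build_query(
    ["cls", "part"],
    [some_values_from("cls", "http://example.org/has_part", "part")],
)
print(query)
```

This only hides the verbosity of the pattern. It does nothing about
soundness or completeness, which still depend on what has been inferred
and stored.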

> One way of 'querying' such ontologies would be to define a new  
> 'query class' with necessary & sufficient conditions that are the  
> parameters of the query, run a reasoner, and see which classes are  
> classified as subclasses of the 'query class'.

Certainly, for our demo, where there are nontrivial subsumptions that  
can be computed by a reasoner, we should attempt to augment the store  
with these subsumptions (a set of x subclassof y triples). I'm  
working towards making it easy to add additions like this. For the  
moment we will only be able to do our best to make sure that these  
will lead to sound query results. They certainly won't be complete.

> For example, our current demo script contains the research question  
> "How might beta amyloid alter LTP in CA1 neurons?". This can be  
> answered by using the BrainPharm dataset and setting up a class  
> that has CA1 neurons and beta amyloid as parts and/or participants.  
> I did this with Protege and Pellet, and it works.

Could you put up instructions on how to reproduce this on a page on  
the wiki and link it to mine? I'll then look at replicating it in our  
store.

> However, I do not know how this approach could be implemented in a  
> real web environment with the current tools that we have. How would  
> the ‘query class’ be produced? How would it be added to the RDF  
> store? How should this be implemented with decent scalability?  
> These are all questions that should be obvious, but yet they have  
> been ignored by many influential ontology / OWL developers so far. I
> think that we will not have a practical answer to these questions  
> in the immediate future.

In general, I think we are aiming towards a time where we will have  
stores which are capable of doing full inference. That is why I have  
emphasized using the full expressivity of OWL in trying to best express
our intentions rather than adjusting, at the encoding level, our  
statements to match our currently lower expectations of what the  
tools can do.

There are promising developments in the direction of increased levels  
of inference - Instance Store II, DL-Lite, A-Box summarization,
ideas based on the new reasoner HermiT etc. Many of these approaches
are based, in one way or another, on adding various kinds of  
precomputed information and data structures to the store in order  
that less reasoning need be done at query time. The naive approach to  
this - adding entailed triples - is reasonably successful and is  
demonstrated in systems such as Oracle's RDF store, Sesame, OWLIM,  
and to a currently lesser extent, in Virtuoso (more about that will  
be put on the aforementioned page, hopefully later today).

So the general idea of adding triples such as I have done is not  
particularly novel, and one would expect it to scale reasonably,  
based on the existing systems that I cite.
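
To make the "adding entailed triples" idea concrete, here is a minimal
sketch in plain Python. It assumes triples are simple (subject,
predicate, object) string tuples and that the only entailment being
materialized is the transitivity of rdfs:subClassOf. The class names
are invented for the example, and this is not how any of the stores
named above implement it:

```python
SUBCLASS_OF = "rdfs:subClassOf"


def materialize_subclass_closure(triples):
    """Return the triples plus every subClassOf triple entailed by
    transitivity. Sound for that one rule, and nothing more."""
    closed = set(triples)

    # Map each class to the set of its known superclasses.
    parents = {}
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents.setdefault(s, set()).add(o)

    # Repeat until no class gains a new superclass (fixpoint).
    changed = True
    while changed:
        changed = False
        for supers in parents.values():
            inherited = set()
            for sup in supers:
                inherited |= parents.get(sup, set())
            new = inherited - supers
            if new:
                supers |= new
                changed = True

    # Write the closure back as ordinary triples.
    for sub, supers in parents.items():
        for sup in supers:
            closed.add((sub, SUBCLASS_OF, sup))
    return closed


# Hypothetical class names, for illustration only.
store = {
    ("ex:CA1_neuron", SUBCLASS_OF, "ex:Pyramidal_neuron"),
    ("ex:Pyramidal_neuron", SUBCLASS_OF, "ex:Neuron"),
    ("ex:Neuron", SUBCLASS_OF, "ex:Cell"),
}
closed = materialize_subclass_closure(store)
```

After this runs, a query for direct subclasses of ex:Cell also finds
ex:CA1_neuron without any reasoning at query time. Subsumptions that
follow from restrictions and class definitions are not found this way;
those would have to come from a reasoner.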

> Therefore, I think that Alan’s proposal of adding direct relations  
> between class entities is the best solution we have at the moment.  
> Of course, it would be nice to find a solution that would make it  
> possible to keep the resulting ontologies valid OWL DL. Maybe we  
> should set up annotation properties that are derived from the
> original properties (e.g. by concatenating ‘_class_property’ to the  
> end of the URIs of the original properties) and use these?

There are different ways to think about this. As I note, what I have  
done is considered valid OWL 1.1 because of punning. In a more  
polished system we could also consider the added triples to be  
internal information known to the query compiler, which could rewrite  
more complicated queries to use these added triples. If we wanted to
keep the full set OWL-DL, then your approach of using shadow
annotation properties for these relations would lead to correct
OWL-DL, though we would still need to be careful, as we will be now, to
be cognizant of what to expect as the answers to each kind of query,
since although they are OWL-DL they don't have their real semantics
encoded.
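
A minimal sketch of the shadow-property idea, assuming each restriction
has already been extracted as a (class, property, filler class) tuple
standing for "class subClassOf (property some filler)". The
'_class_property' suffix is the one suggested above; the function names
and the example.org IRIs are hypothetical:

```python
SUFFIX = "_class_property"


def shadow_property(prop_iri):
    """IRI of the shadow annotation property for prop_iri."""
    return prop_iri + SUFFIX


def shadow_triples(restrictions):
    """Turn (class, property, filler) restrictions into direct
    class-to-class triples that use the shadow property."""
    return [
        (cls, shadow_property(prop), filler)
        for cls, prop, filler in restrictions
    ]


# Hypothetical IRIs, for illustration only.
restrictions = [
    (
        "http://example.org/CA1_neuron",
        "http://example.org/has_part",
        "http://example.org/Dendrite",
    ),
]
added = shadow_triples(restrictions)
```

The shadow properties would additionally need to be declared as
annotation properties in the ontology. The resulting triples are easy
to query, but they carry none of the semantics of the original
restriction, which is the caveat above.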

So to summarize - our approach now is to try to not compromise at the  
representation end, but to compromise at the query side, by using  
various tricks to enable a set of queries which we know in advance we  
want to support, aware that other reasonable queries cannot be
answered. I've proposed this approach for a number of reasons,
but importantly because
  - the representation side has been neglected, IMO,
  - it necessarily requires skills which are unique to our community  
within the SW, e.g. understanding science,
  - if we express what we mean, we can leave the query optimization  
to other people who don't need to know the science, expecting that  
answers produced by new generations of query engines will return more  
and more interesting, meaningful, and scientifically sound answers.

Regards,
Alan

>
> cheers,
> Matthias Samwald
>
>
>
>
> -------- Original Message --------
> Date: Tue, 17 Apr 2007 01:31:35 -0400
> From: Alan Ruttenberg <alanruttenberg@gmail.com>
> To: public-semweb-lifesci@w3.org
> Subject: Demo SPARQL notes
>
>>
>> I've started a page where I and others can document some of the
>> SPARQL queries and techniques that are being explored as we progress
>> towards the demo.
>> The intention is to have a place to record things that are learned
>> for later reference.
>>
>> http://esw.w3.org/topic/HCLS/HCLSIG_Demo_QueryScratch
>>
>> Regards,
>> Alan
>>
>
Received on Tuesday, 17 April 2007 14:54:30 UTC