Re: Demo SPARQL notes from Chris Mungall on 2007-04-18 (public-semweb-lifesci@w3.org from April 2007)

From: Chris Mungall <cjm@fruitfly.org>
Date: Tue, 17 Apr 2007 17:38:11 -0700
To: William Bug <William.Bug@DrexelMed.edu>
Cc: Bijan Parsia <bparsia@cs.man.ac.uk>, samwald@gmx.at, Alan Ruttenberg <alanruttenberg@gmail.com>, public-semweb-lifesci@w3.org
Message-Id: <DBB2E260-7E45-40D5-862A-CAD15CF6B1B0@fruitfly.org>
On Apr 17, 2007, at 10:49 AM, William Bug wrote:

> I'm with Bijan on this issue.
>
> However complex the current OWL representation may appear, it's  
> considerably more terse than the expression of this same info in a  
> relational model.

I'm not sure if this is necessarily the case. If we are talking  
specifically about the representation of OWL *in RDF triples* and the  
corresponding SPARQL queries, then we are essentially talking about a  
3-ary relational model anyway, modulo the usual concerns re open vs  
closed world and the like. And n-ary relations are surely either as  
terse or more terse than 3-ary relations.

Compare facts in an imaginary relational model for OWL [1]:

	existential_restriction(part_of, CellNucleus, Cell)

With [2]:

	subClassOf(CellNucleus,_r1)
	restriction(_r1)
	onProperty(_r1,part_of)
	someValuesFrom(_r1,Cell)

To query [1] for the kinds of entities that can be parts of cells we  
select ?X from
	existential_restriction(part_of, ?X, Cell)

To query [2] for the same we select ?X from
	subClassOf(?X,?_r)
	restriction(?_r)
	onProperty(?_r,part_of)
	someValuesFrom(?_r,Cell)
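
To make the comparison concrete, here is a minimal Python sketch that stores the facts of [1] and [2] as tuples and runs both queries through a tiny conjunctive pattern matcher. The matcher (match, query) and the tuple encoding are assumptions made for illustration; they are not part of any RDF or OWL toolkit. The last pattern uses the predicate name someValuesFrom, as in the facts of [2].

```python
# Facts as tuples: (predicate, arg1, arg2, ...). Names follow [1] and [2].
NARY_FACTS = {
    ("existential_restriction", "part_of", "CellNucleus", "Cell"),
}

TRIPLE_STYLE_FACTS = {
    ("subClassOf", "CellNucleus", "_r1"),
    ("restriction", "_r1"),
    ("onProperty", "_r1", "part_of"),
    ("someValuesFrom", "_r1", "Cell"),
}


def is_var(term):
    """Variables are strings that start with '?'."""
    return term.startswith("?")


def match(pattern, fact, bindings):
    """Unify one pattern with one fact; return extended bindings or None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    result = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            # Bind the variable, or check it agrees with an earlier binding.
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result


def query(patterns, facts):
    """Conjunctive query: all patterns must match under shared bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for fact in facts
            if (extended := match(pattern, fact, bindings)) is not None
        ]
    return solutions


# Query against [1]: a single pattern.
q1 = [("existential_restriction", "part_of", "?X", "Cell")]

# Query against [2]: four patterns joined on the restriction node ?_r.
q2 = [
    ("subClassOf", "?X", "?_r"),
    ("restriction", "?_r"),
    ("onProperty", "?_r", "part_of"),
    ("someValuesFrom", "?_r", "Cell"),
]

answers_1 = {s["?X"] for s in query(q1, NARY_FACTS)}
answers_2 = {s["?X"] for s in query(q2, TRIPLE_STYLE_FACTS)}
```

Both queries bind ?X to the same class; the only difference is that the second has to join four patterns through the intermediate restriction node.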
	
Of course, if we're talking about SQL specifically then yes you  
definitely lose a lot of terseness with tedious JOINs that are  
implicit in the typically terser SPARQL. But this is just syntax:  
SQL doesn't equate to the relational model.
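
As an illustration of the JOIN overhead, the sketch below loads the triples of [2] into a single three-column SQLite table and asks the same question in SQL. The table name (triple), the column names (s, p, o) and the encoding of restriction(_r1) as a type triple are assumptions chosen for this example, not a schema used by any particular triple store.

```python
import sqlite3

# One table of (subject, predicate, object) rows: the 3-ary relational model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("CellNucleus", "subClassOf", "_r1"),
        ("_r1", "type", "Restriction"),
        ("_r1", "onProperty", "part_of"),
        ("_r1", "someValuesFrom", "Cell"),
    ],
)

# Each triple pattern becomes one alias of the table, and every shared
# variable (here the restriction node) becomes an explicit JOIN condition.
SQL = """
SELECT sub.s
FROM triple AS sub
JOIN triple AS typ    ON typ.s    = sub.o
JOIN triple AS prop   ON prop.s   = sub.o
JOIN triple AS filler ON filler.s = sub.o
WHERE sub.p = 'subClassOf'
  AND typ.p = 'type' AND typ.o = 'Restriction'
  AND prop.p = 'onProperty' AND prop.o = 'part_of'
  AND filler.p = 'someValuesFrom' AND filler.o = 'Cell'
"""

rows = [row[0] for row in conn.execute(SQL)]
```

The three self-joins correspond to the shared ?_r variable, which a SPARQL graph pattern expresses simply by repeating the variable name.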

And of course SQL and most implementations of the relational model  
give you little or no deductive facilities; but then, this is also  
true of most SPARQL implementations. Even with RDFS entailment,  
you don't have enough for basic class-level (TBox) transitivity.


Anyway, I think I'm being pedantic and straying from the point. The  
issue is that queries expressed in SPARQL over class-level relations  
(eg part_of in a TBox) represented using owl restrictions are  
verbose, contrasted to representations that use a single predicate  
for the class-level relation. The issue here is not the syntax per  
se, rather the additional triples and bNode created when layering the  
OWL on the RDF model. I don't know if it's such a huge problem - I  
have learned to live with it - but I know that people used to n-ary  
relational queries balk at doing a multi-triple-with-bnode query for  
simple TBox queries such as the above.

One solution here is Alan's alternate non-OWL layering of class level  
relations in the RDF model, possibly controversial. Another is an  
additional layer on top of SPARQL - eg some macro language that  
provides constructs such as a single predicate for class level  
relations - and compiles down to SPARQL - this appears to be what is  
suggested below? Manchester syntax is mentioned - a QL based on  
Manchester syntax would be nice. For our TBox query we could say "?X  
part_of some Cell". I imagine this could trivially compile down to  
SPARQL - or it could be an OWL QL that has its own model.
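
A minimal sketch of what such a macro layer might look like, in Python: it accepts only the single form "?var property some Class" and expands it to the multi-triple SPARQL pattern. The function name compile_some, the variable name ?restriction and the default namespace http://example.org/demo# are hypothetical; a real Manchester-syntax query language would need a proper grammar.

```python
import re

# Standard RDF/RDFS/OWL namespaces; the default ':' prefix is a placeholder.
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
    "PREFIX : <http://example.org/demo#>\n"
)

# Accepts exactly: ?var <property> some <Class>
PATTERN = re.compile(r"^\s*(\?\w+)\s+(\w+)\s+some\s+(\w+)\s*$")


def compile_some(expression):
    """Expand '?X p some C' into a SPARQL query over OWL restrictions."""
    m = PATTERN.match(expression)
    if m is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    var, prop, filler = m.groups()
    return (
        PREFIXES
        + f"SELECT {var} WHERE {{\n"
        + f"  {var} rdfs:subClassOf ?restriction .\n"
        + "  ?restriction rdf:type owl:Restriction .\n"
        + f"  ?restriction owl:onProperty :{prop} .\n"
        + f"  ?restriction owl:someValuesFrom :{filler} .\n"
        + "}\n"
    )


sparql = compile_some("?X part_of some Cell")
```

Printing sparql shows the four-triple pattern that the one-line expression stands for; the generated query still only matches asserted restrictions, so it does not address entailment.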

This is related to, but different from, the issue of entailment. Many  
RDF systems, including most SPARQL implementations, give you little  
or no entailment (eg RDFS). This isn't enough to give you a complete  
answer for [2] (assuming part_of is transitive). Alan's  
transformation does, I believe, give you a correct answer when you  
have RDFS entailment.
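
The completeness point can be sketched in Python as follows. This is not Alan's transformation, whose details are not given in this message; it only shows, for direct class-level triples, how a query over asserted facts misses answers that follow from transitivity of part_of. The class Nucleolus and the helper names are hypothetical, added for illustration.

```python
# Direct class-level triples: (subject_class, relation, object_class).
# Nucleolus is a hypothetical extra class, not taken from the thread.
DIRECT = {
    ("CellNucleus", "part_of", "Cell"),
    ("Nucleolus", "part_of", "CellNucleus"),
}


def parts_of(whole, triples):
    """Classes with a part_of triple pointing at `whole`."""
    return {s for (s, p, o) in triples if p == "part_of" and o == whole}


def transitive_closure(triples, relation):
    """Naive fixpoint: add (a, r, d) whenever (a, r, b) and (b, r, d) hold."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(closed):
            for (c, p2, d) in list(closed):
                if p1 == p2 == relation and b == c:
                    inferred = (a, relation, d)
                    if inferred not in closed:
                        closed.add(inferred)
                        changed = True
    return closed


# Without any inference only the asserted part is found.
asserted_only = parts_of("Cell", DIRECT)

# With part_of treated as transitive, Nucleolus is found as well.
with_transitivity = parts_of("Cell", transitive_closure(DIRECT, "part_of"))
```

The gap between asserted_only and with_transitivity is the kind of incompleteness meant above; whether a given store closes it depends on the entailment it implements.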

>   You can write some very effective SPARQL queries against it,  
> after playing with it a bit to get a more complete understanding of  
> what the ontology is trying to express.
>
> I've certainly been having pretty good luck creating SPARQL queries  
> - even by hand (i.e., without fancy end-user oriented tools) -  
> against some of the similarly modeled data in the NeuroCommons  
> repository.

SPARQL seems adequate in many respects for data oriented queries  
(typically, but not always, ABox) - the verbosity manifests in TBox  
queries, and possibly other scenarios that dictate the standard n-ary  
pattern transform.

> I would agree the current OWL ontologies may be a little more  
> application-specific than I would want in the long-term, but I  
> think they can still be used quite effectively right now.
>
> Cheers,
> Bill
>
> On Apr 17, 2007, at 11:10 AM, Bijan Parsia wrote:
>
>>
>> On Apr 17, 2007, at 3:08 PM, samwald@gmx.at wrote:
>> [snip]
>>> I am quite interested in your approach of making OWL property  
>>> restrictions accessible to Sparql queries.
>>> In general, our demo contains several ontologies that are mainly  
>>> based on classes and OWL property restrictions. To query the  
>>> information held in these ontologies directly with Sparql would  
>>> be quite complicated, as the OWL property restrictions produce  
>>> RDF graphs that are quite convoluted and hard to query.
>>
>> I'm not sure exactly what you mean here. Hmm. Do you mean that the  
>> OWL in RDF/XML syntax is rather convoluted? No argument there :)  
>> Fortunately, there are alternative syntaxes, including an XML one,  
>> the manchester syntax, and classic DL syntax. One of my things to  
>> agitate for in SPARQL/OWL is a pluggable expression syntax  
>> (Kendall Clark and I did some work toward an XML syntax for SPARQL  
>> over all which would make this easier). But even if not in the  
>> spec, one could write transformers from a sane syntax into the  
>> canonical syntax pretty easily.
>>
>>> One way of 'querying' such ontologies would be to define a new  
>>> 'query class' with necessary & sufficient conditions that are the  
>>> parameters of the query, run a reasoner, and see which classes  
>>> are classified as subclasses of the 'query class'.
>>
>> Are you querying for classes or for instances?
>>
>>> For example, our current demo script contains the research  
>>> question "How might beta amyloid alter LTP in CA1 neurons?". This  
>>> can be answered by using the BrainPharm dataset and setting up a  
>>> class that has CA1 neurons and beta amyloid as parts and/or  
>>> participants. I did this with Protege and Pellet, and it works.
>>>
>>> However, I do not know how this approach could be implemented in  
>>> a real web environment with the current tools that we have. How  
>>> would the ‘query class’ be produced?
>>
>> Do you mean "How on earth could any end user specify this?" Well  
>> there are lots of ways (few involving exposing even a friendly  
>> query syntax). For example, in jSpace  
>> (<http://clarkparsia.com/projects/code/jspace/>) you can see the  
>> queries that are built up by adding columns, switching them  
>> around, and making selections in an "iTunes like" browsing  
>> interface (see mSpace <http://www.mspace.fm/>). So forms, QBE,  
>> and other interfaces could  
>> generate a class as well as a SPARQL query (and really, classes  
>> are a sort of query :)).
>>
>>> How would it be added to the RDF store?
>>
>> Would you need it to be added? In Pellet, for example, you can ask  
>> for the superclasses of an arbitrary class expression without  
>> adding it to the ontology. This is a pretty normal feature of  
>> reasoners. OTOH, you could add it, e.g., if you wanted it  
>> available for subsequent queries. As you may know, we've done some  
>> work on incremental ontology reasoning under updates. While the  
>> best results have been, thus far, for abox updates, we got  
>> reasonable improvements for TBox updates as well. It's hard to  
>> make predictions without fairly detailed information about the use  
>> patterns and loads.
>>
>>> How should this be implemented with decent scalability? These are  
>>> all questions that should be obvious, but yet they have been  
>>> ignored by many influential ontology / OWL developers so far.
>>
>> I'm not sure what to make of this claim. Some few (to my  
>> knowledge) "influential" ontology/OWL developers have no need for  
>> these things, thus they properly neglect them. I don't think their  
>> influence prevents people from asking and working on these  
>> questions and even making great progress.
>>
>>> I think that we will not have a practical answer to these  
>>> questions in the immediate future.
>>
>> Since I'm still a bit unclear about the specifics (and practical  
>> import) of your question, I'll demur.
>>
>>> Therefore, I think that Alan’s proposal of adding direct  
>>> relations between class entities is the best solution we have at  
>>> the moment. Of course, it would be nice to find a solution that  
>>> would make it possible to keep the resulting ontologies valid OWL  
>>> DL. Maybe we should set up annotation properties that are  
>>> derived from the original properties (e.g. by concatenating  
>>> ‘_class_property’ to the end of the URIs of the original  
>>> properties) and use these?
>>
>> I'm sorry but I haven't followed the whole debate. There are at  
>> least three sorts of query one might be interested in:
>> 	Semantic [argh, need a better term] TBox/schema oriented query  
>> (e.g., parents, ancestors of classes, etc.)
>> 	ABox/data query (e.g., instances of classes, arbitrary  
>> conjunctive queries)
>> 	Metadata query (i.e., of annotations)
>>
>> (You might want to have all three of these in a single query with  
>> various types of sharing of variables across sorts.)
>>
>> Part of my personal goal for OWL 1.1 is to support the  
>> representation of all this in as clear and sensibly flexible a way  
>> so that querying in and across these dimensions is robust and  
>> effective for applications.
>>
>> Sorry for any misunderstandings I may have had and then generated :)
>>
>> The usability of the entire infrastructure is of great concern to  
>> me, however slight my influence may be. That means that I would  
>> very much like it for people to be able to say what they need to  
>> say, ask what they need to ask, and get useful answers. This means  
>> that the languages must be usable (or have usable tools wrapping  
>> them) and the backends must work for the relevant problems.
>>
>> Cheers,
>> Bijan.
>
> Bill Bug
> Senior Research Analyst/Ontological Engineer
>
> Laboratory for Bioimaging  & Anatomical Informatics
> www.neuroterrain.org
> Department of Neurobiology & Anatomy
> Drexel University College of Medicine
> 2900 Queen Lane
> Philadelphia, PA    19129
> 215 991 8430 (ph)
> 610 457 0443 (mobile)
> 215 843 9367 (fax)
>
>
> Please Note: I now have a new email - William.Bug@DrexelMed.edu
>
>
>
>
Received on Wednesday, 18 April 2007 00:38:33 UTC