Re: Demo SPARQL notes from Bijan Parsia on 2007-04-18 (public-semweb-lifesci@w3.org from April 2007)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Wed, 18 Apr 2007 02:28:07 +0100
To: Chris Mungall <cjm@fruitfly.org>
Cc: William Bug <William.Bug@DrexelMed.edu>, samwald@gmx.at, Alan Ruttenberg <alanruttenberg@gmail.com>, public-semweb-lifesci@w3.org
Message-Id: <D76655FA-9256-460E-A886-CCDCCB0B1C8D@cs.man.ac.uk>
On Apr 18, 2007, at 1:38 AM, Chris Mungall wrote:

> On Apr 17, 2007, at 10:49 AM, William Bug wrote:
>
>> I'm with Bijan on this issue.
>>
>> However complex the current OWL representation may appear, it's  
>> considerably more terse than the expression of this same info in a  
>> relational model.
>
> I'm not sure if this is necessarily the case.

To be clear about what I said: I am not fond of using triple based  
syntaxes for representing class expressions (and axioms involving  
class expressions, and queries involving class expressions). I also  
dislike long names for operators (e.g., intersectionOf) and having to  
name (some) pieces of syntax (e.g., owl:Restrictions). Some non  
triple/rdf representations retain the latter features  
(e.g., OWL 1.1 functional and XML syntax). I still tend to find them  
superior to the triple based ones, so these are separate issues.

For complex class expressions, I prefer an operator syntax such as  
standard DL or FOL. Standard variable free DL syntax has considerable  
concision and composition advantages, in my experience. (Even in ad  
hoc textual variants, it's nice to be able to do something like  
(some.P C) instead of (some(y)(Pxy & Cy)). Nesting quantifiers is  
really nice in DL syntax.)
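
A minimal sketch of the concision point, assuming a toy expression type with just named classes and existential restrictions. The names Atom, Some, to_dl and to_fol are made up for illustration and do not come from any OWL tool. The same nested expression is printed once in the ad hoc variable-free style and once with explicit variables:

```python
from dataclasses import dataclass
from itertools import count
from typing import Union


@dataclass(frozen=True)
class Atom:
    """A named class, e.g. Cell."""
    name: str


@dataclass(frozen=True)
class Some:
    """An existential restriction: some.P C."""
    prop: str
    filler: "Expr"


Expr = Union[Atom, Some]


def to_dl(e: Expr) -> str:
    """Render in the variable-free textual DL style, e.g. (some.P C)."""
    if isinstance(e, Atom):
        return e.name
    return f"(some.{e.prop} {to_dl(e.filler)})"


def to_fol(e: Expr, var: str = "x", fresh=None) -> str:
    """Render with explicit variables; each nesting level needs a fresh one."""
    fresh = fresh if fresh is not None else count(1)
    if isinstance(e, Atom):
        return f"{e.name}({var})"
    new = f"y{next(fresh)}"
    body = to_fol(e.filler, new, fresh)
    return f"(some({new})({e.prop}({var},{new}) & {body}))"


nested = Some("part_of", Some("part_of", Atom("Cell")))
print(to_dl(nested))
print(to_fol(nested))
```

The variable-free form grows by one operator per nesting level, while the variable-based form also has to introduce and thread a new variable each time.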

> If we are talking specifically about the representation of OWL *in  
> RDF triples* and the corresponding SPARQL queries, then we are  
> essentially talking about a 3-ary relational model anyway,

I hesitate to follow moves about syntax through "essentially"s to  
models. Lisp lists are "essentially" chains of cons cells, but '(1 2  
3) doesn't wear that on its sleeve (and could resolve to an array  
based internal form).

> modulo the usual concerns re open vs closed world and the like.

Such talk *really* worries me when we are talking *syntax*.

> And n-ary relations are surely either as terse or more terse than  
> 3-ary relations.

Since OWL is restricted in the number of distinct variables (and the  
combinations thereof), you get some of the advantages of variable  
freedom even in the hairier syntaxes.

> Compare facts in an imaginary relational model for OWL [1]:
>
> 	existential_restriction(part_of, CellNucleus, Cell)

This is not far off from current OWL 1.1 functional syntax, see:
	http://webont.org/owl/1.1/owl_specification.html#4

But I'd want it to be compositional, i.e., "Cell" to be replaceable with  
a complex class expression, e.g., another existential_restriction. If  
we are going to talk "relational model" then we've added function  
terms, at the very least.

> With [2]:
>
> 	subClassOf(CellNucleus,_r1)
> 	restriction(_r1)
> 	onProperty(_r1,part_of)
> 	someValuesFrom(_r1,Cell)

Yes, this is exactly the "trouble with triples". ewww. hate that  
bnode too.
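
To make the contrast between [1] and [2] concrete, here is a rough sketch that expands a functional-style fact into the triple encoding. It assumes a nested-tuple notation ("some", property, filler) invented for this example; the function names and the "Organism" class are hypothetical. Only the existential restriction is handled:

```python
from itertools import count


def expand(expr, triples, fresh):
    """Return the node that names `expr`, appending the triples that describe it.

    A plain string is a named class and needs no extra triples; a
    ("some", prop, filler) tuple needs a blank node plus three triples.
    """
    if isinstance(expr, str):
        return expr
    op, prop, filler = expr
    if op != "some":
        raise ValueError(f"unsupported constructor: {op}")
    node = f"_:r{next(fresh)}"
    filler_node = expand(filler, triples, fresh)
    triples.extend([
        (node, "rdf:type", "owl:Restriction"),
        (node, "owl:onProperty", prop),
        (node, "owl:someValuesFrom", filler_node),
    ])
    return node


def subclass_axiom(sub, sup):
    """Triples for `sub subClassOf sup`, where `sup` may be a nested expression."""
    triples = []
    sup_node = expand(sup, triples, count(1))
    triples.append((sub, "rdfs:subClassOf", sup_node))
    return triples


flat = subclass_axiom("CellNucleus", ("some", "part_of", "Cell"))
nested = subclass_axiom(
    "CellNucleus", ("some", "part_of", ("some", "part_of", "Organism"))
)
for t in flat:
    print(t)
print(len(flat), len(nested))
```

One fact in the functional style becomes four triples and one bnode; each extra level of nesting adds three more triples and another bnode, and a query over the triple form has to mention all of them.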

[snip]
> And of course SQL and most implementations of the relational model  
> give you little or no deductive facilities; but then, this is also  
> true for most SPARQL implementations too. Even with RDFS  
> entailment, you don't have enough for basic class-level (TBox)  
> transitivity.

Pellet and KAON2 support SPARQL syntax for *Abox* queries (to some  
degree, it varies) and Racer has a similar language.

>
> Anyway, I think I'm being pedantic and straying from the point. The  
> issue is that queries expressed in SPARQL over class-level  
> relations (eg part_of in a TBox)

And relatively new. I.e., most conjunctive querying in DL land is purely  
over *aboxes*. Querying *TBoxes* is done with special functions a la  
DIG. Thus, the triple syntax of SPARQL is a bit misleading.

However, in at least one version of Pellet we had experimental  
support for mixed TBox/Abox queries and we've written this up:
	http://clarkparsia.com/files/pdf/sparqldl.pdf
with an eye to getting a spec together at OWLED. Intuitively, you  
compile out the TBox query parts and turn them into DIG calls, then  
perform query expansion on the class or property variables in the  
abox atoms.
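
A toy illustration of that two-step idea, not the algorithm from the paper: the dictionaries stand in for a knowledge base, subclasses() stands in for a DIG-style TBox call, and all class and individual names are made up. The mixed query asks for pairs (?x, ?C) such that ?C is a subclass of a given class and ?x is an instance of ?C:

```python
# Toy knowledge base (hypothetical names): told subclass links and types.
SUBCLASS_OF = {
    "Nucleus": {"Organelle"},
    "Mitochondrion": {"Organelle"},
    "Organelle": {"CellPart"},
}
TYPES = {"n1": "Nucleus", "m1": "Mitochondrion", "c1": "Cell"}


def subclasses(cls):
    """Stand-in for a TBox call: all strict subclasses of cls, transitively."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, supers in SUBCLASS_OF.items():
            if current in supers and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def instances(cls):
    """Stand-in for an ABox atom: individuals typed by cls or any subclass."""
    targets = subclasses(cls) | {cls}
    return sorted(ind for ind, t in TYPES.items() if t in targets)


def mixed_query(superclass):
    """Answer '?C subClassOf superclass . ?x type ?C' in two steps."""
    # Step 1: compile out the TBox atom into a call that binds ?C.
    class_bindings = subclasses(superclass)
    # Step 2: expand the ABox atom once per binding of the class variable.
    answers = []
    for cls in class_bindings:
        for ind in instances(cls):
            answers.append((ind, cls))
    return sorted(answers)


print(mixed_query("CellPart"))
```

The ABox atom never sees a class variable: by the time it runs, ?C has already been replaced by a concrete class name.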

> represented using owl restrictions are verbose, contrasted to  
> representations that use a single predicate for the class-level  
> relation. The issue here is not the syntax per se, rather the  
> additional triples and bNode created when layering the OWL on the  
> RDF model. I don't know if it's such a huge problem - I have  
> learned to live with it - but I know that people used to n-ary  
> relational queries balk at doing a multi-triple-with-bnode query  
> for simple TBox queries such as the above.

Hence my preference to plug in a better syntax for SPARQL/DL. The  
triple syntax is also misleading as it leads users to expect some  
queries to be legal (and useful) which just aren't.

This is something which we'll spend some considerable time at OWLED on.

> One solution here is Alan's alternate non-OWL layering of class  
> level relations in the RDF model, possibly controversial. Another  
> is an additional layer on top of SPARQL - eg some macro language  
> that provides constructs such as a single predicate for class level  
> relations - and compiles down to SPARQL - this appears to be what  
> is suggested below? Manchester syntax is mentioned - a QL based on  
> Manchester syntax would be nice. For our ABox query we could say  
> "?X part_of some Cell". I imagine this could trivially compile down  
> to SPARQL - or it could be an OWL QL that has its own model.

As I said, Kendall Clark and I made progress on an XML syntax for  
SPARQL. You could then leave the algebra parts constant and plug in  
OWL 1.1's xml syntax as either a compilation target or source.
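
For the macro-layer idea quoted above, a minimal sketch of compiling a pattern like "?X part_of some Cell" down to SPARQL text. It assumes ?X ranges over classes, handles only this one pattern shape, and uses a made-up default namespace (http://example.org/onto#); compile_macro is a hypothetical helper, not part of any SPARQL library:

```python
import re

PREFIXES = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX : <http://example.org/onto#>"""

# Matches exactly "?Var property some Class".
MACRO = re.compile(r"^\?(\w+)\s+(\w+)\s+some\s+(\w+)$")


def compile_macro(text):
    """Expand '?X P some C' into the multi-triple restriction pattern."""
    m = MACRO.match(text.strip())
    if m is None:
        raise ValueError(f"not of the form '?X P some C': {text!r}")
    var, prop, cls = m.groups()
    r = f"?{var}_r"  # helper variable standing in for the restriction bnode
    lines = [
        PREFIXES,
        f"SELECT ?{var} WHERE {{",
        f"  ?{var} rdfs:subClassOf {r} .",
        f"  {r} rdf:type owl:Restriction .",
        f"  {r} owl:onProperty :{prop} .",
        f"  {r} owl:someValuesFrom :{cls} .",
        "}",
    ]
    return "\n".join(lines)


query = compile_macro("?X part_of some Cell")
print(query)
```

The generated query only matches asserted restriction triples; whether it returns entailed answers depends on the entailment support of whatever evaluates it.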

> This is related to but different from the issue of entailment -  
> many RDF systems, including most SPARQL implementations - give you  
> little or no entailment - eg RDFS. This isn't enough to give you a  
> complete answer for [2] (assuming part_of is transitive). Alan's  
> transformation does, I believe, give you a correct answer for when  
> you have RDFS entailment.
>
>> You can write some very effective SPARQL queries against it,  
>> after playing with it a bit to get a more complete understanding  
>> of what the ontology is trying to express.
>>
>> I've certainly been having pretty good luck creating SPARQL  
>> queries - even by hand (i.e., without fancy end-user oriented  
>> tools) - against some of the similarly modeled data in the  
>> NeuroCommons repository.
>
> SPARQL seems adequate in many respects for data oriented queries  
> (typically, but not always, ABox) - the verbosity manifests in TBox  
> queries, and possibly other scenarios that dictate the standard  
> n-ary pattern transform.

And in arbitrary DL systems, you are most likely to only *get* ABox  
queries, since traditionally you used an API for your TBox queries.  
The paper referenced above is trying to change that.

(Cerebra had a sorta mixed tbox/abox query language based on XQuery,  
but it just made the DIG-ish calls more or less explicit.)

Cheers,
Bijan.
Received on Wednesday, 18 April 2007 01:28:23 UTC