Re: Proposed change to the OWL-2 Direct Semantics entailment regime from Bijan Parsia on 2010-12-10 (public-rdf-dawg@w3.org from October to December 2010)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Fri, 10 Dec 2010 12:54:32 +0000
To: Guido Vetere <gvetere@it.ibm.com>
Cc: public-rdf-dawg@w3.org
Message-Id: <78F189DC-C0EE-407C-B435-2557A883C2D9@cs.man.ac.uk>
(Sorry for the delay in responding. Classes and illness hit :))

On 6 Dec 2010, at 20:58, Guido Vetere wrote:

> > Bijan Parsia <bparsia@cs.man.ac.uk> 
> 
> > > We don't use SPARQL as a query language (we adopt a datalog-style 
> > > syntax instead) but we might support (some) SPARQL as a front-end in
> > > the future, as long as it does not miss relevant features. I 
> > > cannot tell if making all variables distinguished would definitely 
> > > prevent us from covering some relevant use case.
> > 
> > That's unfortunate. It would be really helpful to find some cases 
> > where users would *notice* the difference. Best is when they would 
> > rely on non-distinguished variables.
> 
> Maybe the average user would hardly understand the difference, but we know about that difference, we know that it may show up in some cases, and we should understand (in advance) whether it matters for customers. I cannot honestly point to concrete cases we experimented with, because we didn't consider how our queries would have been answered without the features we actually support. But we can try to go through some of the use cases we've run and run some simulations. In that case, we'll get back to you.

That would be amazing. If there's anything I can do to facilitate this, please let me know. I'd be happy to drudge through some examples.

> > I hope I don't give offense by asking the following clarificatory 
> > question: Do you really mean variables which range over unnamed 
> > individuals, or do you just mean variables which are projected away 
> > (in the Datalog world, these coincide as there are no unnamed 
> > individuals; hence my question)? 
> 
> We don't have unnamed individuals, we may have generated names. 

But you do have existential restrictions? But no bNodes? (I.e., you don't generally work with RDF data?)
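
To make the distinction I'm asking about concrete, here is a toy sketch in Python. It is not a real OWL reasoner: the names (`NAMED`, `expand`, `answer`) and the single axiom are made up for illustration, and the one-step "witness" expansion is only adequate for this tiny example. The data says A(a), the ontology says every A has some r-successor in B, and the query is q(x) :- r(x, y), B(y) with y projected away.

```python
# Toy illustration only: a hand-rolled model, not a real OWL reasoner.
NAMED = {"a"}                            # named individuals in the data
CLASS_ASSERTIONS = {("A", "a")}          # A(a)
# Axiom: A SubClassOf (r some B), i.e. every A has some r-successor in B.
EXISTENTIAL_AXIOMS = [("A", "r", "B")]


def expand(class_assertions, axioms):
    """Add one anonymous witness for each applicable existential axiom."""
    classes = set(class_assertions)
    roles = set()
    for cls, ind in class_assertions:
        for sub, role, filler in axioms:
            if cls == sub:
                witness = f"_:w_{ind}_{role}"   # unnamed individual
                roles.add((role, ind, witness))
                classes.add((filler, witness))
    return classes, roles


def answer(distinguished_only):
    """Answer q(x) :- r(x, y), B(y), returning bindings for x only."""
    classes, roles = expand(CLASS_ASSERTIONS, EXISTENTIAL_AXIOMS)
    results = set()
    for role, x, y in roles:
        if role != "r" or ("B", y) not in classes:
            continue
        if x not in NAMED:
            continue                      # x is returned, so it must be named
        if distinguished_only and y not in NAMED:
            continue                      # y must also bind to a named individual
        results.add(x)
    return results
```

With `answer(False)` (y nondistinguished) the individual `a` is returned, because the witness is merely known to exist. With `answer(True)` (all variables distinguished) nothing is returned, because no named individual can be bound to y.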

> > > provided that supporting this feature would be up to implementers.
> > > After all, not all SPARQL features are going to be supported by all 
> > > implementers, I guess. 
> 
> > The problem is that nondistinguished variables get prohibitively 
> > harder as you raise the expressivity of the logic. Ideally, we 
> > would like to make it easy for a user to port a query from an RDF 
> > engine to an OWL QL engine to an OWL DL engine and get compatible 
> > results (if the engines all support all of SPARQL). Nondistinguished
> > variables make that impossible.
> 
> I understand your point. Maybe this is naive, but why not consider a sort of SPARQL layering, much like OWL does? 

In a sense, the many entailment regimes we have achieve such a layering, or close to it. The current issue is how complicated to make that layering and what properties we want of the layering.

I think the best outcome (from a user understandability perspective) is that:
	1) A user can use any entailment regime with (just about) any dataset. 
	2) As many queries as possible are admitted by all regimes. (I.e., it's as close to "one query language" as we can make it)
	3) If a user uses a more expressive regime (with some fudging) they get at least all the answers they got with a less expressive regime, and possibly more.
	4) Differences between queries, data, and regime produce different answers in a way that is easily understood by users of all regimes.
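
Property 3 is the easiest of these to state mechanically. As a rough sketch (the function name, regime labels, and answer sets below are all hypothetical, not output from any real engine), one could check it for a single query like this:

```python
def answers_are_monotone(answers_by_regime, order):
    """Return True if every regime in `order` (least to most expressive)
    yields at least all the answers of the regime before it."""
    return all(
        answers_by_regime[weaker] <= answers_by_regime[stronger]
        for weaker, stronger in zip(order, order[1:])
    )


# Hypothetical answer sets for one query under three regimes.
example_answers = {
    "RDF": {"a"},
    "OWL QL": {"a", "b"},
    "OWL DL": {"a", "b", "c"},
}
regime_order = ["RDF", "OWL QL", "OWL DL"]
monotone = answers_are_monotone(example_answers, regime_order)

# A case that violates property 3: the weaker regime returns an
# answer that the stronger regime drops.
broken_answers = {"RDF": {"a", "z"}, "OWL QL": {"a"}}
broken = answers_are_monotone(broken_answers, ["RDF", "OWL QL"])
```

This ignores the "with some fudging" caveat entirely; it only captures the plain subset reading of the property.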

With the current entailment regime, we have one OWL regime that satisfies properties 1, 2, and 3 for all DL profiles (e.g., OWL DL, QL, EL, and RL) and is pretty close to that for DL vs. Fullish (e.g., RDF, RDFS, and OWL Full) regimes.

If we introduce nondistinguished variables, we will need at least two regimes: one for QL, EL, and maybe RL, and one for OWL DL. The OWL DL version will not allow many queries that the other one does.

If we don't skolemize bNodes, we will have to forbid lots of datasets (e.g., with cyclic patterns of bNodes) and queries, and the answer set we get in some cases with RDF will be *smaller* than what we get with OWL QL, EL, or RL.
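
For readers less familiar with the term, skolemizing bNodes just means replacing each blank node with a fresh constant so that it behaves like a named individual. A minimal sketch, assuming triples are plain tuples of strings, blank nodes are written with a `_:` prefix, and `SKOLEM_PREFIX` is a made-up IRI prefix:

```python
SKOLEM_PREFIX = "urn:example:skolem:"   # hypothetical prefix for fresh constants


def skolemize(triples):
    """Replace every blank node label with a stable fresh constant."""
    mapping = {}

    def term(t):
        if t.startswith("_:"):
            # The same blank node label always maps to the same constant.
            return mapping.setdefault(t, SKOLEM_PREFIX + t[2:])
        return t

    return [(term(s), p, term(o)) for s, p, o in triples]


# A cyclic pattern of bNodes, the kind of data mentioned above.
cyclic = [
    ("_:x", "ex:knows", "_:y"),
    ("_:y", "ex:knows", "_:x"),
]
skolemized = skolemize(cyclic)
```

After this step the cycle runs between two ordinary constants, so a query with only distinguished variables can bind to them like any other named individual.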

We could keep the current regime and then introduce two more, but that seems even more complicated.

My current preference is to keep the single regime and see what extensions sort themselves out.

> > On the flip side, would you find it extremely burdensome to add 
> > nondistinguished variables as an extension? I see you already depart
> > from the OWL spec by imposing the UNA; would this departure 
> > discourage you from implementing SPARQL at all?
> 
> Not at all, the point is that we would miss a feature that we would otherwise support. 

Of course. But we need to determine whether standardizing the feature at this time is worth the trade-off of complicating the overall story and interop.

> > Thanks very much for your time.
> 
> It's a pleasure. 

Oh well, thanks ;)

Please let me know if there's anything I can do to help further the identification of use cases and their analysis.

Cheers,
Bijan.
Received on Friday, 10 December 2010 12:55:05 UTC