Re: subgraph/entailment from Bijan Parsia on 2005-09-07 (public-rdf-dawg@w3.org from July to September 2005)

From: Bijan Parsia <bparsia@isr.umd.edu>
Date: Tue, 6 Sep 2005 23:35:05 -0400
To: Dan Connolly <connolly@w3.org>
Cc: Enrico Franconi <franconi@inf.unibz.it>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <843f48ce3797abc211e1ee08d0ed9121@isr.umd.edu>
On Sep 6, 2005, at 10:35 PM, Dan Connolly wrote:

> On Wed, 2005-09-07 at 03:47 +0200, Enrico Franconi wrote:
> [...]
>> Now we have three possibilities:
>
> I'm getting lost.

To pick up the dialectic as I understand it, Enrico thought (as many do 
on first reading) that the subgraph language *precluded* extending 
SPARQL to query datasets expressed in more expressive logics than 
RDF/RDFS (while respecting the semantics of those logics). That ain't 
so. There are a couple of ways to make the current language work out
(this is why I dropped my opposition to subgraph in May). However, I
think the current unspecified situation is confusing, as evidenced by
Enrico's confusion :) The debate has devolved into what language would
help clarify the situation for people wanting to use SPARQL to query
OWL KBs (and just to understand which, if any, of the options work).

> If this is just back and forth between some
> people trying to come up with a proposal, I'll stay tuned.

I think we are trying to do that.

> But if these are supposed to be options for the WG to
> consider, I need more information.

Well, are these distinct? :)

>> 1) replacing "subgraph of" with "entailed by";
>
> Replacing "subgraph" with "entailed by" is not merely
> an editorial change.
> The difference between "subgraph" and "entailed by" is
> visible from test cases, as I recall. The "entailed by"
> wording had some appeal to me, but that's not the design
> the WG had experience with. In particular, given
>
> 	<MarkTwain> dc:author people:twain.
>
> and the query
>
> 	SELECT ?who WHERE { <MarkTwain> dc:author ?who }.
>
> using the "entailed by" wording, not only
> is binding ?who to people:twain an answer, but
> so is binding ?who to _:bnodeN for infinitely many N.

So, this is an interaction between entailment and bnodes in results.

(I hate bnodes :))
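
To make Dan's example concrete, here is a toy sketch. It assumes triples
are plain tuples of strings with blank nodes written "_:..."; all function
names are hypothetical. Under subgraph matching the only answer is
people:twain. Under a naive "entailed by" reading, every fresh blank node
_:bN also gives a simply entailed triple, so the answers can only be
enumerated lazily. Only blank nodes in object position are handled.

```python
from itertools import count, islice

# Toy data model (an assumption): terms are strings, "_:" marks a blank node.
GRAPH = {("<MarkTwain>", "dc:author", "people:twain")}
PATTERN = ("<MarkTwain>", "dc:author", "?who")


def subgraph_answers(graph, pattern):
    """Bindings for the object variable such that the triple is asserted."""
    s, p, _ = pattern
    return {o for (gs, gp, o) in graph if (gs, gp) == (s, p)}


def simply_entailed(graph, triple):
    """Simple entailment of one triple. Only an object blank node is handled:
    it may map to any object asserted for the same subject and predicate."""
    s, p, o = triple
    if o.startswith("_:"):
        return any((gs, gp) == (s, p) for (gs, gp, _) in graph)
    return triple in graph


def entailment_answers(graph, pattern):
    """Lazily enumerate bindings accepted under an 'entailed by' reading."""
    s, p, _ = pattern
    asserted = sorted(subgraph_answers(graph, pattern))
    yield from asserted
    if not asserted:
        return  # nothing matches, so no blank-node answers are entailed either
    for n in count():
        bnode = f"_:b{n}"  # a fresh blank node label
        if simply_entailed(graph, (s, p, bnode)):
            yield bnode


print(subgraph_answers(GRAPH, PATTERN))  # -> {'people:twain'}
print(list(islice(entailment_answers(GRAPH, PATTERN), 4)))
# -> ['people:twain', '_:b0', '_:b1', '_:b2']
```

The generator never stops on its own, which is the infinite-solution
problem in miniature. Each _:bN answer is also redundant given
people:twain.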

Actually, this will be true for matching if the dataset is closed under
RDF entailment, right? I mean, the solutions are *equivalent* (at
least, AFAICT informally).

Er...so SPARQL is defined only for RDF graphs without their semantics? 
Or, rather, only against the *asserted* triples in an RDF graph?

This isn't happy!

So one way or another we have to deal with the infinite-solution
problem. Perhaps restricting ourselves to lean graphs will help? Hmm. I
have to think on this more.
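
For reference, a graph is "lean" (in the RDF Semantics sense) if no
instance of it, obtained by replacing its blank nodes with other terms,
is a proper subgraph of it. A brute-force check for small graphs might
look like this sketch. It assumes triples are tuples of strings with
blank nodes prefixed "_:", and it is exponential in the number of blank
nodes, so it is for illustration only. Whether leanness actually tames
the infinite answers is the open question here.

```python
from itertools import product


def is_bnode(term):
    return term.startswith("_:")


def is_lean(graph):
    """Brute-force leanness test for a small graph (a set of triples).

    The graph is non-lean if some mapping of its blank nodes onto terms
    already in the graph turns it into a proper subset of itself. Only
    terms from the graph need be tried, since the image must be a subgraph.
    """
    bnodes = sorted({t for triple in graph for t in triple if is_bnode(t)})
    terms = sorted({t for triple in graph for t in triple})
    for images in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, images))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in graph}
        if instance < graph:  # proper subset: some triple was redundant
            return False
    return True


ground = {("<MarkTwain>", "dc:author", "people:twain")}
redundant = ground | {("<MarkTwain>", "dc:author", "_:b0")}
print(is_lean(ground), is_lean(redundant))  # -> True False
```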

Have we met the charter if we can't query graphs closed under RDF
entailment (or if such queries run away)? It's not clear, but 1.8
seems to say so. I guess:

"""The principal task of the RDF Data Access Working Group is to gather 
requirements and to define an HTTP and/or SOAP-based protocol for 
selecting instances of subgraphs from an RDF graph. """

Could be read as covering only asserted triples, but I really think no
one expected that. Did they?

2.1 is weird, since it explicitly only enumerates RDF *Schema* and 
*OWL* as being out of scope, but then talks about a "notional RDF 
graph". What is *that* btw?

I need to think on it. I mean, in the end it's sort of a spec problem
more than a fundamental design problem (I hope :)). But it does need to
be addressed.

> I don't think that's what anybody is advocating. I hope somebody
> will clarify, with test cases (or sketches).

My "use case" for being clear on this subject is that, as it is written,
it seems that the answer set of *any* query with a variable and one
match against a dataset might be infinite. Ok, this isn't a use case :)
My use case is to be able to query RDFS and OWL datasets using, at
worst, a mild extension of SPARQL. We can always rewrite the SPARQL
semantics for each language in toto... but but but... that seems v. bad.

> And perhaps some use cases to motivate the change.

Oops.

>> 2) explaining that the subgraphing is done on the deductive closure
>> of the original information (clumsy);
>
> It's already possible for a server to choose the deductive closure
> of the original information as its background graph.

Pointer please?

> Do you mean to change the language such that matching is always done
> on the deductive closure?

That's my understanding of Pat's proposal.
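
As I read option 2, a server would compute a closure first and then do
ordinary subgraph matching against it. A minimal sketch of that pipeline,
assuming only two RDFS rules (rdfs9, type inheritance, and rdfs11,
subClassOf transitivity) and hypothetical ex: names. A full RDFS closure
has many more rules and is infinite because of the rdf:_n axiomatic
triples, so a real system would have to compute only a finite,
query-relevant portion.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def rdfs_closure(graph):
    """Fixpoint of two RDFS entailment rules only:
    rdfs9  (x type C, C subClassOf D => x type D) and
    rdfs11 (A subClassOf B, B subClassOf C => A subClassOf C)."""
    closed = set(graph)
    while True:
        new = set()
        for (a, p, b) in closed:
            for (c, q, d) in closed:
                if q == SUBCLASS and b == c:
                    if p == TYPE:
                        new.add((a, TYPE, d))  # rdfs9
                    elif p == SUBCLASS:
                        new.add((a, SUBCLASS, d))  # rdfs11
        if new <= closed:
            return closed  # nothing new: fixpoint reached
        closed |= new


def match(graph, pattern):
    """Ordinary subgraph matching of one triple pattern; '?x' is a variable."""
    results = []
    for triple in sorted(graph):
        binding = {}
        for pat, term in zip(pattern, triple):
            if pat.startswith("?"):
                if binding.setdefault(pat, term) != term:
                    break  # same variable bound to two different terms
            elif pat != term:
                break
        else:
            results.append(binding)
    return results


data = {
    ("people:twain", TYPE, "ex:Novelist"),
    ("ex:Novelist", SUBCLASS, "ex:Writer"),
    ("ex:Writer", SUBCLASS, "ex:Person"),
}
query = ("?x", TYPE, "ex:Person")
print(match(data, query))                # -> []
print(match(rdfs_closure(data), query))  # -> [{'?x': 'people:twain'}]
```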

> I don't see how that's possible in
> the general case,

What's the general case?

> given that
> any RDF property might be defined with an extent that affects
> the deductive closure...

I didn't parse this.

> e.g. it might have rdf:type XYZ,
> where XYZ is a subClassOf owl:TransitiveProperty.

This would be OWL Full, yes? So? It's up to each language to
	1) define the deductive closure
	2) define a mapping back into triples
	3) define all the syntactic variants of equivalents that might not
	   show up in the deductive closure, if any, in terms of triples
	   (ok, this isn't necessarily separate from 2)

That's the game.
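
Dan's example illustrates step 1 well: whether a property's extension
grows depends on other triples, so the closure rules have to be iterated
together. A toy sketch with hypothetical ex: names; it only combines an
rdfs9-style typing rule with an owl:TransitiveProperty rule, in an OWL
Full-ish way, and is nowhere near OWL's actual semantics.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
TRANSITIVE = "owl:TransitiveProperty"


def closure(graph):
    """Iterate two toy rules to a fixpoint:
    - typing: x type C, C subClassOf D  =>  x type D
    - transitivity: for each p typed owl:TransitiveProperty,
      a p b, b p c  =>  a p c
    """
    closed = set(graph)
    while True:
        new = set()
        for (x, p, c) in closed:
            if p == TYPE:
                for (c2, q, d) in closed:
                    if q == SUBCLASS and c2 == c:
                        new.add((x, TYPE, d))
        # Re-read the transitive properties each round: typing may add more.
        transitive = {s for (s, p, o) in closed if p == TYPE and o == TRANSITIVE}
        for (a, p, b) in closed:
            if p in transitive:
                for (b2, q, c) in closed:
                    if q == p and b2 == b:
                        new.add((a, p, c))
        if new <= closed:
            return closed
        closed |= new


kb = {
    ("ex:ancestorOf", TYPE, "ex:XYZ"),
    ("ex:XYZ", SUBCLASS, TRANSITIVE),
    ("ex:a", "ex:ancestorOf", "ex:b"),
    ("ex:b", "ex:ancestorOf", "ex:c"),
}
print(("ex:a", "ex:ancestorOf", "ex:c") in closure(kb))  # -> True
```

Here the closure is already a set of triples, so step 2 is trivial; for
OWL in general it presumably is not.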

>> 3) explaining that the subgraphing is done on all the models of the original
>> graph (requires some work to find the proper wording).
>
> Likewise, I would need some explanation of that.

If you understood Enrico's example, you get that, from an OWL document,
there can be multiple ways of extending the "facts" of the document
(the ABox part only) to form a "complete" model of the ontology, that
is, a set of RDF assertions that makes every OWL axiom true. Given the
expressiveness of OWL, it is rare that there is only *one* way to do
this. In fact, there are often multiple incompatible ways.

Think of each of these as a virtual graph generated from the ontology.
A SPARQL query succeeds if its pattern matches a subgraph of *every*
virtual graph generated from the ontology. This backdoors entailment
in, but as derived rather than primitive :)
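
A sketch of the "every virtual graph" reading, assuming the completions
have already been produced (generating them from an OWL ontology is the
hard part and is not attempted here). The example is hypothetical: if an
ontology says ex:john is an ex:Father or an ex:Mother, both subclasses
of ex:Parent, two representative completions are listed below, so
ex:john is a certain answer for ex:Parent but not for ex:Father.

```python
TYPE = "rdf:type"


def match_triple(pattern, triple):
    """Return the variable binding if the triple matches, else None."""
    binding = {}
    for pat, term in zip(pattern, triple):
        if pat.startswith("?"):
            if binding.setdefault(pat, term) != term:
                return None
        elif pat != term:
            return None
    return binding


def answers(graph, pattern):
    """All bindings of the pattern in one graph, as hashable frozensets."""
    found = set()
    for triple in graph:
        binding = match_triple(pattern, triple)
        if binding is not None:
            found.add(frozenset(binding.items()))
    return found


def certain_answers(completions, pattern):
    """Bindings that match in *every* completion (virtual graph)."""
    if not completions:
        raise ValueError("no completions: the ontology is inconsistent")
    return set.intersection(*(answers(g, pattern) for g in completions))


completions = [
    {("ex:john", TYPE, "ex:Father"), ("ex:john", TYPE, "ex:Parent")},
    {("ex:john", TYPE, "ex:Mother"), ("ex:john", TYPE, "ex:Parent")},
]
print(certain_answers(completions, ("?x", TYPE, "ex:Parent")))
# -> {frozenset({('?x', 'ex:john')})}
print(certain_answers(completions, ("?x", TYPE, "ex:Father")))
# -> set()
```

With a single completion, certain_answers reduces to ordinary matching
against that one graph.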

This works for RDF and RDFS, since there is only one completion for
them. (I'm pretty sure :) Still some worries about bnodes, I guess.)

There is a question about which, if any, needs to be normative. It 
might be enough to get a working group submission *IF* the infinite 
results problem is handled. But there needs to be a bit of scaffolding 
in the spec to allow for this.

Of course, you could just restrict yourself to asserted triples. My org 
will probably object, though.

Cheers,
Bijan.
Received on Wednesday, 7 September 2005 03:35:18 UTC