Re: subgraph/entailment from Dan Connolly on 2005-09-07 (public-rdf-dawg@w3.org from July to September 2005)

From: Dan Connolly <connolly@w3.org>
Date: Wed, 07 Sep 2005 13:44:52 -0500
To: Bijan Parsia <bparsia@isr.umd.edu>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <1126118692.4430.37.camel@dirk>
On Wed, 2005-09-07 at 10:08 -0400, Bijan Parsia wrote:
> On Sep 7, 2005, at 9:43 AM, Dan Connolly wrote:
> 
> > On Wed, 2005-09-07 at 09:27 -0400, Bijan Parsia wrote:
> >> On Sep 7, 2005, at 9:19 AM, Dan Connolly wrote:
> [snip]
> >> This is a contradiction (I think we have a terminology conflict). The
> >> contribution of more expressive logics cannot be asserted triples. By
> >> "asserted" I meant, "asserted in the original document/dataset" not
> >> "asserted 'by inference'".
> >
> > Er... "original" is a distinction that's not visible to SPARQL QL.
> 
> I understand the intent here, but that's not well specified (to me at 
> least) in the current document. Including how this is supposed to 
> extend for more expressive languages (obviously I get the basic notion, 
> but I think we'll save a lot of grief if we're clearer on that point).
> 
> > In SPARQL QL, you start with some RDF dataset (i.e. a bunch of
> > graphs). How you got there is your business, but we expect
> > one of the popular ways is by grabbing data off the web and
> > computing, say, the RDFS closure of it.
> 
> Do you find Enrico's point that equivalent graphs can return different 
> sets of hits interesting or worth considering?

It's interesting enough to put it in the test suite, to say yes,
we thought about that, and yes, we meant that.

REQUEST FOR TESTCASE.
If the test suite maintainers haven't done their magic by
the time I send out the next agenda, somebody please remind me.

It doesn't seem like new information that should cause us to
reconsider any design issues.

>  Does the WG have any 
> advice for dealing with RDF entailment based closure (which is always 
> infinite and for obvious and useful and used queries like ?p rdf:type 
> rdf:Property will return infinite results?)

I'm pretty sure that as a WG, no, we have not given any advice for that
case.

>  Actually, I think if SPARQL 
> as written can't properly or practically handle RDF entailment then it 
> is broken. Similarly for RDFS entailment. It should AT LEAST get those 
> right, or it should make clear that it can only (practically speaking) 
> handle "base graphs" until it is clear how to handle these other 
> situations.

The present design seems to accommodate practical approaches to
RDFS entailment, or so I gather via Steve. I haven't seen
anybody motivated to bother with RDF entailment.

Does anyone else think we should add a requirement for dealing
with RDF entailment? Or that I should add an issue to the issues list?

I can imagine useful clarifications about limitations in this area.
You're welcome to suggest some text.


> I guess you could let chips fall and say, "hey, rdf and rdfs entailment 
> per se are hard to deal with...suck it up".

That seems like a concise summary of the state of the art.

> Sorry to have not banged on this bit sooner, but I hadn't really 
> noticed it before.

"So much time... so little to do... strike that; reverse it"
 -- Willy Wonka

> > The specification of SPARQL QL and our test harness and such
> > start there, and tell you, given a query, what the results are.
> 
> Which tests deal with graphs under RDF semantics?

None/all/any. The tests, like the QL spec, are orthogonal to inference.

> [snip]
> >> If the way around this is to do some sort of closure and then "dump"
> >> the data (roughly) and reload it..well....now we're requiring
> >> extrasilly gyrations to kill clarity and avoid some important details.
> >
> > You don't have to dump it anywhere; you don't even have to
> > pre-compute the whole thing; you can do the query backward-chaining
> > style, if you like.
> 
> I meant conceptually. In any case, I think one throw away line in the 
> first not clearly normative paragraph of the document doesn't adequately 
> explain this design. So I'm back to wanting a clearly specified design. 

I'm reasonably confident that the spec is clear on its intended
scope. If there are any cases that are not clear, please sketch
them and we'll add them to the test case and fix the spec if necessary.

You seem to want to expand the scope of the spec. I'll stay tuned
for suggested text or an indication that a critical mass of the WG
thinks I should add an issue or consider a new requirement. I'm
nearly persuaded that we should make an issue out of the OWL
disjunction worker example, if only to postpone it; the example
and the prover9 proof and such shouldn't get lost in the blizzard
of email.

> One criterion I have on that design is that it doesn't preclude 
> extension to OWL.

That it works OK with some parts of OWL and not so well
with others (e.g. disjunction in the worker example) is something
the WG has considered, and, so far, gone along with.

>  I've argued that Pat's approach *can* so extend, 
> though there may be some hairy bits and it is v. non-standard and 
> confusing (thus really needs some serious attention in the document). A 
> second criterion is that it's clear what interoperable (same answers) 
> implementations do for graphs closed under RDF entailment  (and 
> preferably, RDFS entailment; getting one should make the other clear). 
> Another criterion is that it be practical for realistic use cases. The 
> second and third have some tension. Some notion of minimal or 
> non-redundant or non-silly results would be helpful.
> 
> My test case is:
> 
> select ?p where {?p rdf:type rdf:Property}
> 
> against an empty dataset.

That has no results. Again: the term 'dataset' is used in the
spec to refer to the graph(s) against which the query is evaluated.

Let's add that to the test suite. REQUEST FOR TESTCASE.

Perhaps you meant the test dataset to be (a single graph that has)
all the axiomatic triples of RDF...
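
To make the distinction concrete, here is a rough sketch (not the
spec's algorithm) of matching the single pattern ?p rdf:type
rdf:Property against a graph held as a set of triples. The function
name, the string encoding of terms, and the hand-picked handful of
axiomatic triples are all assumptions made for illustration; the full
set of RDF axiomatic triples is infinite.

```python
# Illustrative sketch only: a graph is a set of (subject, predicate, object)
# tuples, with terms written as prefixed-name strings.

def match_type_property(graph):
    """Return every ?p such that (?p, rdf:type, rdf:Property) is in the graph."""
    return sorted(
        s for (s, p, o) in graph
        if p == "rdf:type" and o == "rdf:Property"
    )

# The empty dataset: no triples, so the pattern cannot match anything.
empty_graph = set()

# A finite, hand-picked subset of the RDF axiomatic triples (assumption:
# chosen for illustration; the complete set is infinite).
axiomatic_subset = {
    ("rdf:type", "rdf:type", "rdf:Property"),
    ("rdf:subject", "rdf:type", "rdf:Property"),
    ("rdf:nil", "rdf:type", "rdf:List"),
}

print(match_type_property(empty_graph))       # no results
print(match_type_property(axiomatic_subset))  # the two properties asserted above
```

The matcher never consults anything but the triples it is handed;
whether those triples came from a document or from a closure
computation is invisible to it.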

>  (Against an arbitrary dataset, I would expect 
> to get all the properties mentioned in that dataset ++ the ones 
> stemming from the axiomatic triples; I might prefer only the inferred 
> ones without the axiomatic triples).
> 
> Under rdf semantics, the answer should always include rdf:type rdf:type 
> rdf:Property. The answer set should also be infinite. This doesn't seem 
> to be the most useful situation although it's the most "naively 
> correct" under the current approach.

We haven't bothered with infinite graphs in the test harness so far.
I think the definitions in the spec say that you get an infinite
answer set in this case, though we haven't elaborated on it in prose.
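
As a sketch of where the infinity comes from: the RDF axiomatic
triples include rdf:_n rdf:type rdf:Property for every positive
integer n, so the bindings for ?p can only be enumerated lazily, never
listed in full. The generator name below is made up for illustration
and nothing like it appears in the test harness.

```python
from itertools import count, islice

def container_membership_properties():
    """Lazily yield rdf:_1, rdf:_2, ... ; each is an rdf:Property by an axiomatic triple."""
    for n in count(1):
        yield f"rdf:_{n}"

# The stream never ends, so a caller has to take a finite prefix.
first_three = list(islice(container_membership_properties(), 3))
print(first_three)
```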

> Cheers,
> Bijan.
-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Wednesday, 7 September 2005 18:45:04 UTC