Re: subgraph/entailment from Enrico Franconi on 2005-09-07 (public-rdf-dawg@w3.org from July to September 2005)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Wed, 07 Sep 2005 21:40:36 +0200
To: parsia@isr.umd.edu
Cc: onnolly@w3.org, public-rdf-dawg@w3.org
Message-ID: <E1ED5pH-0005ls-Kz@lisa.w3.org>
Bijan Parsia wrote:
> On Sep 7, 2005, at 9:43 AM, Dan Connolly wrote:
> > In SPARQL QL, you start with some RDF dataset (i.e. a bunch of
> > graphs). How you got there is your business, but we expect
> > one of the popular ways is by grabbing data off the web and
> > computing, say, the RDFS closure of it.

Let's suppose by now that we all agree on this. Specifying query answering by looking at the deductive closure is an unhortodox way to see entailment, but now w
e agree that it technically works. I may come back to this later.

Now, we want that SPARQL works correctly with at least normative RDF, i.e., RDF as defined in the RDF-MT document - which includes RDF-entailment. Bijan correcl
y notes:

> Do you find Enrico's point that equivalent graphs can return different
> sets of hits interesting or worth considering? Does the WG have any
> advice for dealing with RDF entailment based closure (which is always
> infinite and for obvious and useful and used queries like ?p rdf:type
> rdf:Property will return infinite results?) Actually, I think if SPARQL
> as written can't properly or practically handle RDF entailment then it
> is broken. 

Indeed. This is a very important point on which I strongly agree.

> Similarly for RDFS entailment. It should AT LEAST get those
> right, or it should make clear that it can only (practically speaking)
> handle "base graphs" until it is clear how to handle these other
> situations.

This summarises nicely my point of view. The current version of the document does not capture RDF properly yet, but to my understanding it is not difficult to f
ix it.

> So i'm back to wanting a clearly specified design.
> One criterion I have on that design is that it doesn't preclude
> extension to OWL. I've argued that pat's approach *can* so extend,
> though there may be some hairy bits and it is v. non-standard and
> confusing (thus really needs some serious attention in the document). 

Agree.

> A second criterion is that it's clear what interoperable (same answers)
> implemenations do for graphs closed under RDF entailment  (and
> preferably, RDFS entailment; getting one should make the other clear).

Agree. That's my point on minimality.

> Another criterion is that it be practical for realistic use cases. The
> second and third have some tension. Some notion of minimal or
> non-redundant or non-silly results would be helpful.

Yep. My proposal is practical, I guess.

> My test case is:
>
> select ?p where {?p rdf:type rdf:Property}
>
> against an empty dataset. (Against an arbitrary dataset, I would expect
> to get all the properties mentioned in that dataset ++ the ones
> stemming from the axiomatic triples; I might prefer only the inferred
> ones without the axiomatic triples).
>
> Under rdf semantics, the answer should always include rdf:type rdf:type
> rdf:Property. The answer set should also be infinite. This doesn't seem
> to be the most useful situation although it's the most "naively
> correct" under the current approach.

Yes. Please note that the answer is infinite for two distinct reasons. 

The first one is that the RDF vocabulary is infinite (because of rdf:_1, rdf:_2, ...). We can not do anything about that, but just excluding rdf:_1, rdf:_2, ...
 from the RDF vocabulary. Any implementor willing to have a terminating algorithm :-) has the only choice to declare this limitation in the vocabulary. I don't 
dare to ask to put something about that in the document :-) As a matter of fact there is no other choice. However, I am not worried too much about that, since i
t is obviously undisputable.

The second reason for an answer to be infinite is more subtle: this comes from the fact that bnodes in the answer are allowed. In fact, for any triple which hap
pens to be in the answer (for some arbitrary reason), any other triple obtained by replacing, say, the first node with a bnode (for any arbitrary bnode) is *als
o* clearly in the answer. But why we usually don't see this? Because we implicitly assume a notion of minimality: we don't want in the answer triples which are 
obviously redundant (i.e., when a bnode can be matched by an URI or literal). So, SPARQL algorithms discard this case, and do not generate all the possible vari
ation for all the bnodes.

However, we spotted a case where the SPARQL document fails to find such redundant triples in the answer, if you want to take RDF-MT seriously. Consider the foll
owing RDF graph:

<http://example.org/book/book1> dc:title "SPARQL" .
_:b dc:title "SPARQL" .

According the the RDF-MT semantics, this graph contains a redundant triple (the second triple with the bnode), since the original graph with that triple and the
 graph without that triple are equivalent, i.e., they RDF-entails each other. I.e., the above graph is RDF-equivalent according to RDF-MT to the graph:

<http://example.org/book/book1> dc:title "SPARQL" .

In other words, the deductive closures according to RDF-MT of the two graphs above *are the same*. And I have to get the *same answer* when I query these two gr
aphs!
So, when I query each of the above graphs as follows:

SELECT ?x
WHERE { ?x dc:title "SPARQL" }

I should *not* get the bnode coming from the redundant triple, but simply {<http://example.org/book/book1>}.

Consider now an even more subtle example of redundancy. Let's consider the following dataset:

_:a rdf:type rdf:Property

This dataset is present in *any* completion (i.e., RDF-MT deductive closure) of any RDF dataset, including the empty one. In fact, it is a redundant triple 'per
 se', since - if you take the RDF-MT seriously - it should belong to the deductive closure of any dataset: as a matter of fact, it is in the deductive closure o
f the axiomatic triples.

If SPARQL is used in the modality "take RDF-MT seriously", we have to get such behaviour (discarding the redundant triples in the answer).

So, my request is to state in the document that triples in the answer set can not be redundant (in other words, the answer set is minimal wrt triple matching). 
Otherwise, by taking RDF-MT seriously, any query should always have an infinite answer set, even in the case of finite RDF vocabulary.

cheers
--e.

Enrico Franconi                  - franconi@inf.unibz.it
Free University of Bozen-Bolzano - http://www.inf.unibz.it/~franconi/
Faculty of Computer Science      - Phone: (+39) 0471-016-120
I-39100 Bozen-Bolzano BZ, Italy  - Fax:   (+39) 0471-016-129
Received on Wednesday, 7 September 2005 19:43:42 UTC