Re: subgraph/entailment from Pat Hayes on 2005-09-19 (public-rdf-dawg-comments@w3.org from September 2005)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 19 Sep 2005 13:32:09 -0500
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, Ian Horrocks <horrocks@cs.man.ac.uk>, Bijan Parsia <bparsia@isr.umd.edu>, Dan Connolly <connolly@w3.org>, public-rdf-dawg-comments@w3.org
Message-Id: <p0620070dbf4ec5146382@[192.168.1.5]>
((I am sorry that I did not get engaged in this discussion earlier: I 
have been off-network for a couple of weeks.))

>On 8 Sep 2005, at 10:03, jos.deroo@agfa.com wrote:
>>>>>>I should *not* get the bnode coming from the redundant
>>>>>>triple, but simply {<http://example.org/book/book1>}.
>>>>>
>>>>>Otoh, for
>>>>>
>>>>>CONSTRUCT { ?x dc:title "SPARQL" }
>>>>>WHERE { ?x dc:title "SPARQL" }
>>>>>
>>>>>I actually get
>>>>>
>>>>><http://example.org/book/book1> dc:title "SPARQL".
>>>>>_:b_0_ dc:title "SPARQL".
>>>>>
>>>>>which I assume to be fine, no?
>>>>>
>>>>
>>>>The result form shouldn't affect the (number of) results
>>>>unless there's something expicit in the form which does that
>>>>(which I don't think is true for construct). At least,
>>>>that's what I would expect!
>>>>
>>>
>>>Indeed. I expect no bnode in this case too, independently on
>>>the type of query, since the answer should be the same as the
>>>one with the dataset without the bnode.
>>>
>>
>>[[
>>The CONSTRUCT result form returns a single RDF graph
>>specified by a graph template. The result is an RDF
>>graph formed by taking each query solution in the
>>solution sequence, substituting for the variables
>>into the graph template and combining the triples
>>into a single RDF graph by set union.
>>]]
>>http://www.w3.org/2001/sw/DataAccess/rq23/#construct
>>
>>and
>>
>>###
>><http://example.org/book/book1> dc:title "SPARQL".
>>_:b_0_ dc:title "SPARQL".
>>###
>>
>>is a single RDF graph and a single result and I don't
>>see any issue wrt to bnodes or minimality in it;
>>what is the issue in this case?
>
>As I argued with my latest long email 
><http://www.w3.org/mid/E1ED5pH-0005ls-Kz@lisa.w3.org>, there only 
>one binding (query solution) for ?x according to RDF-MT semantics, 
>namely <http://example.org/book/book1>.
>Therefore, substituting the variable ?x in the graph template gives 
>you only the triple
>
><http://example.org/book/book1> dc:title "SPARQL".
>
>even in the case of the dataset
>
><http://example.org/book/book1> dc:title "SPARQL" .
>_:b dc:title "SPARQL" .
>
>
>Is there something I'm missing?

Yes.

You are making a central confusion, between query matching and 
entailment. Querying is NOT checking entailment. Querying is checking 
what patterns are actually present in the graph (possibly under 
various constraints, etc., but leave such complexities aside for 
now.) This is not the same task as checking that the query is 
entailed by the graph. It is an essentially syntactic, algebraic kind 
of business, not a semantic one.

If you are used to thinking of a query as similar to a goal to be 
proved, as is commonly done in LP, please try to put that idea aside, 
as it is actively misleading here. This is not the way that SPARQL is 
designed. For very simple queries and for entailment = simple 
entailment, and if we think of query variables as being specially 
marked bnodes, then they are almost indistinguishable, since A simply 
entails B just when an instance of B is a subgraph of A: but as soon 
as we consider more complex kinds of entailment or querying, the 
analogy breaks down.

Thus, I find your assertion above mistaken, in its reference to the 
RDF semantics. The RDF MT makes no reference to queries, and SPARQL 
does not refer to entailment as defined by the RDF MT. So to make any 
assertion about the correctness or otherwise of a query by reference 
to the RDF MT is to conflate two topics that should be kept separate. 
The signs of this mistaken conflation are evident in the long email 
you refer to, for example (in response to Dan C.):

"Specifying query answering by looking at the deductive closure is an 
unhortodox way to see entailment"

It would be an unorthodox way to see entailment, if it were a way to 
see entailment at all. But it is not: entailment is one topic, query 
answering is a different topic. The confusion arises from viewing the 
latter as though it were the former.  Again, the example that you 
have identified:

<http://example.org/book/book1> dc:title "SPARQL" .
_:b dc:title "SPARQL" .

which as you point out is redundant (non-lean) because it both 
entails and is entailed by a proper subgraph

<http://example.org/book/book1> dc:title "SPARQL" .

of itself, so both these graphs have the same deductive closure. 
However, you then go on to make an inappropriate claim:

"In other words, the deductive closures according to RDF-MT of the 
two graphs above *are the same*. And I have to get the *same answer* 
when I query these two gr
aphs!

No, that does not follow. Logically equivalent graphs are not 
required to give identical query results. In fact, the contrary is 
true, it seems to me: an appropriate query process should not give 
the same results from these two graphs, precisely because one of them 
is redundant and the other is not. An accurate query service should 
then give redundant answers in one case but not the other, accurately 
reflecting the structures it finds in the graph. Put another way, it 
is not part of the task of the basic querying process to perform 
inferences: its role is, rather, to accurately respond to the 
structures it actually finds in the graph itself. The task of the 
querying process is to match a pattern against a graph and send back 
the results it finds in that graph. It is not to indicate that the 
query pattern is entailed by the graph.

(BTW, there is a perhaps unfortunate possible misunderstanding about 
what 'equivalent' means. There is a definition of 'equivalent graph' 
in the RDF Concepts document, cited in an email by Peter, but this is 
not the same meaning as 'logically equivalent' in the sense of 
'entails and is entailed by'. That notion of logical equivalence is 
nowhere defined in the RDF specs, in fact. The equivalence cited 
there refers to re-naming bnodes, in effect: two graphs are 
equivalent just when they are bnode-re-namings of each other. So in 
this (strong) sense, a non-lean graph is not equivalent to its lean 
subgraph, and the two graphs in the example are not equivalent.)

If one wishes to offer a query-answering service that does check 
entailments, then the appropriate thing to do within the SPARQL 
framework is to declare that one is matching queries against a 
'virtual graph' which is some kind of logical closure of the graph, 
rather than the graph itself. But these are different graphs, note. 
SPARQL does not require queries to be evaluated only against graphs 
which are closed under some notion of entailment: it is 
entailment-neutral (except, as noted above, regarding simple 
entailment, which follows in effect as a structural consequence of 
the very act of matching itself.) This is not an error or an 
omission, I would emphasize, but a design decision.

There is lot more that could be said, particularly about the reasons 
and justifications for this decision, but I will whisk this off so 
y'all can read it before the telecon tomorrow.

Pat


>--e.


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 19 September 2005 18:32:28 UTC