Re: What is "the serious bug in entailment semantics" found by J. Perez"? from Pat Hayes on 2006-08-10 (public-rdf-dawg@w3.org from July to September 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 9 Aug 2006 23:25:33 -0700
To: Enrico Franconi <franconi@inf.unibz.it>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230926c10052b8b624@[192.168.1.6]>
>Basically the definition of "Basic Graph Pattern 
>E-matching" 
><http://www.w3.org/2001/sw/DataAccess/rq23/rq24.html#BGPgeneral> 
>is not equivalent to the expected implementation 
>as described in 
><http://www.w3.org/2001/sw/DataAccess/rq23/rq24.html#BGPsparql>.

Actually I think it is, but the issue still needs 
to be discussed. See end of this message for 
details.

>This means that the definition of "Basic Graph 
>Pattern E-matching" has to be changed.

Or, maybe, play a different, and weaker, role? 
Please, allow me to make the case for this 
proposal before dismissing it out of hand.

First, it is clear that the implementations use 
an essentially syntactic criterion for matching, 
rather than any kind of entailment relationship. 
This syntactic process is easy to describe, 
relatively easy to implement, and easy to 
understand. It can be defined quite precisely in 
terms of an algorithm. All of these (and other) 
considerations motivate using this as the 
*normative defining* criterion in the spec for 
what constitutes an answer set.

However, we need to relate it to the semantics. 
For simple entailment, this algorithmic 
definition bears a simple relationship to simple 
entailment which is easy to establish and is 
closely related to the core of the current 
semantic definition: the conjunction of the 
scoping graph and all answer instances of the 
query pattern is simply entailed by the KB. This 
semantic condition then becomes a *necessary but 
possibly not sufficient* condition on an answer 
set; that is, it does not *define* the answer 
set, but merely imposes a semantic constraint 
upon it. The answer set may be *defined* in some 
other way - for current purposes, by a matching 
algorithm - and the spec should then show that 
this definition does indeed conform to the 
semantic constraint.

The general form of a query semantics in the 
SPARQL spec then would be that any extended 
notion of a query answer must (1) be 
well-defined, *somehow* - we need not say how - 
and (2) satisfy the general semantic constraint, 
which is like the above but with a different 
notion of 'entailment' inserted. This leaves the 
door open for a wide range of options, including 
Bijan's suggested restriction of disallowing 
bnode bindings in the DL cases (although I will 
continue to argue against this on different 
grounds, and on a different thread) but also many 
others, since by demoting the semantic machinery 
from a parameterized *definition* to a 
parameterized *constraint*, it allows a wider 
variety of ways to define an answer set for a 
particular regime.

A possible objection is that this means that the 
choice of an entailment does not, in and of 
itself, specify a version of SPARQL. True; but we 
are already in that situation, since the current 
definition allows the same entailment to be used 
with different notions of scoping set: and we 
recently saw that this could produce 
misunderstandings about exactly what a particular 
entailment regime would, er, entail for the 
definition of 'answer'. And in this suggested 
framework, at least the actual semantic 
definitions would have only the single parameter 
of an entailment type; other differences would be 
viewed (IMO, correctly) as not changing the 
semantics but rather being differences in how 
best to computer answer sets which are 
semantically appropriate to the entailment regime 
in use.

And finally, this would allow us to move forward 
with minimal changes, preserving the essential 
semantic foundations, without needing to wait on 
getting what seems to be quite tricky, and new, 
science done.

>Jorge Pérez wrote:
>
>>In the following I will write blank nodes in uppercase and
>>uris in lowercase (just to simplify notation).
>>
>>Consider the graph
>>
>>G = { (a, p, a), (X, p, Y) }
>>
>>and the basic graph pattern query
>>
>>Q = { (?q, p, ?q) }.
>>
>>I'm considering simple entailment and Q does not have any blank node, so
>>we can choose G itself to be the scoping graph and the terms of G to be
>>the scoping set, and then any replacement of the single variable in Q
>>results in a well formed RDF graph for simple entailment. So applying the
>>definitions in the current specification of SPARQL, one obtains that a
>>mapping S is a solution if the graph
>>
>>{ (a, p, a), (X, p, Y), (S(?q), p, S(?q)) }
>>
>>is entailed (simple entailment) by G. This leads to three distinct
>>solutions, S(?q) = a, S(?q) = X, and S(?q) = Y, because this three graphs
>>
>>{ (a, p, a), (X, p, Y), (X, p, X) }
>>{ (a, p, a), (X, p, Y), (Y, p, Y) }
>>{ (a, p, a), (X, p, Y) }
>>
>>are entailed by G. The problem (from my point of view) is that this example
>>shows is that the claim currently in section 5.2 in the specification
>>(rq24)
>>
>>"A pattern solution can then be defined as follows: to match a basic graph
>>pattern under simple entailment, it is possible to proceed by finding a
>>mapping from blank nodes and variables in the basic graph pattern to terms
>>in the graph being matched; a pattern solution is then a mapping
>>restricted to just the variables, possibly with blank nodes renamed. [...]"
>>
>>is not true. In my example, because Q does not have any blank node, the
>>only mapping that one finds from ?q to the terms of G is ?q -> a.

No, bnodes count as terms, so the terms in this 
example are a, X and Y , so the mappings one 
finds here are ?q->A, ?q->X and ?q->Y. Which is 
exactly what the definition gives one; so this 
does not seem to be a strict counterexample to 
the wording in the spec, at least on these 
grounds.

However, Enrico's point is well taken, as the 
usual algorithms in use do not simply 'find a 
mapping' as the above quote says (and whatever 
that means exactly :-), but instead perform 
*pattern matching* against the graph; and if one 
does that, then the only binding one finds is 
indeed the (?q-> a) case, since the pattern (?q P 
?q) does not match (X p Y). And I think indeed in 
this case that is exactly what should be the only 
answer, ie the pattern-matching wins over the 
current definition.

[Later: And the other example in 
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Aug/0005
makes the basic point more effectively.]

However, consider the following example: same query, but against the graph

{(a p a) (X p X) (X p b) }

Here, I suggest, the answers ?q->a, ?q->X would 
be correct, in spite of the apparent redundancy, 
because the third triple clearly distinguishes 
(what is known about) X from (what is known 
about) a; so the redundancy is indeed only 
apparent.

Pat


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 10 August 2006 06:25:49 UTC