Semantics necessary not sufficient (was: Re: What is "the serious bug in entailment semantics" found by J. Perez"?) from Pat Hayes on 2006-08-11 (public-rdf-dawg@w3.org from July to September 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 11 Aug 2006 13:30:01 -0700
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230935c10289f16d40@[192.168.1.6]>
>On 10 Aug 2006, at 08:25, Pat Hayes wrote:
>
>>>Basically the definition of "Basic Graph Pattern E-matching" 
>>><http://www.w3.org/2001/sw/DataAccess/rq23/rq24.html#BGPgeneral> 
>>>is not equivalent to the expected implementation as described in 
>>><http://www.w3.org/2001/sw/DataAccess/rq23/rq24.html#BGPsparql>.
>>
>>Actually I think it is, but the issue still needs to be discussed.
>
>No, it is not 'equivalent', but it is 'implied by'.

This depends on what one takes the implementation description to be 
saying, exactly. It is not entirely clear at present.

>Indeed, you are saying that:
>
>>This semantic condition then becomes a *necessary but possibly not 
>>sufficient* condition on an answer set; that is, it does not 
>>*define* the answer set, but merely imposes a semantic constraint 
>>upon it.
>
>I propose to fix the semantics to make the semantic definition 
>equivalent to the implementation hint. You propose to upgrade the 
>implementation hint to become a definition

Yes. It will need to be written more carefully and in a more formal 
style, of course.

>:
>
>>The answer set may be *defined* in some other way - for current 
>>purposes, by a matching algorithm - and the spec should then show 
>>that this definition does indeed conform to the semantic constraint.
>
>I oppose this, since the definition of answer set would not be a 
>semantic definition based on entailment anymore.

Indeed, it would not. However, there would be a semantic constraint 
based on entailment, which we could state first, and to which the 
definition of answer set would be required to conform; and I believe 
that this would be a better way to craft the design of the spec. 
Basically, my motivation is that it seems to me that getting an 
entailment definition exact, i.e. defining the answer set *exactly*, 
and *purely* in terms of entailment, is not a fruitful goal for us to 
be pursuing at this point in the state of this art. It does not seem 
to be necessary for SPARQL to do this. It will not be more precise 
than a procedural definition, only different in style. We have a good 
chance of getting it subtly wrong: after all, we already have done 
that several times. Even if we get it right, the result will likely 
be so opaque that almost all implementers of SPARQL engines will be 
obliged to use a simpler, more procedural, description as their 
actual guide. You have indicated that you, in fact, are already doing 
this: the goal of this exercise is to craft a semantic/entailment 
definition which will agree with a procedural description abstracted 
from current implementations. This is amounts to reverse-engineering 
a mathematical description from a procedural one, and I really don't 
see the point of doing this for the spec documents unless it is 
likely to produce some new insight or simplification or a better 
exposition; none of which seem likely at this point. If anything, the 
definitions we have been crafting simply make the basic idea of 
pattern-matching more and more opaque. I do not mean to argue against 
having a semantic analysis, in support of semantic interoperability: 
but interoperability is served by the answer set being (1) unique, 
and (2) semantically coherent. For the latter, it is not necessary 
that the entire answer set be *defined* semantically, only that the 
answers in it be required to *satisfy* appropriate semantic 
conditions.

>At this point why just not reduce the semantic definition to be just 
>G |= S(Q) ? This would be also a necessary but possibly not 
>sufficient condition on an answer set, but weaker than the current 
>one.

Although I presume you intended this to be a reductio ad absurdum, I 
actually think that for the purposes of writing the spec (not for 
doing science), keeping it that simple might have been a good 
strategy. Indeed, I have always been trying to keep these definitions 
as simple as possible, and I voted against accepting them in their 
present form, on the grounds that they are much too complicated to be 
useful. But there is value in the bnode scoping conditions contained 
in the current definition which I believe are worth preserving. The 
problems identified by Jorge (and many of the others that have 
arisen) are all concerned with how to appropriately limit redundancy 
in the answer set while facing up to the possibility of redundancy in 
the KB. And that issue, it seems to me, really is (a) not 
fundamentally to do with entailment at all, and (b) new, which does 
not arise in pre-RDF settings in the same way, and on which we should 
not be doing original research in this kind of a forum.

I have another motive for this suggestion. As you may have been able 
to figure out from my recent emails with Bijan, I am strongly in 
favor of allowing SPARQL to deliver redundant answers when it is 
dealing with a redundant KB, hence not requiring absolutely 
irredundant answers. On the other hand, it is clearly desireable to 
limit the unbounded amount of redundancy that a simple entailment 
condition would allow. Stating an appropriate compromise position 
between these extremes is  difficultwhen we limited to using only 
semantic notions and terminology, IMO largely because the very idea 
of redundancy here is 'semantically invisible', i.e. a redundant 
graph and its lean subgraphs are semantically indistinguishable. 
Hence the need to protect the entailment-based definitions with 
notions which really are not semantic at all, such as the scoping 
set. In contrast, these scoping ideas are trivial to express in an 
algorithmic framework, and the results are intuitively very clear and 
easy to understand. So I think that this way of dividing up the 
definitional work between a semantic necessity and a 
syntactic/algorithmic sufficiency might allow us to quickly and 
easily find a way to deal with redundancy in answers which will be 
more or less right for practical deployment.

>  I am looking for the strongest condition, i.e., the necessary and 
>sufficient. We already have one, we are working on it, give us time.

But if it isn't simpler than the current definition, we should think 
very seriously before including it in the spec as the basic normative 
definition of 'answer set'.

I do no doubt that it is possible. I think the style of definition 
that you came up with originally, involving 'reversible 
skolemization', would work.

Pat


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 11 August 2006 20:30:21 UTC