Re: Semantics necessary not sufficient (was: Re: What is "the serious bug in entailment semantics" found by J. Perez"?) from Pat Hayes on 2006-08-13 (public-rdf-dawg@w3.org from July to September 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Sat, 12 Aug 2006 21:01:40 -0700
To: Bijan Parsia <bparsia@cs.man.ac.uk>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p0623093bc1045066f08a@[192.168.1.6]>
>On Aug 11, 2006, at 9:30 PM, Pat Hayes wrote:
>[snip]
>>Indeed, it would not. However, there would be a semantic constraint 
>>based on entailment, which we could state first, and to which the 
>>definition of answer set would be required to conform; and I 
>>believe that this would be a better way to craft the design of the 
>>spec. Basically, my motivation is that it seems to me that getting 
>>an entailment definition exact, i.e. defining the answer set 
>>*exactly*, and *purely* in terms of entailment, is not a fruitful 
>>goal for us to be pursuing at this point in the state of this art. 
>>It does not seem to be necessary for SPARQL to do this. It will not 
>>be more precise than a procedural definition, only different in 
>>style. We have a good chance of getting it subtly wrong: after all, 
>>we already have done that several times. Even if we get it right, 
>>the result will likely be so opaque that almost all implementers of 
>>SPARQL engines will be obliged to use a simpler, more procedural, 
>>description as their actual guide. You have indicated that you, in 
>>fact, are already doing this: the goal of this exercise is to craft 
>>a semantic/entailment definition which will agree with a procedural 
>>description abstracted from current implementations. This 
>>amounts to reverse-engineering a mathematical description from a 
>>procedural one, and I really don't see the point of doing this for 
>>the spec documents unless it is likely to produce some new insight 
>>or simplification or a better exposition; none of which seem likely 
>>at this point.
>
>For what it's worth, once I hacked through the horrible exposition 
>in the current document (with help from Enrico and Sergio) for my 
>tutorial, there were some interesting insights (as you mention 
>below, scoping sets are cool). The definition of an E-Entailment 
>regime is similarly useful I think.

Note, I'm not suggesting tossing all this, only weakening its normative 
role in the spec, from Nec&Suff to simply Nec.

>Of course, their major use is in definitions of alternative 
>semantics. I think this point should not be put aside. Even within 
>the bounds of our charter, we have simple, rdf, rdfs, and let's call 
>it "assertional"/matching (possible) entailment relations to 
>consider. I think the "virtual graph"/deductive closure approach 
>suffers from a number of problems

It's not universal, but I think it has many merits in the cases where 
it applies. The fact that it does apply is itself a kind of badge of 
computational simplicity that can be worn with pride for some 
applications, and I gather (to my great surprise) that for RDFS at 
least, it is in fact a computationally feasible way to proceed, even 
on quite large KBs. I don't think we should entirely trash it for the 
simpler entailment cases.

>, and prefer an approach which refers to the semantics of the 
>relations, which opens the door for extensions to OWL. Or I prefer 
>an approach where each is defined distinctly (though I prefer that 
>less than the general approach).
>
>>If anything, the definitions we have been crafting simply make the 
>>basic idea of pattern-matching more and more opaque. I do not mean 
>>to argue against having a semantic analysis, in support of semantic 
>>interoperability: but interoperability is served by the answer set 
>>being (1) unique, and (2) semantically coherent. For the latter, it 
>>is not necessary that the entire answer set be *defined* 
>>semantically, only that the answers in it be required to *satisfy* 
>>appropriate semantic conditions.
>
>This is true. But we want some guarantees that the procedural 
>definition yields the semantically correct answers.

Agreed. We would need something close to a formal proof of this in the spec.

>[snip]
>>I have another motive for this suggestion. As you may have been 
>>able to figure out from my recent emails with Bijan, I am strongly 
>>in favor of allowing SPARQL to deliver redundant answers when it is 
>>dealing with a redundant KB, hence not requiring absolutely 
>>irredundant answers.
>
>Though these should, IMHO, be available from the language.

If we can possibly do this, I agree. But as a last resort, a spec 
without this is better than no spec, or a spec so hard to follow that 
nobody reads it.

>Otherwise, it's very hard to say that we are truly doing RDF query. 
>Of course, there are cases where you want access to the 
>non-redundant graph *per se* (e.g., editors), but I'm not as 
>sanguine about that as I used to be. Browsers (in the sense of 
>portals) and editors are very different. Thus, I believe the problem 
>of ensuring bnode stability throughout a "session" and getting 
>exactly the redundancy in a graph should be separated.

They seem to be related, though. What bothers me is that the two cases 
in the first problem (session scope vs. answer document scope, AKA told 
bnodes vs. no told bnodes) seem to suggest different notions of 
redundancy. The final example in my message to Jorge illustrates the 
point. If the bnodes have document scope, and cannot be referred to 
again, then ?x/A , ?x/_:b seems clearly redundant. But if that can be 
a told bnode, you may be able to find out more about it than you can 
about A. And by phrasing everything so that either option is possible 
(which I think is good, BTW, and would not like to go back on), we 
have kind of shot ourselves in the foot as far as being able to 
decide on one of these as being THE single correct way is concerned, 
seems to me. Or perhaps, only the second option is feasible, so that 
there can be answer sets which *seem* redundant when you don't have 
told bnodes (but aren't, er, really).
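To make the document-scope case concrete, here is a minimal sketch of the redundancy test it suggests. It is an illustration only, not anything the spec defines: the function names (subsumed_by, prune_redundant) are hypothetical, RDF terms are plain strings with a "_:" prefix marking bnodes, and each solution is checked in isolation (bnodes shared across rows of the answer document are ignored). It assumes no told bnodes; if _:b can be referred to again, dropping ?x/_:b in favour of ?x/A is exactly the unsafe move described above.

```python
from typing import Dict, List

# Hypothetical representation: variable name -> RDF term; bnodes look like "_:b".
Solution = Dict[str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def subsumed_by(general: Solution, specific: Solution) -> bool:
    """True if `specific` is obtained from `general` by consistently
    replacing general's bnodes with terms (non-bnode terms must match)."""
    if general.keys() != specific.keys():
        return False
    mapping: Dict[str, str] = {}
    for var, term in general.items():
        target = specific[var]
        if is_bnode(term):
            # The same bnode must map to the same term everywhere in the row.
            if mapping.setdefault(term, target) != target:
                return False
        elif term != target:
            return False
    return True


def _dominated(i: int, answers: List[Solution]) -> bool:
    """A row is redundant if another row is strictly more specific, or if an
    equivalent row (mutual subsumption) appears earlier in the list."""
    s = answers[i]
    for j, other in enumerate(answers):
        if j == i or not subsumed_by(s, other):
            continue
        if not subsumed_by(other, s) or j < i:
            return True
    return False


def prune_redundant(answers: List[Solution]) -> List[Solution]:
    """Keep only rows not made redundant by some other row (no told bnodes)."""
    return [s for i, s in enumerate(answers) if not _dominated(i, answers)]
```

For the example above, prune_redundant([{"x": "A"}, {"x": "_:b"}]) keeps only {"x": "A"}. With told bnodes the same pruning would throw away an answer that might later tell you more than A does.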

>In fact, we should only use the latter to allow for specifying a 
>standard for "acceptable" redundancy that is consistent across 
>implementations (as Pat discusses below).
>
>>On the other hand, it is clearly desirable to limit the unbounded 
>>amount of redundancy that a simple entailment condition would 
>>allow. Stating an appropriate compromise position between these 
>>extremes is difficult when we are limited to using only semantic 
>>notions and terminology, IMO largely because the very idea of 
>>redundancy here is 'semantically invisible', i.e. a redundant graph 
>>and its lean subgraphs are semantically indistinguishable. Hence 
>>the need to protect the entailment-based definitions with notions 
>>which really are not semantic at all, such as the scoping set. In 
>>contrast, these scoping ideas are trivial to express in an 
>>algorithmic framework, and the results are intuitively very clear 
>>and easy to understand. So I think that this way of dividing up the 
>>definitional work between a semantic necessity and a 
>>syntactic/algorithmic sufficiency might allow us to quickly and 
>>easily find a way to deal with redundancy in answers which will be 
>>more or less right for practical deployment.
>[snip]
>
>I do think that this should be *available* and the *default*, but I 
>think too that if we don't make the irredundant answer sets 
>available (if only by special user demand) then we aren't cohering 
>with the semantics of RDF.

Hmm. Not sure I agree with the "semantics of RDF" point exactly, but 
never mind. OK, let's say I agree, provisionally :-). Let's move to a 
discussion of exactly how to do this using DISTINCT, and therefore 
exactly what it means (see above).

>I'll raise an issue and explain the point.

Sounds good.

Pat

>Cheers,
>Bijan.


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Sunday, 13 August 2006 04:02:06 UTC