Re: Semantics necessary not sufficient (was: Re: What is "the serious bug in entailment semantics" found by J. Perez"?) from Bijan Parsia on 2006-08-14 (public-rdf-dawg@w3.org from July to September 2006)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Mon, 14 Aug 2006 04:32:02 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <4978133E-8290-46A3-B248-0D7EF2BCDD80@cs.man.ac.uk>
On Aug 13, 2006, at 5:01 AM, Pat Hayes wrote:

>> On Aug 11, 2006, at 9:30 PM, Pat Hayes wrote:
[snip]
>> Of course, their major use is in definitions of alternative  
>> semantics. I think this point should not be put aside. Even within  
>> the bounds of our charter, we have simple, rdf, rdfs, and let's  
>> call it "assertional"/matching (possible) entailment relations to  
>> consider. I think the "virtual graph"/deductive closure approach  
>> suffers from a number of problems
>
> It's not universal, but I think it has many merits in the cases  
> where it applies. The fact that it does apply is itself a kind of  
> badge of computational simplicity that can be worn with pride for  
> some applications, and I gather (to my great surprise) that for  
> RDFS at least, it is in fact a computationally feasible way to  
> proceed, even on quite large KBs. I don't think we should entirely  
> trash it for the simpler entailment cases.

Whatever approach we sanction should be clearly laid out. Virtual  
graphs are, in fact, rather tricky things, for example, in the RDFS  
case. (And I've built and worked with such, i.e., forward chaining  
RDFS systems.) Basically, they are all incomplete, and typically a  
bit suspect. None of them, afaik, treat BNodes existentially. (And  
how should they deal with redundancy? The story needs to be told  
there since we'll be treating certain semantic conditions  
differently, at least than a naive approach.)

I think this all goes hand in hand with being able to specify which  
under which semantics you intend your query to be interpreted. We can  
provide a "hooky" framework that doesn't necessarily have a unified  
semantics across every variant. Though if we can maintain the BGP  
matching/algebra split, things will be *much* easier.

>> , and prefer an approach which refers to the semantics of the  
>> relations, which opens the door for extensions to OWL. Or I prefer  
>> an approach where each is defined distinctly (though I prefer that  
>> less than the general approach).
>>
>>> If anything, the definitions we have been crafting simply make  
>>> the basic idea of pattern-matching more and more opaque. I do not  
>>> mean to argue against having a semantic analysis, in support of  
>>> semantic interoperability: but interoperability is served by the  
>>> answer set being (1) unique, and (2) semantically coherent. For  
>>> the latter, it is not necessary that the entire answer set be  
>>> *defined* semantically, only that the answers in it be required  
>>> to *satisfy* appropriate semantic conditions.
>>
>> This is true. But we want some guarantees that the procedural  
>> definition yields the semantically correct answers.
>
> Agreed. We would need something close to a formal proof of this in  
> the spec.

I'm glad we agree on this key point.

>> [snip]
>>> I have another motive for this suggestion. As you may have been  
>>> able to figure out from my recent emails with Bijan, I am  
>>> strongly in favor of allowing SPARQL to deliver redundant answers  
>>> when it is dealing with a redundant KB, hence not requiring  
>>> absolutely irredundant answers.
>>
>> Though these should, IMHO, be available from the language.
>
> If we can possibly do this, I agree. But as a last resort, a spec  
> without this is better than no spec, or a spec so hard to follow  
> that nobody reads it.

Well, the functionality and how it's presented are somewhat  
orthogonal. Non-redundancy of answers can be specified in the section  
on distinct as a post processing (I believe). That seems easy enough  
to understand.

>> Otherwise, it's very hard to say that we are truly doing RDF  
>> query. Of course, there are cases where you want access to the non- 
>> redundant graph *per se* (e.g., editors), but I'm not as sanguine  
>> about that as i use to be. Browsers (in the sense of portals) and  
>> editors are very different. Thus, I believe the problem of  
>> ensuring bnode stability thoughout a "session" and getting exactly  
>> the redundancy in a graph should be separated.
>
> They seem to be related, though. The two cases in the first problem  
> (session scope vs. answer document scope, AKA told bnodes vs. no  
> told bnodes) seem to suggest different notions of redundancy, is  
> what bothers me. See the final example in my message to Jorge to  
> make the point. If the bnodes have document scope, and cannot be  
> referred to again, then ?x/A , ?x/_:b seems clearly redundant. But  
> if that can be a told bnode, you may be able to find out more about  
> it than you can about A.

As in my :mochi example in my DISTINCT issue raising. Clearly the  
algebra can introduce redundancy that the graph didn't have. In the  
context of a single query, I think it's correct to eliminate that  
introduced redundancy (in the context of a DISTINCT) since, really,  
that's what you asked for. I.e., consider:

	SELECT DISTINCT ?x ?y
		{?x ?p ?y}

	:bob :loves :mochi.
	:bob :hates :mochi.

The redundant answer set is:
	?x		?y
	:bob	:mochi
	:bob	:mochi

And I think having that as a result of DISTINCT is enormously  
counterintuitive. So I think we can generalize it to the BNode case,  
at least in some mode. (I'm arguing for inclusion.) If you don't have  
BNode stability, then it's all moot anyway. You could always recover  
distinct answers (in the mochi cases) by projecting an additional  
variable.

> And by phrasing everything so that either option is possible (which  
> I think is good, BTW, and would not like to go back on), we have  
> kind of shot ourselves in the foot as far as being able to decide  
> on one of these as being THE single correct way is concerned, seems  
> to me. Or perhaps, only the second option is feasible, so that  
> there can be answer sets which *seem* redundant when you don't have  
> told bnodes (but aren't, er, really).

Here's a problem, in general, I have about told BNodes, they are not  
really part of RDF as specified. They *are* part of RDF as practiced.  
But part of our job is to make specs cohere and to make things,  
generally, less confusing than more. So, I would prefer to tackle  
this head on rather than slide around it. Tackling it head on, imho,  
means either respecting the current semantics, or *changing* them.  
Now we aren't chartered to change them, so I would like to request  
that the Chair talk to the coordination group to see what possible  
options are.

>> In fact, we should only use the latter to allow for specifying a  
>> standard for "acceptable" redundancy that is consistent across  
>> implementations (as Pat discusses below).
>>
>>> On the other hand, it is clearly desireable to limit the  
>>> unbounded amount of redundancy that a simple entailment condition  
>>> would allow. Stating an appropriate compromise position between  
>>> these extremes is  difficultwhen we limited to using only  
>>> semantic notions and terminology, IMO largely because the very  
>>> idea of redundancy here is 'semantically invisible', i.e. a  
>>> redundant graph and its lean subgraphs are semantically  
>>> indistinguishable. Hence the need to protect the entailment-based  
>>> definitions with notions which really are not semantic at all,  
>>> such as the scoping set. In contrast, these scoping ideas are  
>>> trivial to express in an algorithmic framework, and the results  
>>> are intuitively very clear and easy to understand. So I think  
>>> that this way of dividing up the definitional work between a  
>>> semantic necessity and a syntactic/algorithmic sufficiency might  
>>> allow us to quickly and easily find a way to deal with redundancy  
>>> in answers which will be more or less right for practical  
>>> deployment.
>> [snip]
>>
>> I do think that this should be *available* and the *default*, but  
>> i think too that if we don't make the irredundent answer sets  
>> available (if only by special user demand) then we aren't cohering  
>> with the semantics of RDF.
>
> Hmm. Not sure I agree with the "semantics of RDF" point exactly,  
> but never mind.

Let me put it this way, either redundancy is semantically  
significant, or it isn't. (Or significant in some cases and not in  
others.) There's one story where redundancy is like redundancy in  
SQL, that is, a *purely* computational hack/convenience/whathaveyou.  
DISTINCT gives you the "real" answer set, in some sense (at least  
from the POV of the underlying formalism). So, if we treat BNodes as  
told systematically throughout the processing of a query (and even  
unto separate queries), always, and only, then we are treating  
redundancy which is *not* significant, by the semantics of the  
underlying formalism, as very significant.

I think it would be a legitimate objection to the spec if that were  
the case. I think if the group wants to do that, then it should face  
the tension with the RDF Semantics document head on.

(Personally, I would prefer that BNodes were *not* existential  
variables, but local names that have specific renaming conventions.  
Most software wouldn't have to change at all. It would cohere with  
most actual use and treatment of BNodes. It would be computationally  
saner and easier to mesh with deductive database systems. Etc. But I  
don't see that we just get to *do* that in SPARQL.)

> OK, lets say I agree, provisionally :-). Lets move to a discussion  
> of exactly how to do this using DISTINCT, and therefore exactly  
> what it means (see above).
>
>> I'll raise an issue and explain the point.
>
> Sounds good.

See my other message.

Cheers,
Bijan.
Received on Monday, 14 August 2006 03:32:19 UTC