Re: bnodes as answer bindings from Bijan Parsia on 2006-08-07 (public-rdf-dawg@w3.org from July to September 2006)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Mon, 7 Aug 2006 21:08:37 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <ABF85EA5-F1AB-45E2-AF3F-2B0D6B5FD780@cs.man.ac.uk>
On Aug 7, 2006, at 8:40 PM, Pat Hayes wrote:

>> Slight emendation:
>>
>> On Aug 7, 2006, at 5:22 PM, Bijan Parsia wrote:
>> [snip]
>>
>>> """The answer set of a query is the largest set of query answers  
>>> that are entailed by the answer KB such that no answer in the set  
>>> is entailed by any other answer in the set."""
>>>
>>> Non-redundancy.
>> [snip]
>>
>> DQL distinguishes between the answer set and the response set:
>>
>> """Response Set
>> While there are no global requirements on a response set other  
>> than that all its members are correct answers, it is recommended  
>> that servers ensure that answer bundles do not contain duplicate  
>> or redundant answers, i.e., answers which are subsumed by other  
>> answers.  One answer subsumes another if its bindings are a  
>> superset of the bindings in the other answer.  Servers that are  
>> able to guarantee that their response sets contain no duplicate  
>> answers can be called non-repeating.  Servers that are able to  
>> guarantee that their response sets contain no duplicate or  
>> redundant answers can be called terse.  Servers that are able to  
>> guarantee that their response sets will be correctly terminated  
>> with 'none' can be called complete."""
>>
>> OWLQL (<http://ksl-web.stanford.edu/KSL_Abstracts/KSL-03-14.html>)  
>> has a more elaborate discussion.
>>
>> I think I prefer the way that SPARQL does it, if DISTINCT gets  
>> fixed. I certainly don't want to have the granularity of  
>> redundancy placed at the server level.
>
> I still think this is the best stance for the standard to adopt.

It's silly. We can easily do this on a query by query level and let  
servers do the best they can and communicate when they can't do better.

> I can see a perfectly good utility for servers which run fast but  
> do not *guarantee* non-redundancy.

So they should fault if the user requests non-redundancy, which I  
think users do by saying "DISTINCT".

> I'm quite sanguine about this because the economic pressures on  
> servers and customers seem to converge on eliminating redundancy  
> where practicable: there is no motivation for anyone to  
> gratuitously introduce redundancy for no reason,

And yet they do. Plus, redundancy is not always gratuitous, yet it is  
still not desired: e.g., aggregation, or just multiple people  
entering data over time.

> only to save the considerable work involved in checking for non- 
> redundancy when an absolute guarantee of nonredundancy is not  
> required.

Then don't include the DISTINCT keyword and all conforming servers  
will behave as you like.

> I would vastly prefer to use such a server than one that times out  
> trying to establish a minimal answer set, particularly when we  
> might be talking about answer sets with high orders of magnitude.

Pat, I guess what you don't understand is that, as I've said several  
times now, it's perfectly reasonable to allow for (reasonable)  
redundancy with a plain select clause (this is how SQL works, I  
believe) but have a non-redundant answer set *WHEN THE USER REQUESTS  
IT*. And the natural way for a user to request it is with DISTINCT.  
But then we have to define what a non-redundant answer *is*. And on  
the standard-conforming reading of the RDF Semantics, it's going to  
involve some work and cannot involve treating BNodes as denoting  
terms. At least, it would not be sensible to do so.
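A minimal Python sketch of what a DISTINCT-style non-redundancy check could look like when BNodes in answers are read existentially rather than as denoting terms. It is not taken from SPARQL or from any existing server; the names `minimize`, `is_instance_of`, `RedundancyBudgetExceeded` and `max_comparisons` are hypothetical. It assumes IRIs are written `<...>` and BNodes `_:label`, and that each solution is compared only with the other solutions (ignoring the rest of the answer KB, and scoping each solution's BNodes independently). A solution is dropped if another solution entails it, either by binding a superset of its variables to the same values (the DQL notion of subsumption quoted above) or by instantiating its BNodes. If the comparison budget runs out, it faults instead of returning a set it cannot verify as minimal.

```python
# Hypothetical sketch: non-redundant ("terse") answer sets with BNodes read
# existentially. IRIs are "<...>", BNodes "_:label"; a solution is a dict
# from variable name to term. Only the solutions themselves are compared;
# the rest of the answer KB is ignored (a simplifying assumption).


class RedundancyBudgetExceeded(Exception):
    """Raised when minimality cannot be verified within the comparison budget."""


def is_bnode(term):
    return term.startswith("_:")


def is_instance_of(general, specific):
    """True if `specific` entails `general`: `specific` binds every variable
    that `general` binds, and some consistent mapping of `general`'s BNodes
    to terms turns `general`'s values into `specific`'s values."""
    if not general.keys() <= specific.keys():
        return False
    mapping = {}
    for var, g in general.items():
        s = specific[var]
        if is_bnode(g):
            if mapping.setdefault(g, s) != s:
                return False  # one BNode cannot stand for two different terms
        elif g != s:
            return False  # IRIs and literals must match exactly
    return True


def minimize(solutions, max_comparisons=10_000):
    """Return a subset in which no solution is entailed by another,
    keeping one representative of any mutually entailing group.
    Faults rather than returning an unverified (possibly redundant) set."""
    kept = []
    comparisons = 0

    def entails(specific, general):
        nonlocal comparisons
        comparisons += 1
        if comparisons > max_comparisons:
            raise RedundancyBudgetExceeded(
                f"gave up after {max_comparisons} comparisons"
            )
        return is_instance_of(general, specific)

    for cand in solutions:
        # Skip the candidate if something already kept entails it.
        if any(entails(k, cand) for k in kept):
            continue
        # Otherwise drop kept solutions that the candidate entails.
        kept = [k for k in kept if not entails(cand, k)]
        kept.append(cand)
    return kept


answers = [
    {"x": "<http://example.org/a>"},
    {"x": "_:b1"},
    {"x": "_:b2"},
    {"x": "<http://example.org/b>"},
]
print(minimize(answers))
# prints [{'x': '<http://example.org/a>'}, {'x': '<http://example.org/b>'}]
```

A DISTINCT that treated BNodes as names would keep all four rows; only the existential reading makes the two BNode rows redundant. The greedy loop can need a number of comparisons quadratic in the number of solutions, which is the kind of cost at issue in this thread.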

And you can get your desired behavior from *EVERY* SPARQL server by  
not including the DISTINCT. Why would you want to go shopping around  
for servers, etc.?

>> If I can't compute a non-redundant answer because I've run out of  
>> resources, I should time out/fault with out of memory, whatever. If  
>> I have an incomplete minimizer, I should be able to verify that  
>> my answer set is minimal, or fault.
>
> *You* should, yes. That is, the user has the option of computing a  
> minimal answer set if it is absolutely required.

Obviously, I was speaking as a server. Frankly, I think this sort of  
behavior is exactly the sort of thing that should be standardized and  
built into the query language. I sincerely doubt most users have the  
sophistication to get it right, and I don't see why they should have to.

*Who* was going on about putting burdens on the implementers instead  
of the users? This is a clear case of a ridiculous burden on the  
users. And there's a nice opt-out for the implementers: don't support  
DISTINCT and advertise that.

[snip]

Cheers,
Bijan.
Received on Monday, 7 August 2006 20:08:04 UTC