Re: REDUCED and the SPARQL XML Results Format

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 28 Mar 2007 13:46:46 +0100
Message-ID: <460A63B6.3030202@hp.com>
To: Lee Feigenbaum <feigenbl@us.ibm.com>
CC: dawg mailing list <public-rdf-dawg@w3.org>



Lee Feigenbaum wrote:
> Andy Seaborne wrote on 03/26/2007 07:25:16 AM:
>> Steve Harris wrote:
>> ...
>>> Whether the distinct attribute should be set where appropriate is an 
>>> interesting question. It also applies to SPARQL services that 
>>> currently implicitly DISTINCT.
>> I don't see much use for a distinct attribute (I do see more utility for the
>> 'ordered').
>>
>> There never was anything stated about implicitly DISTINCT - I've always seen
>> it as a local API issue where the local API inserts (or has the effect of
>> inserting) DISTINCT into all queries.  It was the case the test suite
>> carefully didn't distinguish - except we let such a test case in
>> which is what started all this latest stuff into motion.
> 
> Richard Newman has recently brought up this same issue on the -comments 
> list. In preparing an answer for him, I looked at the specific text in 
> 2.3.1 of the Query Results XML Format document:
> 
> """
> The distinct attribute indicates that the results are distinct (contain no 
> duplicates), such as given by a SPARQL query using SELECT DISTINCT.
> """
> 
> To me, this suggests that distinct="true" is only a property of the 
> results, and should be included whenever the results contain no 
> duplicates, regardless of which--if any--keywords are present in the query 
> itself. (I'm not thoroughly positive that this statement in the 
> specification implies the opposite, "If the distinct attribute's value is 
> false, then the results contain at least one duplicate", but it does seem 
> that way to me.)

I have been reading this as saying @distinct=true implies "these results are 
distinct" but the converse is unstated.  @distinct=false means there are no 
guarantees.

If false => at least one duplicate, then streaming of results is very hard, 
often impossible.  The code can't generate the header with @distinct until it 
has seen all the solutions, or it knows the solutions will be distinct 
anyway (e.g. SELECT * {?s ?p ?o}).
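
A minimal sketch of the streaming point above, not taken from ARQ or any other
implementation. The element names are simplified and the function names
(stream_results, buffered_results) are hypothetical. It assumes the header
carrying @distinct must be written before the first solution. If the flag comes
from the query alone, the header can go out immediately; if "false" had to mean
"at least one duplicate", every solution must be read before the header can be
written.

```python
from typing import Iterable, Iterator, Tuple
from xml.sax.saxutils import escape

# One solution, simplified to a tuple of plain string values (an assumption).
Solution = Tuple[str, ...]


def _row(solution: Solution) -> str:
    """Serialize one solution in a simplified, illustrative element layout."""
    cells = "".join(f"<value>{escape(v)}</value>" for v in solution)
    return f"  <result>{cells}</result>"


def stream_results(solutions: Iterable[Solution],
                   query_has_distinct: bool) -> Iterator[str]:
    """Flag depends only on the query, so the header is emitted before
    any solution is consumed and rows are passed through one at a time."""
    flag = "true" if query_has_distinct else "false"
    yield f'<results distinct="{flag}">'
    for solution in solutions:
        yield _row(solution)
    yield "</results>"


def buffered_results(solutions: Iterable[Solution]) -> Iterator[str]:
    """Flag depends on the data ("false" means at least one duplicate),
    so all solutions are held in memory before the header is emitted."""
    rows = list(solutions)
    flag = "true" if len(set(rows)) == len(rows) else "false"
    yield f'<results distinct="{flag}">'
    for solution in rows:
        yield _row(solution)
    yield "</results>"
```

With the first function the memory cost is constant per solution; with the
second it grows with the size of the result set, which is the difficulty
described above.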

> 
> Do any implementations that we know about behave in this way? (Set 
> distinct="true"/"false" solely based on the presence/absence of duplicates 
> in the results.)
> 
> Lee

Currently, ARQ sets @distinct=true if and only if the query had DISTINCT in 
it.  It's independent of the results and streaming happens.

I'd be happy to drop @distinct.

	Andy

-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Wednesday, 28 March 2007 12:46:59 GMT
