- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Wed, 28 Mar 2007 13:46:46 +0100
- To: Lee Feigenbaum <feigenbl@us.ibm.com>
- CC: dawg mailing list <public-rdf-dawg@w3.org>
Lee Feigenbaum wrote:
> Andy Seaborne wrote on 03/26/2007 07:25:16 AM:
>> Steve Harris wrote:
>> ...
>>> Whether the distinct attribute should be set where appropriate is an
>>> interesting question. It also applies to SPARQL services that
>>> currently implicitly DISTINCT.
>> I don't see much use for a distinct attribute (I do see more utility
>> for the 'ordered').
>>
>> There never was anything stated about implicitly DISTINCT - I've always
>> seen it as a local API issue where the local API inserts (or has the
>> effect of inserting) DISTINCT into all queries. It was the case the test
>> suite carefully didn't distinguish - except we let such a test case in
>> which is what started all this latest stuff into motion.
>
> Richard Newman has recently brought up this same issue on the -comments
> list. In preparing an answer for him, I looked at the specific text in
> 2.3.1 of the Query Results XML Format document:
>
> """
> The distinct attribute indicates that the results are distinct (contain no
> duplicates), such as given by a SPARQL query using SELECT DISTINCT.
> """
>
> To me, this suggests that distinct="true" is only a property of the
> results, and should be included whenever the results contain no
> duplicates, regardless of which--if any--keywords are present in the query
> itself. (I'm not thoroughly positive that this statement in the
> specification implies the opposite, "If the distinct attribute's value is
> false, then the results contain at least one duplicate", but it does seem
> that way to me.)
I have been reading this as saying @distinct=true implies "these results are
distinct" but the converse is unstated. @distinct=false means there are no
guarantees.
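A minimal sketch of what this reading means for a consumer, assuming solutions arrive as an iterable of dicts mapping variable names to hashable values. The function name `unique_solutions` and that representation are hypothetical, not part of any SPARQL library. With @distinct=true the consumer can trust the producer; with @distinct=false it has to remove duplicates itself if it needs uniqueness.

```python
def unique_solutions(solutions, distinct_attr):
    """Yield each solution once, trusting @distinct only when it is true.

    solutions     -- iterable of dicts, variable name -> value (assumed shape)
    distinct_attr -- boolean value of the @distinct attribute from the header
    """
    if distinct_attr:
        # @distinct=true: the producer guarantees there are no duplicates,
        # so the rows can be passed through without any bookkeeping.
        yield from solutions
        return

    # @distinct=false: no guarantee either way, so de-duplicate locally.
    seen = set()
    for row in solutions:
        key = frozenset(row.items())
        if key not in seen:
            seen.add(key)
            yield row
```

Under this reading a result set with @distinct=false may still happen to contain no duplicates; the consumer just cannot rely on it.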
If false => at least one duplicate, then streaming of results is very hard,
often impossible. The code can't generate the header with @distinct until it
has seen all the solutions, unless it already knows the solutions will be
distinct anyway (e.g. SELECT * {?s ?p ?o}).
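To illustrate the streaming constraint, here is a rough sketch of a serializer that writes the header before it has looked at any solution. The element layout is deliberately simplified and is not the exact Query Results XML Format; `stream_results` and its parameters are hypothetical names. The point is only that the @distinct value has to be known up front, for example from the query itself.

```python
from xml.sax.saxutils import escape


def stream_results(variables, solutions, query_has_distinct):
    """Yield chunks of a simplified results document, one chunk at a time.

    variables          -- list of variable names to serialize
    solutions          -- iterable (possibly lazy) of dicts, variable -> value
    query_has_distinct -- flag known before any solution is produced
    """
    # The header goes out first, so @distinct cannot depend on the rows.
    flag = "true" if query_has_distinct else "false"
    yield f'<results distinct="{flag}">'

    # Rows are written as they arrive; nothing is buffered.
    for row in solutions:
        cells = "".join(
            f'<binding name="{v}">{escape(str(row[v]))}</binding>'
            for v in variables
            if v in row
        )
        yield f"<result>{cells}</result>"

    yield "</results>"
```

If @distinct=false had to mean "at least one duplicate", the first `yield` could not happen until every row had been read and compared, which defeats the streaming.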
>
> Do any implementations that we know about behave in this way? (Set
> distinct="true"/"false" solely based on the presence/absence of duplicates
> in the results.)
>
> Lee
Currently, ARQ sets @distinct=true if and only if the query had DISTINCT in
it. The value is independent of the results, so streaming still works.
I'd be happy to drop @distinct.
Andy
--
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Wednesday, 28 March 2007 12:46:59 UTC