Re: Unexpected DISTINCT?

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Tue, 27 Mar 2007 15:37:23 -0400
To: public-rdf-dawg-comments@w3.org, rnewman@franz.com
Message-ID: <OF9324A836.192E4755-ON852572AB.006B6022-852572AB.006BCAC6@us.ibm.com>

Richard Newman wrote on 02/25/2007 02:27:53 AM:

> 
> DAWG,
> 
>    I have an implementation question for which I cannot find an 
> answer in the spec.
> 
>    Given a SELECT query for which some results are duplicated, and 
> which does not specify DISTINCT, is it acceptable for an 
> implementation to return DISTINCT (or partially DISTINCT) results?
> 
>    This is exercised by <http://www.w3.org/2001/sw/DataAccess/tests/#modifer-limit>:
> 
> - with no DISTINCT processing, the results are [ 1, 1 ].
> - with DISTINCT processing, the results are [ 1, 2 ].

Thanks for bringing this to our attention. Please note that most of the 
tests in .../DataAccess/tests are not currently approved by the Working 
Group, and some of the tests explicitly do not reflect the specification. 
We are working to produce an updated and approved test suite at 
http://www.w3.org/2001/sw/DataAccess/tests/data-r2/ . This is still very 
much a work in progress, but will receive more attention with our recent 
publication of a Last Call working draft.

>    I seem to recall from informal sources that this is acceptable, 
> but it would be good to get a firm documented answer, particularly 
> when I can see that this could be contentious.

The algebra in Section 12 of the Last Call draft defines the precise 
cardinality of solutions, both for matching basic graph patterns and for 
combining SPARQL graph patterns. These cardinalities are preserved when 
the ToList() operation is applied to generate a solution sequence. The 
DISTINCT operator also appears in the algebra, which specifies the exact 
effect of the DISTINCT keyword on solution cardinalities within the 
solution sequence. 
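
To illustrate the cardinality point with the [ 1, 1 ] versus [ 1, 2 ] 
results from the question, here is a minimal Python sketch. It is not taken 
from the specification: the function name apply_modifiers and the input 
sequence [1, 1, 2] are hypothetical, and solutions are simplified to plain 
values rather than variable bindings. It assumes DISTINCT is applied before 
LIMIT, as in the draft's ordering of solution modifiers.

```python
def apply_modifiers(solutions, distinct=False, limit=None):
    """Apply DISTINCT (optional) and then LIMIT (optional) to a solution sequence.

    `solutions` is an already-ordered list of hashable values standing in
    for solutions. This is an illustrative simplification, not the algebra
    from the specification.
    """
    if distinct:
        # Keep only the first occurrence of each solution, preserving order.
        seen = set()
        unique = []
        for s in solutions:
            if s not in seen:
                seen.add(s)
                unique.append(s)
        result = unique
    else:
        # No DISTINCT: every duplicate is kept, so cardinality is preserved.
        result = list(solutions)
    if limit is not None:
        result = result[:limit]
    return result


ordered = [1, 1, 2]  # hypothetical solution sequence containing a duplicate

print(apply_modifiers(ordered, limit=2))                 # [1, 1]
print(apply_modifiers(ordered, distinct=True, limit=2))  # [1, 2]
```

The two calls differ only in the DISTINCT flag, which is enough to change 
what a LIMIT of 2 returns.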

Note that the REDUCED keyword, an at-risk feature in the current Last 
Call draft, can also modify the cardinalities of solutions within the 
solution sequence. It is defined in 
http://www.w3.org/TR/rdf-sparql-query/#modReduced . 
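
Read loosely, the definition at that link lets each solution appear at 
least once and at most as many times as it would without DISTINCT or 
REDUCED. A minimal Python sketch of a check for that range follows. The 
function name is_valid_reduced and the sample data are hypothetical, 
solutions are simplified to plain values, and ordering is ignored.

```python
from collections import Counter


def is_valid_reduced(original, reduced):
    """Return True if `reduced` is a permissible REDUCED form of `original`.

    Assumed rule: every solution in `original` appears in `reduced` at least
    once and no more often than it appears in `original`; no new solutions
    are introduced.
    """
    orig_counts = Counter(original)
    red_counts = Counter(reduced)
    # The two sequences must contain exactly the same distinct solutions.
    if set(orig_counts) != set(red_counts):
        return False
    # Each cardinality must fall within [1, original cardinality].
    return all(1 <= red_counts[s] <= orig_counts[s] for s in orig_counts)


original = [1, 1, 2]  # hypothetical solutions with no DISTINCT or REDUCED

print(is_valid_reduced(original, [1, 2]))        # True: duplicates dropped
print(is_valid_reduced(original, [1, 1, 2]))     # True: duplicates kept
print(is_valid_reduced(original, [1, 1, 1, 2]))  # False: cardinality grew
print(is_valid_reduced(original, [1, 1]))        # False: solution 2 is lost
```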

In the absence of the DISTINCT or REDUCED keywords, the specification 
gives a precise cardinality for the solutions that appear in the solution 
sequence, so an implementation should return duplicate solutions rather 
than eliminate them.

thanks,
Lee
 

> 
>    Thanks,
> 
> -R
> 
> 
Received on Tuesday, 27 March 2007 19:38:09 GMT
