Re: [Fwd: Unexpected DISTINCT?] from Lee Feigenbaum on 2007-03-05 (public-rdf-dawg@w3.org from January to March 2007)

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Mon, 5 Mar 2007 00:27:03 -0500
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <OF109DCC77.6F5E3964-ON85257295.001DDBD9-85257295.001DF0F7@us.ibm.com>

As per my action from last week, I sought Steve and Jeen's opinions on 
auto distinct. Here's what Steve had to say:

"""
I would very much like the freedom to return the number of duplicates 
that's easiest. I use it as an optimisation in some places. eg. I can 
answer
    SELECT ?g ?x WHERE { GRAPH ?g { ?x ?y ?z } }
with an implicit DISTINCT very cheaply.

There are also a few cases where I auto DISTINCT or auto semi-DISTINCT
to keep the number of solutions down, e.g.
    SELECT ?g WHERE { GRAPH ?g { ?x ?y ?z } }
The natural return for my system would be one row per quad in the store.
"""
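
To make the row counts in Steve's second example concrete, here is a minimal
sketch in Python. The quad data and the function names are hypothetical and
are not taken from any real store; it only illustrates the difference between
one row per quad and a DISTINCT result.

```python
# Hypothetical quad store: (graph, subject, predicate, object) tuples.
quads = [
    ("g1", "s1", "p", "o1"),
    ("g1", "s2", "p", "o2"),
    ("g2", "s1", "p", "o1"),
]

def select_g(store):
    """SELECT ?g WHERE { GRAPH ?g { ?x ?y ?z } } with no duplicate removal:
    one row per quad in the store."""
    return [g for (g, _x, _y, _z) in store]

def select_g_distinct(store):
    """The same query with an implicit DISTINCT: one row per graph,
    keeping first-seen order."""
    return list(dict.fromkeys(select_g(store)))

rows_per_quad = select_g(quads)
rows_distinct = select_g_distinct(quads)
```

With three quads spread over two graphs, the first function yields three rows
and the second yields two.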

Lee

Andy Seaborne wrote on 02/26/2007 11:03:08 AM:

> 
> The modifier order is:
> 
>      * 9.1 ORDER BY
>      * 9.2 Projection
>      * 9.3 DISTINCT
>      * 9.4 OFFSET
>      * 9.5 LIMIT
> 
> so the test is correct.
> 
> We don't document anywhere (IIRC) anything about auto DISTINCT.
> 
> DISTINCT is applied after ORDER BY. The ORDER BY step emits [1, 1, 2, ...], so
> 
> limit(
>     distinct([1, 1, 2, ...]),
>     2)
>     = [1, 2]
> 
> 
> The question of implicit DISTINCT remains -
> 
> Any opinions on saying anything about implicit DISTINCT for simple
> entailment (all we define SPARQL for)? Because DISTINCT is after
> projection, there are several ways to get duplicates, all of which are
> well-defined within BGP matching (blank nodes for simple entailment
> matches), the algebra (UNION), and projection. Projection alone
> suggests to me that we should not define implicit DISTINCT and should
> leave it to implementations to provide as an extra, but I don't have a
> strong opinion to that effect.
> 
>    Andy
> 
> -------- Original Message --------
> Subject: Unexpected DISTINCT?
> Resent-Date: Sun, 25 Feb 2007 17:58:17 +0000
> Resent-From: public-rdf-dawg-comments@w3.org
> Date: Sat, 24 Feb 2007 23:27:53 -0800
> From: Richard Newman <rnewman@franz.com>
> To: public-rdf-dawg-comments@w3.org
> 
> 
> DAWG,
> 
>     I have an implementation question for which I cannot find an
> answer in the spec.
> 
>     Given a SELECT query for which some results are duplicated, and
> which does not specify DISTINCT, is it acceptable for an
> implementation to return DISTINCT (or partially DISTINCT) results?
> 
>     This is exercised by
> <http://www.w3.org/2001/sw/DataAccess/tests/#modifer-limit>:
> 
> - with no DISTINCT processing, the results are [ 1, 1 ].
> - with DISTINCT processing, the results are [ 1, 2 ].
> 
>     I seem to recall from informal sources that this is acceptable,
> but it would be good to get a firm documented answer, particularly
> when I can see that this could be contentious.
> 
>     Thanks,
> 
> -R
> 
> 
> 
>
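
For reference, the modifier order Andy lists can be sketched in Python as
below. This is only an illustration under stated assumptions: the function
name, its parameters, and the sample solutions are hypothetical, and the
sketch models solutions as plain dicts rather than following any particular
implementation.

```python
def apply_modifiers(solutions, order_by, project, distinct=False,
                    offset=0, limit=None):
    """Apply solution modifiers in the order given in the thread:
    ORDER BY, projection, DISTINCT, OFFSET, LIMIT."""
    # 9.1 ORDER BY: sort on a single variable (simplifying assumption).
    rows = sorted(solutions, key=lambda s: s[order_by])
    # 9.2 Projection: keep only the selected variables.
    rows = [tuple(s[v] for v in project) for s in rows]
    # 9.3 DISTINCT: drop duplicate rows while preserving order.
    if distinct:
        seen = set()
        unique = []
        for row in rows:
            if row not in seen:
                seen.add(row)
                unique.append(row)
        rows = unique
    # 9.4 OFFSET
    rows = rows[offset:]
    # 9.5 LIMIT
    if limit is not None:
        rows = rows[:limit]
    return rows

# Hypothetical solutions whose ordered values are [1, 1, 2, 3].
solutions = [{"v": 2}, {"v": 1}, {"v": 1}, {"v": 3}]

without_distinct = apply_modifiers(solutions, "v", ["v"], limit=2)
with_distinct = apply_modifiers(solutions, "v", ["v"], distinct=True, limit=2)
```

Here `without_distinct` holds the rows for [1, 1] and `with_distinct` holds
the rows for [1, 2], which matches the two outcomes Richard describes for the
limit test.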

Received on Monday, 5 March 2007 05:27:56 UTC