- From: Lee Feigenbaum <feigenbl@us.ibm.com>
- Date: Mon, 5 Mar 2007 00:27:03 -0500
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
As per my action from last week, I sought Steve and Jeen's opinions on auto DISTINCT. Here's what Steve had to say:

"""
I would very much like the freedom to return the number of duplicates that's easiest. I use it as an optimisation in some places, e.g. I can answer

  SELECT ?g ?x WHERE { GRAPH ?g { ?x ?y ?z } }

with an implicit DISTINCT very cheaply.

There are also a few cases where I auto DISTINCT or auto semi-DISTINCT to keep the number of solutions down, e.g.

  SELECT ?g WHERE { GRAPH ?g { ?x ?y ?z } }

The natural return for my system would be one row per quad in the store.
"""

Lee

Andy Seaborne wrote on 02/26/2007 11:03:08 AM:

>
> The modifier order is:
>
> * 9.1 ORDER BY
> * 9.2 Projection
> * 9.3 DISTINCT
> * 9.4 OFFSET
> * 9.5 LIMIT
>
> so the test is correct.
>
> We don't document anywhere (IIRC) anything about auto DISTINCT.
>
> DISTINCT is applied after ORDER BY, and the ORDER BY step emits
> [1, 1, 2, ...], so
>
>   limit(
>     distinct([1, 1, 2, ...]),
>     2)
>   = [1, 2]
>
> The question of implicit DISTINCT remains.
>
> Any opinions on saying anything about implicit DISTINCT for simple entailment
> (all we define SPARQL for)? Because DISTINCT is after projection, there are
> several ways to get duplicates, all of which are well-defined within BGP
> matching (blank nodes for simple entailment matches), the algebra (UNION),
> and projection. Projection alone suggests to me that we should not define
> implicit DISTINCT and should leave it to implementations to provide as an
> extra, but I don't have a strong opinion to that effect.
>
> Andy
>
> -------- Original Message --------
> Subject: Unexpected DISTINCT?
> Resent-Date: Sun, 25 Feb 2007 17:58:17 +0000
> Resent-From: public-rdf-dawg-comments@w3.org
> Date: Sat, 24 Feb 2007 23:27:53 -0800
> From: Richard Newman <rnewman@franz.com>
> To: public-rdf-dawg-comments@w3.org
>
> DAWG,
>
> I have an implementation question for which I cannot find an
> answer in the spec.
>
> Given a SELECT query for which some results are duplicated, and
> which does not specify DISTINCT, is it acceptable for an
> implementation to return DISTINCT (or partially DISTINCT) results?
>
> This is exercised by <http://www.w3.org/2001/sw/DataAccess/tests/#modifer-limit>:
>
> - with no DISTINCT processing, the results are [ 1, 1 ].
> - with DISTINCT processing, the results are [ 1, 2 ].
>
> I seem to recall from informal sources that this is acceptable,
> but it would be good to get a firm documented answer, particularly
> when I can see that this could be contentious.
>
> Thanks,
>
> -R
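The modifier order Andy lists (ORDER BY, projection, DISTINCT, OFFSET, LIMIT) and the two results Richard describes for the modifier-limit test can be checked with a small Python sketch. It assumes solutions are plain dicts from variable name to value; the function names, the sample solutions and the quads are hypothetical, not the test's actual data or any implementation's API. Because DISTINCT runs after projection but before LIMIT, the same ordered input gives [1, 1] without DISTINCT and [1, 2] with it. Projecting ?g out of one solution per quad likewise yields one row per quad, which is the "natural return" Steve describes.

```python
from typing import Any, Callable, Dict, Hashable, List, Optional, Sequence

# Hypothetical sketch: names and data are illustrative, not from the spec or test suite.
# A solution maps variable names (without "?") to hashable values.
Solution = Dict[str, Hashable]


def project(solutions: List[Solution], variables: Sequence[str]) -> List[Solution]:
    """Keep only the projected variables; unbound variables stay unbound."""
    return [{v: s[v] for v in variables if v in s} for s in solutions]


def distinct(solutions: List[Solution]) -> List[Solution]:
    """Drop repeated solutions, keeping the first occurrence so ordering survives."""
    seen = set()
    result = []
    for s in solutions:
        key = frozenset(s.items())
        if key not in seen:
            seen.add(key)
            result.append(s)
    return result


def apply_modifiers(
    solutions: List[Solution],
    variables: Sequence[str],
    order_key: Optional[Callable[[Solution], Any]] = None,
    use_distinct: bool = False,
    offset: int = 0,
    limit: Optional[int] = None,
) -> List[Solution]:
    """Apply ORDER BY, projection, DISTINCT, OFFSET and LIMIT, in that order."""
    if order_key is not None:
        solutions = sorted(solutions, key=order_key)  # ORDER BY (stable sort)
    solutions = project(solutions, variables)  # projection
    if use_distinct:
        solutions = distinct(solutions)  # DISTINCT, after projection
    end = None if limit is None else offset + limit
    return solutions[offset:end]  # OFFSET, then LIMIT


def by_x(s: Solution) -> Any:
    return s["x"]


# Hypothetical solutions whose ?x values, once ordered, are [1, 1, 2, 3].
ordered_data: List[Solution] = [
    {"s": "a", "x": 1},
    {"s": "c", "x": 2},
    {"s": "b", "x": 1},
    {"s": "d", "x": 3},
]
plain = apply_modifiers(ordered_data, ["x"], order_key=by_x, limit=2)
deduped = apply_modifiers(ordered_data, ["x"], order_key=by_x, use_distinct=True, limit=2)

# Hypothetical quads (graph, subject, predicate, object): one solution per quad.
quads = [("g1", "s1", "p", "o1"), ("g1", "s2", "p", "o2"), ("g2", "s1", "p", "o3")]
quad_solutions: List[Solution] = [{"g": g, "x": x, "y": y, "z": z} for g, x, y, z in quads]
per_quad = apply_modifiers(quad_solutions, ["g"])  # SELECT ?g ...
graphs = apply_modifiers(quad_solutions, ["g"], use_distinct=True)  # SELECT DISTINCT ?g ...

print([s["x"] for s in plain])  # [1, 1]
print([s["x"] for s in deduped])  # [1, 2]
print(len(per_quad), len(graphs))  # 3 2
```

Keeping the first occurrence in `distinct` is a choice made in this sketch so that the ORDER BY result is not disturbed; the thread itself does not say which duplicate an implementation should keep.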
Received on Monday, 5 March 2007 05:27:56 UTC