- From: Lee Feigenbaum <feigenbl@us.ibm.com>
- Date: Mon, 5 Mar 2007 00:27:03 -0500
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
As per my action from last week, I sought Steve and Jeen's opinions on
auto distinct. Here's what Steve had to say:
"""
I would very much like the freedom to return the number of duplicates
that's easiest. I use it as an optimisation in some places, e.g. I can
answer
SELECT ?g ?x WHERE { GRAPH ?g { ?x ?y ?z } }
with an implicit DISTINCT very cheaply.
There are also a few cases where I auto DISTINCT or auto semi-DISTINCT
to keep the number of solutions down, e.g.
SELECT ?g WHERE { GRAPH ?g { ?x ?y ?z } }
The natural return for my system would be one row per quad in the store.
"""
Lee
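
To make Steve's second query concrete, here is a minimal Python sketch. It
assumes a hypothetical in-memory quad store (a list of graph, subject,
predicate, object tuples with invented data) and is not Steve's
implementation. Projecting only ?g yields one row per quad unless
duplicates are removed.

```python
# Hypothetical quad store: (graph, subject, predicate, object) tuples.
# The data is invented purely for illustration.
QUADS = [
    ("g1", "a", "p", "b"),
    ("g1", "a", "q", "c"),
    ("g2", "d", "p", "e"),
]


def select_g(quads):
    """SELECT ?g WHERE { GRAPH ?g { ?x ?y ?z } } with no duplicate removal."""
    return [g for (g, _x, _y, _z) in quads]


def select_g_distinct(quads):
    """The same query with an implicit DISTINCT, keeping first-seen order."""
    return list(dict.fromkeys(select_g(quads)))


rows = select_g(QUADS)                    # one row per quad
distinct_rows = select_g_distinct(QUADS)  # one row per graph name
```

With this toy data the plain projection has three rows and the
de-duplicated one has two, which is the kind of reduction Steve describes.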
Andy Seaborne wrote on 02/26/2007 11:03:08 AM:
>
> The modifier order is:
>
> * 9.1 ORDER BY
> * 9.2 Projection
> * 9.3 DISTINCT
> * 9.4 OFFSET
> * 9.5 LIMIT
>
> so the test is correct.
>
> We don't document anywhere (IIRC) anything about auto DISTINCT.
>
> DISTINCT is applied after ORDER. The ORDER step emits [1, 1, 2, ...], so
>
>   limit(
>     distinct([1, 1, 2, ...]),
>     2)
>   = [1, 2]
>
>
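
The modifier order Andy lists can be sketched in Python as follows. This
is only an illustration under stated assumptions: the function name
apply_modifiers, its parameters, and the sample solutions are
hypothetical, and the data is not the actual test-suite data. Each step
is applied in the order 9.1 to 9.5.

```python
def apply_modifiers(solutions, var, distinct=False, offset=0, limit=None):
    """Apply solution modifiers in the order ORDER BY, projection,
    DISTINCT, OFFSET, LIMIT (hypothetical helper, for illustration)."""
    ordered = sorted(solutions, key=lambda s: s[var])  # 9.1 ORDER BY
    projected = [s[var] for s in ordered]              # 9.2 projection
    if distinct:
        projected = list(dict.fromkeys(projected))     # 9.3 DISTINCT
    sliced = projected[offset:]                        # 9.4 OFFSET
    if limit is not None:
        sliced = sliced[:limit]                        # 9.5 LIMIT
    return sliced


# Invented solutions: ?v has a duplicate value once ?w is projected away.
solutions = [
    {"v": 2, "w": "a"},
    {"v": 1, "w": "b"},
    {"v": 1, "w": "c"},
    {"v": 3, "w": "d"},
]

without_distinct = apply_modifiers(solutions, "v", limit=2)
with_distinct = apply_modifiers(solutions, "v", distinct=True, limit=2)
```

On this sample, the run without DISTINCT gives [1, 1] and the run with
DISTINCT gives [1, 2], matching the two outcomes Richard reports below.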
> The question of implicit DISTINCT remains -
>
> Any opinions on saying anything about implicit DISTINCT for simple
> entailment (all we define SPARQL for)? Because DISTINCT is after
> projection, there are several ways to get duplicates, all of which are
> well-defined within BGP matching (blank nodes for simple entailment
> matches), the algebra (UNION), and projection. Just projection alone
> suggests to me that we should not define implicit DISTINCT and leave it
> to implementations to provide as an extra, but I don't have a strong
> opinion to that effect.
>
> Andy
>
> -------- Original Message --------
> Subject: Unexpected DISTINCT?
> Resent-Date: Sun, 25 Feb 2007 17:58:17 +0000
> Resent-From: public-rdf-dawg-comments@w3.org
> Date: Sat, 24 Feb 2007 23:27:53 -0800
> From: Richard Newman <rnewman@franz.com>
> To: public-rdf-dawg-comments@w3.org
>
>
> DAWG,
>
> I have an implementation question for which I cannot find an
> answer in the spec.
>
> Given a SELECT query for which some results are duplicated, and
> which does not specify DISTINCT, is it acceptable for an
> implementation to return DISTINCT (or partially DISTINCT) results?
>
> This is exercised by
> <http://www.w3.org/2001/sw/DataAccess/tests/#modifer-limit>:
>
> - with no DISTINCT processing, the results are [ 1, 1 ].
> - with DISTINCT processing, the results are [ 1, 2 ].
>
> I seem to recall from informal sources that this is acceptable,
> but it would be good to get a firm documented answer, particularly
> when I can see that this could be contentious.
>
> Thanks,
>
> -R
>
Received on Monday, 5 March 2007 05:27:56 UTC