# Re: definition of DISTINCT

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 16 Mar 2007 18:25:51 +0000
Message-ID: <45FAE12F.1090606@hp.com>
To: Eric Prud'hommeaux <eric@w3.org>

```

Eric Prud'hommeaux wrote:
> Looking at the definition for DISTINCT in light of the INDISTINCT
> proposal, I realized that I wasn't sure whether
>   {(X=<a>, Y=<b>)
>    (X=<b>, Y=<a>)} had a duplicate. Also, "duplicates" wasn't really
> defined.

As per the WG decision, it is in the algebra.  A clearer statement here would
be good.

> I expanded the section to make it more clear what the default
> was and how DISTINCT acted.
>
> PROPOSED, replace 9.3 DISTINCT with this text:
>
> 9.3 DISTINCT
>
> The solution sequence with no DISTINCT or INDISTINCT modifier is
> defined by the SPARQL algebra in 12 Definition of SPARQL:

The algebra covers all the solution modifiers and has a 'Distinct' operator, so
it defines both SELECT and SELECT DISTINCT.  This wording seems to imply that
DISTINCT is not covered by the algebra.

The algebra defines a duplicate in terms of "same solution mapping"
(= binding), meaning "same variable, same RDF term"; that is, RDF term
equivalence.

>
>   @prefix  foaf:  <http://xmlns.com/foaf/0.1/> .
>
>   _:x    foaf:name   "Alice" .
>   _:x    foaf:mbox   <mailto:alice@example.com> .
>
>   _:y    foaf:name   "Alice" .
>   _:y    foaf:mbox   <mailto:asmith@example.com> .
>
>   _:z    foaf:name   "Alice" .
>   _:z    foaf:mbox   <mailto:alice.smith@example.com> .
>
>   PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
>   SELECT ?name WHERE { ?x foaf:name ?name }
>
>   name
>   "Alice"
>   "Alice"
>   "Alice"
>
> The DISTINCT solution modifier eliminates duplicate solutions.
> Specifically, each solution that binds the same variables to the same
> RDF terms as another solution is eliminated from the solution set.

No duplicate bindings - yes, that's better.

>
>   PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
>   SELECT DISTINCT ?name WHERE { ?x foaf:name ?name }
>
>   name
>   "Alice"
>
> If DISTINCT and LIMIT or OFFSET are specified, then duplicates are
> eliminated before the limit or offset is applied.

This is covered at the start of sec 9:
"""Modifiers are applied in the order given by the list above."""
Repeating it does no harm but isn't necessary.  If it is repeated here, though,
why don't all such sections have similar text?

Andy
```
Received on Friday, 16 March 2007 18:26:21 UTC
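
The two points discussed in the message (a duplicate means "same variable, same RDF term", and DISTINCT is applied before LIMIT/OFFSET) can be illustrated with a rough Python sketch. This is not the SPARQL algebra itself: solutions are modelled as plain dictionaries, RDF terms are compared as serialized strings (a simplification of RDF term equivalence), and the helper names `distinct` and `slice_solutions`, as well as the extra "Bob" row, are hypothetical and used only for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Assumption: a solution maps variable names to RDF terms written as strings.
Solution = Dict[str, str]


def distinct(solutions: Iterable[Solution]) -> List[Solution]:
    """Drop solutions that bind the same variables to the same terms,
    keeping the first occurrence and preserving order."""
    seen = set()
    result = []
    for solution in solutions:
        key = frozenset(solution.items())  # (variable, term) pairs
        if key not in seen:
            seen.add(key)
            result.append(solution)
    return result


def slice_solutions(solutions: List[Solution], offset: int = 0,
                    limit: Optional[int] = None) -> List[Solution]:
    """Apply OFFSET, then LIMIT, to a solution sequence."""
    end = None if limit is None else offset + limit
    return solutions[offset:end]


# The case from the question: the same terms bound to different variables.
swapped = [{"X": "<a>", "Y": "<b>"}, {"X": "<b>", "Y": "<a>"}]

# The ?name results from the example data.
names = [{"name": '"Alice"'} for _ in range(3)]

# Hypothetical extra row, added only to show that modifier order matters.
mixed = [{"name": '"Alice"'}, {"name": '"Alice"'}, {"name": '"Bob"'}]

distinct_then_limit = slice_solutions(distinct(mixed), limit=2)
limit_then_distinct = distinct(slice_solutions(mixed, limit=2))
```

Under this reading, `distinct(swapped)` keeps both solutions, since neither binds the same variables to the same terms as the other, while `distinct(names)` collapses to a single row. Applying `distinct` before the slice, as the order of modifiers in section 9 prescribes, returns two rows for `mixed`; slicing first would return only one.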
