# Re: definition of DISTINCT

From: Eric Prud'hommeaux <eric@w3.org>
Date: Thu, 22 Mar 2007 12:58:42 -0400
To: "Seaborne, Andy" <andy.seaborne@hp.com>

Message-ID: <20070322165842.GN4098@w3.org>
```
* Seaborne, Andy <andy.seaborne@hp.com> [2007-03-16 18:25+0000]
>
>
>
> Eric Prud'hommeaux wrote:
> >Looking at the definition for DISTINCT in light of the INDISTINCT
> >proposal, I realized that I wasn't sure whether
> >  {(X=<a>, Y=<b>)
> >   (X=<b>, Y=<a>)} had a duplicate. Also, "duplicates" wasn't really
> >defined.
>
> As per the WG decision, it is in the algebra.   Clearer statement here
> would be good.
>
> I expanded the section to make it more clear what the default

cool, though i didn't actually use it because i couldn't figure out
how to make my proposed text neither inaccurate nor misleading.

> >was and how DISTINCT acted.
> >
> >PROPOSED, replace 9.3 DISTINCT with this text:
> >
> >9.3 DISTINCT
> >
> >The solution sequence with no DISTINCT or INDISTINCT modifier is
> >defined by the SPARQL algebra in 12 Definition of SPARQL:
>
> The algebra covers all the solution modifiers and has a 'Distinct' operator
> so it defines SELECT and SELECT DISTINCT.  This seems to imply
> that DISTINCT is not covered by the algebra.

agreed. went with "<p>The solution sequence with no
<code>DISTINCT</code> or <code>REDUCED</code> modifier may include
duplicate solutions:</p>" as there are no other forward refs here.

> The algebra gives a definition of duplicate based on "same solution
> mapping" (= binding) = "same variable, same RDF term" that is, RDF term
> equivalence.

do you read that as agreeing with the sentence about what the DISTINCT
modifier does?

> >
> >  @prefix  foaf:  <http://xmlns.com/foaf/0.1/> .
> >
> >  _:x    foaf:name   "Alice" .
> >  _:x    foaf:mbox   <mailto:alice@example.com> .
> >
> >  _:y    foaf:name   "Alice" .
> >  _:y    foaf:mbox   <mailto:asmith@example.com> .
> >
> >  _:z    foaf:name   "Alice" .
> >  _:z    foaf:mbox   <mailto:alice.smith@example.com> .
> >
> >  PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
> >  SELECT ?name WHERE { ?x foaf:name ?name }
> >
> >  name
> >  "Alice"
> >  "Alice"
> >  "Alice"
> >
> >The DISTINCT solution modifier eliminates duplicate solutions.
> >Specifically, each solution that binds the same variables to the same
> >RDF terms as another solution is eliminated from the solution set.
>
> No duplicate bindings - yes, that's better.
>
> >
> >  PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
> >  SELECT DISTINCT ?name WHERE { ?x foaf:name ?name }
> >
> >  name
> >  "Alice"
> >
> >If DISTINCT and LIMIT or OFFSET are specified, then duplicates are
> >eliminated before the limit or offset is applied.
>
> This is covered at the start of sec 9
> """Modifiers are applied in the order given by the list above."""
> Repeating does no harm but isn't necessary; then again, why don't all
> such sections have similar text?

Editorially, it felt to me like it needed pointing out.
[[
Note that, per the order of solution sequence modifiers, duplicates
are eliminated before either limit or offset is applied.
]]
I am so committed to this text that I am asking you to disagree with me
before I'll strike it. Meaning, say the word and it's gone.
--
-eric

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than