Re: Prioritised list of open issues (query, my bits) from Steve Harris on 2010-02-09 (public-rdf-dawg@w3.org from January to March 2010)

From: Steve Harris <steve.harris@garlik.com>
Date: Tue, 9 Feb 2010 13:34:05 +0000
To: Andy Seaborne <andy.seaborne@talis.com>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
Message-Id: <2E665457-E467-4DB4-B7D5-7DB98B04F119@garlik.com>

On 9 Feb 2010, at 11:27, Andy Seaborne wrote:
> On 09/02/2010 10:29 AM, Lee Feigenbaum wrote:
>> Steve Harris wrote:
>>> On 9 Feb 2010, at 09:00, Andy Seaborne wrote:
>>>> On 08/02/2010 10:23 AM, Steve Harris wrote:
>>>>> http://www.w3.org/2009/sparql/track/issues/35
>>>>> Can aggregate functions take DISTINCT as an argument a la SELECT
>>>>> COUNT(DISTINCT ?X)?
>>>>> - Seems consensus on yes.
>>>>
>>>> A URI should name the function, not a collection of related
>>>> functionality.
>>>>
>>>> Example:
>>>>
>>>> COUNT(DISTINCT ?x) vs COUNT(?x)
>>>>
>>>> How do you name the difference if they are not different URIs?
>>>
>>> In my view, DISTINCT does not change the function, it changes the
>>> (multi)set that the function is applied to, c.f.
>>> http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml#aggregateAlgebra
>>>
>>> More concretely, you form a DISTINCT multiset of the bound values of
>>> ?x, then apply the count function to the resulting set.
>>
>> FWIW, this is exactly how Glitter treats the DISTINCT modifier for both
>> built-in and custom aggregates. It modifies the set of solutions passed
>> to the aggregate function.
>
> The definition in the doc applies it to the values of the aggregate
> function's expressions (as an aside: the aggregate never sees the
> expressions themselves, only the results of evaluating them).
>
> If we have, in one partition:
>
> (?x=1, ?y=2)
> (?x=1, ?y=3)
> (?x=2, ?y=3)
>
> which is a set of solutions.
>
> I'd expect
>  COUNT(DISTINCT ?x) ==> 2
>  COUNT(DISTINCT fn:floor((?x+1)/2)) ==> 1

Yes,

if   M = your solution multiset above.
     M' = M(fn:floor((?x+1)/2))   { 1, 1, 1 }
    M'' = DISTINCT M'             { 1 }
result = Count(M'')              1

This is how aggregates are defined in SQL, and I can't think of any  
pressing reason to depart from that.
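
To make the steps above concrete, here is a minimal Python sketch of the same pipeline: evaluate the expression per solution, optionally de-duplicate, then apply the aggregate. The names `aggregate`, `M`, and the `count_*` variables are hypothetical and only illustrate the idea; this is not how any particular SPARQL engine implements it, and it ignores unbound variables and evaluation errors.

```python
import math

# The three solutions from the example partition above.
M = [
    {"x": 1, "y": 2},
    {"x": 1, "y": 3},
    {"x": 2, "y": 3},
]

def aggregate(solutions, expr, func, distinct=False):
    """Evaluate expr per solution, optionally de-duplicate, then apply func."""
    values = [expr(s) for s in solutions]        # M'
    if distinct:
        values = list(dict.fromkeys(values))     # M'' (order-preserving de-duplication)
    return func(values)

# The same function (len) is used with and without DISTINCT;
# only the multiset it is applied to changes.
count_x = aggregate(M, lambda s: s["x"], len)
count_distinct_x = aggregate(M, lambda s: s["x"], len, distinct=True)
count_distinct_floor = aggregate(
    M, lambda s: math.floor((s["x"] + 1) / 2), len, distinct=True
)
```

Under these assumptions `count_x` is 3, `count_distinct_x` is 2, and `count_distinct_floor` is 1, matching the values expected above.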

> which is applying the DISTINCT after the implicit projection (case 1)
> and after expression evaluation (case 2).
>
> I thought Steve and I were mostly agreeing, except over whether one  
> can name the DISTINCT and non-DISTINCT versions with URIs.

That's the essence I think. But maybe you're proposing something  
different?

> And, maybe, the treatment of * - I prefer a treatment that passes
> solutions and expressions to the aggregate so * is not different.

I think the way it's expressed in SQL is quite neat. It means that  
COUNT(*) is a special case, but that's not a huge problem in my opinion.
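
For comparison, the SQL behaviour can be checked with a small sketch using Python's standard `sqlite3` module and an in-memory table. The table name `m` and its columns are made up for illustration, and SQLite's integer division stands in for fn:floor here, which only works because the values are positive integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m (x INTEGER, y INTEGER)")
# Same three rows as the example partition.
conn.executemany("INSERT INTO m VALUES (?, ?)", [(1, 2), (1, 3), (2, 3)])

# COUNT(*) counts rows; the other forms count the values of an expression,
# with DISTINCT applied to those values before counting.
row = conn.execute(
    "SELECT COUNT(*), COUNT(x), COUNT(DISTINCT x), COUNT(DISTINCT (x + 1) / 2) "
    "FROM m"
).fetchone()
conn.close()
```

Here `row` comes back as `(3, 3, 2, 1)`: COUNT(*) is the one form that takes no expression and simply counts the rows.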

- Steve

-- 
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Tuesday, 9 February 2010 13:34:38 UTC