- From: Steve Harris <steve.harris@garlik.com>
- Date: Tue, 9 Feb 2010 13:34:05 +0000
- To: Andy Seaborne <andy.seaborne@talis.com>
- Cc: Lee Feigenbaum <lee@thefigtrees.net>, "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
On 9 Feb 2010, at 11:27, Andy Seaborne wrote:

> On 09/02/2010 10:29 AM, Lee Feigenbaum wrote:
>> Steve Harris wrote:
>>> On 9 Feb 2010, at 09:00, Andy Seaborne wrote:
>>>> On 08/02/2010 10:23 AM, Steve Harris wrote:
>>>>> http://www.w3.org/2009/sparql/track/issues/35
>>>>> Can aggregate functions take DISTINCT as an argument a la SELECT
>>>>> COUNT(DISTINCT ?X)?
>>>>> - Seems consensus on yes.
>>>>
>>>> A URI should name the function, not a collection of related
>>>> functionality.
>>>>
>>>> Example:
>>>>
>>>> COUNT(DISTINCT ?x) vs COUNT(?x)
>>>>
>>>> How do you name the difference if they are not different URIs?
>>>
>>> In my view, DISTINCT does not change the function, it changes the
>>> (multi)set that the function is applied to, c.f.
>>> http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml#aggregateAlgebra
>>>
>>> More concretely, you form a DISTINCT multiset of the bound values of
>>> ?x, then apply the count function to the resulting set.
>>
>> FWIW, this is exactly how Glitter treats the DISTINCT modifier for
>> both built-in and custom aggregates. It modified the set of solutions
>> passed to the aggregate function.
>
> The defn in the doc applies it to the values of expressions of the
> aggregate function (aside, so no seeing the expressions themselves,
> only the result of after evaluation).
>
> If we have, in one partition:
>
> (?x=1, ?y=2)
> (?x=1, ?y=3)
> (?x=2, ?y=3)
>
> which is a set of solutions.
>
> I'd expect
> COUNT(DISTINCT ?x) ==> 2
> COUNT(DISTINCT fn:floor((?x+1)/2)) ==> 1

Yes, if M = your solution multiset above.

M' = M(fn:floor((?x+1)/2))    { 1, 1, 1 }
M'' = DISTINCT M'             { 1 }
result = Count(M'')           1

This is how aggregates are defined in SQL, and I can't think of any
pressing reason to depart from that.

> which is applying the DISTINCT after the implicit projection (case
> 1) and after expression evaluation (case 2).
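
As a rough illustration of that evaluation order (evaluate the expression
once per solution, apply DISTINCT to the resulting values, then aggregate),
here is a minimal Python sketch. The helper name count() and the dict
representation of solutions are assumptions made for the example, not
anything taken from the SPARQL drafts, and it ignores unbound variables
and evaluation errors.

```python
import math

# The three solutions from the single partition in the example above.
solutions = [
    {"x": 1, "y": 2},
    {"x": 1, "y": 3},
    {"x": 2, "y": 3},
]


def count(solutions, expr, distinct=False):
    """Hypothetical helper: evaluate expr per solution, optionally
    de-duplicate the resulting values, then count them."""
    values = [expr(s) for s in solutions]  # M'  = M(expr)
    if distinct:
        values = set(values)               # M'' = DISTINCT M'
    return len(values)                     # Count(M'') or Count(M')


# COUNT(?x): no DISTINCT, so every value is counted.
count_x = count(solutions, lambda s: s["x"])

# COUNT(DISTINCT ?x): DISTINCT is applied to the projected values.
count_distinct_x = count(solutions, lambda s: s["x"], distinct=True)

# COUNT(DISTINCT fn:floor((?x+1)/2)): DISTINCT is applied after the
# expression has been evaluated, not to the underlying solutions.
count_distinct_floor = count(
    solutions, lambda s: math.floor((s["x"] + 1) / 2), distinct=True
)
```

In this sketch the same counting step is used in every case; DISTINCT only
changes the collection of values handed to it.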
>
> I thought Steve and I were mostly agreeing, except over whether one
> can name the DISTINCT and non-DISTINCT versions with URIs.

That's the essence I think. But maybe you're proposing something
different?

> And, maybe, the treatment of * - I prefer a treatment that passes
> solutions and expressions to the aggregate so * is not different.

I think the way it's expressed in SQL is quite neat. It means that
COUNT(*) is a special case, but that's not a huge problem in my opinion.

- Steve

-- 
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465  http://www.garlik.com/
Registered in England and Wales 535 7233  VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Tuesday, 9 February 2010 13:34:38 UTC