
Re: grouping by expressions

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 3 Nov 2010 12:25:48 +0000
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <5A2EC832-1E6C-47DB-A18D-A739AB4BEB8B@garlik.com>
To: Andy Seaborne <andy.seaborne@epimorphics.com>
On 2010-11-03, at 11:24, Andy Seaborne wrote:
> On 03/11/10 10:59, Steve Harris wrote:
...
>> I don't think we should /require/ AS if we add this syntax; there are situations where you want to group by an expression but don't need to assign it to a variable, e.g.:
>> 
>> SELECT (AVG(?time) AS ?centre) (COUNT(*) AS ?magnitude)
>> WHERE {
>>    ?x a <Impulse> ;
>>       <timestamp> ?time .
>> }
>> GROUP BY round(?time * 1000)
>> 
>> Would seem a bit strange to have to write GROUP BY (round(?time * 1000) AS ?notneeded).
> 
> Fine by me, and it seems to make other things work out naturally. If you have:
> 
>  HAVING (COUNT(*) > 0)
> 
> then that's much cleaner to handle with an implicit variable for the COUNT(*) as it can be used multiple times in the same aggregation step.  While it's possible to define it so the aggregation happens multiple times, expression evaluation would need to be updated to know about aggregation functions.
> 
> Having a variable created in algebra generation means that the XSD expression evaluation is untouched: everything happens inside a "group" algebra operation: definition of the group keys, calculation of aggregates.

This doesn't require an implicit variable, as far as I can see.

Aggregate() as it's written now in the draft can handle GROUP BY (expression) without an implicit variable.
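
As a rough illustration of that point (this is not the draft's Aggregate() definition, only a minimal Python sketch of the idea), grouping can key directly on the value of an expression without ever binding it to a name. The helper name group_by_expression and the sample rows are made up for the example, and Python's round() merely stands in for the round() in the query above:

```python
from collections import defaultdict

def group_by_expression(rows, key_expr):
    """Partition rows by the value of key_expr(row).

    The key is only used to decide group membership; it is never
    bound to a variable and never appears in the output rows.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[key_expr(row)].append(row)
    return groups

# Hypothetical solutions for ?time, in seconds.
rows = [{"time": 0.0011}, {"time": 0.0012}, {"time": 0.0029}]

# Corresponds loosely to: GROUP BY round(?time * 1000)
groups = group_by_expression(rows, lambda r: round(r["time"] * 1000))

# Corresponds loosely to: SELECT (AVG(?time) AS ?centre) (COUNT(*) AS ?magnitude)
results = [
    {"centre": sum(r["time"] for r in g) / len(g), "magnitude": len(g)}
    for g in groups.values()
]
```

Only ?centre and ?magnitude exist in the result rows; the grouping key is an internal detail of the grouping step.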

> So for:
> 
>  SELECT (COUNT(*) AS ?c) (2*COUNT(*) AS ?c2) (1/COUNT(*) AS ?d)
>  {...}
>  HAVING (COUNT(*) > 0)
> 
> Allocate a new variable and assign the aggregation calculation once to that one new variable.  Rewrite all the expressions to use that var.

That would be a potential optimisation I think. I don't see a need to specify that implementations do that. The results should be the same regardless of whether you do the common subexpression optimisation or not.
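
To make the equivalence concrete, here is a minimal Python sketch (an illustration only, not spec text or any implementation's actual code) that evaluates the SELECT and HAVING expressions above for a single group in two ways: recomputing COUNT(*) at every use, and computing it once and sharing it. The function names eval_naive and eval_shared are hypothetical, and Fraction is used only to keep 1/COUNT(*) exact:

```python
from fractions import Fraction

def count(group):
    # Stand-in for COUNT(*) over one group of solutions.
    return len(group)

def eval_naive(group):
    """Recompute the aggregate each time it is mentioned."""
    if not count(group) > 0:          # HAVING (COUNT(*) > 0)
        return None
    return {
        "c": count(group),            # (COUNT(*) AS ?c)
        "c2": 2 * count(group),       # (2*COUNT(*) AS ?c2)
        "d": Fraction(1, count(group)),  # (1/COUNT(*) AS ?d)
    }

def eval_shared(group):
    """Compute the aggregate once and reuse it (the optimisation)."""
    n = count(group)
    if not n > 0:
        return None
    return {"c": n, "c2": 2 * n, "d": Fraction(1, n)}
```

Because the aggregate is a pure function of the group, both functions return the same row for any group, which is why the sharing can be left as an implementation choice.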

> Such a variable can't escape and be visible to the results without having been given a legal name via AS.

That's the bit you have to be careful about. If you just omit all mention of a variable in the spec text then it's clearer, no?

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Wednesday, 3 November 2010 12:26:23 GMT
