Re: grouping by expressions

On 2010-11-03, at 13:15, Andy Seaborne wrote:

>>> Having a variable created in algebra generation means that the XSD expression evaluation is untouched: everything happens inside a "group" algebra operation: definition of the group keys, calculation of aggregates.
>> 
>> This doesn't require an implicit variable, as far as I can see.
>> 
>> Aggregate() as it's written now in the draft can handle GROUP BY (expression) without an implicit variable.
> 
> I think you mean Group(), not Aggregate().
> 
> I didn't mean that "group" - I should have chosen a different name. Your "Aggregation" would be closer but grouping does not always involve aggregation.
> 
>> If you just omit all mention of a variable in the spec text then it's clearer, no?
> 
> Not for me.
> The example in the WD is:
> 
> [[
> And so Aggregation((?y, ?z), ex:agg, {}, G) =
> 
> { (1) → eg:agg({(2, 3), (3, 4)}, {}), (2) → eg:agg({(5, 6)}, {}) }.
> ]]
> 
> Now if the query is (slight modification alert):
> 
>  SELECT (ex:agg(?y, ?z)+1 AS ?agg)
>  WHERE { ?x ?y ?z }
>  GROUP BY ?x.
> 
> how do we get from
> 
> { (1) → eg:agg({(2, 3), (3, 4)}, {}), (2) → eg:agg({(5, 6)}, {}) }.
> 
> to being able to calculate ex:agg(?y, ?z)+1  ?
> 
> ex:agg(?y, ?z)+1 is an expression - it needs a solution to calculate the "+".

Yeah, makes sense in Projection, but that doesn't apply to grouping, does it? "Naming" expressions is compulsory in SELECT already, as they have to be associated with some variable name in the result format.

> The easy way is to keep a clean separation of the group/aggregate process and the expression evaluation of SELECT expressions and HAVING.
> 
> Given:
> { (1) → eg:agg({(2, 3), (3, 4)}, {}), (2) → eg:agg({(5, 6)}, {}) }.
> 
> ## Not sure what the {} are.

Scalar args.

- Steve

> Let's take ex:agg to be MAX(?y), and evaluate it in Aggregation()
> 
> { (1) → MAX({(2, 3), (3, 4)}), (2) → MAX({(5, 6)}) }.
> =
> { (1) → 3, (2) → 5 }.
> 
> This is not a solution binding and can't be used in an expression.
> 
> Let's now take:
>  SELECT (MAX(?y)+1 AS ?agg)
> 
> Let's call, internally only, MAX(?y)=?N because solutions are a mapping from variables to values so we need a variable to associate the value of MAX(?y) with.
> 
> then
>  (MAX(?y)+1 AS ?agg)
> is translated to
>  extend (?N+1 AS ?agg)
> 
> the output of the group/aggregate step is a table (multiset of bindings):
> 
> ?x  MAX(?y)
> 1   3
> 2   5
> 
> Can't write MAX(?y) directly into "extend (?N+1 AS ?agg)"
> for several reasons:
> 
> 1/ MAX(?y) isn't a function that returns a single value.
> 
> 2/ ?y is out-of-scope by then
> 
> 3/ the group key information isn't available, so we can't look up the group key to get the value - can't get to the (1)
> 
> We could call the variable ?"MAX(?y)" but making it unique across the whole query seems easier and there could be another, different, MAX(?y).
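
To make the group/aggregate-then-extend pipeline above concrete, here is a minimal Python sketch. It assumes solutions are plain dicts and models only MAX; the helper names (group_max, extend, project) and the internal name ".N0" are illustrative choices, not taken from the draft or from ARQ.

```python
from collections import defaultdict

# Solutions of { ?x ?y ?z } matching the example:
# group 1 holds (?y, ?z) = (2, 3), (3, 4); group 2 holds (5, 6).
solutions = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 1, "y": 3, "z": 4},
    {"x": 2, "y": 5, "z": 6},
]

# Hypothetical internal variable name. A leading "." is not legal in
# SPARQL variable syntax, so it cannot clash with a user variable.
INTERNAL = ".N0"


def group_max(rows, key, arg, out_var):
    """Group rows by `key`, compute MAX(arg) per group, bind it to `out_var`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[arg])
    # One solution per group: the group key plus the aggregate value.
    return [{key: k, out_var: max(vals)} for k, vals in groups.items()]


def extend(rows, new_var, expr):
    """Bind new_var to expr(row) in every solution."""
    return [{**row, new_var: expr(row)} for row in rows]


def project(rows, variables):
    """Keep only the named variables in every solution."""
    return [{v: row[v] for v in variables} for row in rows]


grouped = group_max(solutions, "x", "y", INTERNAL)        # ?x with MAX(?y) as .N0
extended = extend(grouped, "agg", lambda r: r[INTERNAL] + 1)  # extend (?N+1 AS ?agg)
result = project(extended, ["agg"])                        # SELECT ... AS ?agg
```

The point of the sketch is that the "+1" is evaluated by ordinary expression evaluation over a solution; the aggregate itself never appears in the extend step, only the variable that carries its value.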
> 
>> That's the bit you have to be careful around. If you just omit all mention of a variable in the spec text then it's clearer, no?
> 
> ?N can't escape - the only ways out are via projection, so it's either explicit names, including AS, or the original syntax of SELECT * further out in the query.
> 
> But define SELECT * during the algebra translation as only finding variables used in the syntax, and it can't be ?N.
> 
> (ARQ uses illegal variable names so it can easily determine the class of variable later - convenience, not need).
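
A rough Python sketch of the SELECT * point, under heavy simplification: variables are found by a regular expression over the query text rather than by a real parser or scope analysis, and the function names and the ".N0" internal name are made up for illustration.

```python
import re

# Matches ?name or $name as written in query syntax. A name such as
# ".N0" can never match, because "." is not a legal first character.
VAR = re.compile(r"[?$]([A-Za-z_][A-Za-z0-9_]*)")


def syntax_variables(query_text):
    """Variables mentioned in the query text, in order of first appearance."""
    seen = []
    for name in VAR.findall(query_text):
        if name not in seen:
            seen.append(name)
    return seen


def select_star(rows, query_text):
    """Project each solution onto the variables that appear in the syntax."""
    visible = syntax_variables(query_text)
    return [{v: row[v] for v in visible if v in row} for row in rows]


query = (
    "SELECT * { { SELECT ?x (MAX(?y)+1 AS ?agg) "
    "WHERE { ?x ?y ?z } GROUP BY ?x } }"
)

# Solutions as they might look before the final projection,
# still carrying the internal variable.
rows = [
    {"x": 1, ".N0": 3, "agg": 4},
    {"x": 2, ".N0": 5, "agg": 6},
]

star = select_star(rows, query)
```

Because the visible set is computed from the syntax alone, the internal variable is dropped without any special-case check.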
> 
> Greg - how do you do it?
> 
>  Andy
> 
> 
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Wednesday, 3 November 2010 16:07:31 UTC