Re: Order of evaluation for aggregates from Andy Seaborne on 2011-11-28 (public-rdf-dawg@w3.org from October to December 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 28 Nov 2011 12:20:34 +0000
To: birte.glimm@uni-ulm.de
CC: Steve Harris <steve.harris@garlik.com>, sparql Working Group <public-rdf-dawg@w3.org>
Message-ID: <4ED37C92.9000009@epimorphics.com>
On 28/11/11 10:28, Birte Glimm wrote:
> On 26 November 2011 23:59, Andy Seaborne<andy.seaborne@epimorphics.com>  wrote:
>> Steve,
>>
>> I'm working through the definitions as they are in rq25 at the moment (Nov
>> 26).
>>
>> I see no problem extending to ORDER BY : it works on ?agg_i and they are
>> in-scope.
>
> Agreed.
>
>> ## are comments
>> ** are suggestions
>>
>> Q = SELECT ?x (1+count(*) as ?y) WHERE { ?x :p ?v } GROUP BY ?x
>> P = BGP({ ?x :p ?v })
>>
>>> If Q contains GROUP BY exprlist
>>>     Let G := Group(exprlist, P)
>>> Else
>>>     Let G := Group((1), P)
>>>     End
>>
>> ## What about the case of no GROUP BY and no aggregate?
>>    This catchall always groups a query
>> ** ---------
>> If Q contains GROUP BY exprlist
>>    Let G := Group(exprlist, P)
>> Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
>>    Let G := Group((1), P)
>> Else
>>    skip the rest of the aggregate step
>>    End
>> ** ---------
>
> Good point.
>
>> G := Group(?x, BGP({ ?x :p ?v }))
>> i:=1
>>
>>> For each (X AS Var) in SELECT and each HAVING(X) in Q
>> so
>>   X=1+count(*) Var = ?y
>>
>>> If X contains an unaggregated variable V
>>
>> ** s/Var/V/ in the For loop above.
>
> No. We are looking at (X AS Var), but we now want to know whether X
> contains a variable that is not aggregated. Obviously, variables in X
> are different from Var. This is meant to handle things like (1+?x AS
> ?y), where ?x is grouped, but not aggregated. This should become
> (1+SAMPLE(?x) AS ?y).

OK - I understand now.
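A minimal sketch of the rewrite Birte describes, where (1+?x AS ?y) becomes (1+SAMPLE(?x) AS ?y). The expression classes (Var, Const, Call, Agg) and the function name sample_unaggregated are hypothetical. The sketch assumes the query has already been validated, so any variable appearing outside an aggregate is a GROUP BY variable.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical expression tree for SELECT / HAVING expressions.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Call:
    op: str
    args: Tuple["Expr", ...]


@dataclass(frozen=True)
class Agg:
    func: str
    args: Tuple["Expr", ...]  # an empty tuple stands for COUNT(*)


Expr = Union[Var, Const, Call, Agg]


def sample_unaggregated(expr: Expr) -> Expr:
    """Wrap each variable that is not inside an aggregate in SAMPLE(...).

    Assumes validation has already ensured such variables are grouped.
    """
    if isinstance(expr, Var):
        return Agg("SAMPLE", (expr,))
    if isinstance(expr, Call):
        return Call(expr.op, tuple(sample_unaggregated(a) for a in expr.args))
    # Constants need no change; anything under an Agg is already aggregated.
    return expr


# (1+?x AS ?y)  ->  (1+SAMPLE(?x) AS ?y)
before = Call("+", (Const(1), Var("x")))
after = sample_unaggregated(before)
```

Only the expression X is rewritten; the target Var (?y here) is untouched, which is why the loop keeps X and Var separate.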

>
>>> For each aggregate R(args ; scalarvals) now in X
>> aggregate R = count(*)
>> A1 := Aggregation(*, count, {}, Group(?x, BGP({ ?x :p ?v })))
>>> Replace R(...) with agg_1 in Q
>>
>> Q = SELECT ?x (1+?agg_1 as ?y) WHERE { ?x :p ?v } GROUP BY ?x
>> ## Did you mean Q?
>
> I think yes. The modified Q is then used later, where we no longer
> want to see aggregates, but instead ?aggi. The select expressions are
> later turned into extends with ?aggi variables.
>
>> ** Replace R(...) with agg_1 in X
> In this case, one would assume that changes to X are propagated
> through to Q. I think replacing in Q is clearer.
>
>> ## but X never gets mentioned again.
>> ## Text seems to have lost an "extend" or assignment to E
> ? The normal select expressions are handled later in 18.2.4.4. The
> extends that we construct here are just to avoid errors because of
> invalid select expressions. For example SELECT ?x { ... } GROUP BY ?x
> should become something like SELECT (SAMPLE(?x) AS ?x) { ... } GROUP
> BY ?x, but that wouldn't be valid. Maybe, now that we have
> unaggregated variables that do not occur within an assignment in a
> separate for loop, we could actually get away with rewriting Q again
> here, e.g., into SELECT (?aggi AS ?x) ...
>
>> ## This does not do anything with (?y, ?agg_1)
>> ** Add E := E append (?y, X)
> This happens in 18.2.4.4.

This is an alternative to including the name in Aggregation() as a way 
of dealing with the lost ?agg_i. That wasn't clear in my email.

We add what is effectively

   (Ai AS ?agg_i)

and doing it here means it happens before the 18.2.4.4 processing, 
effectively placing it at the start of the SELECT clause, and hence in 
scope for all later expressions.
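A minimal sketch of why putting (Ai AS ?agg_i) at the front of E keeps ?agg_i in scope, assuming E is applied to each group's row as a left-to-right sequence of extends. The names apply_extends and count_star, and the row-as-dict representation, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# One (expr AS ?var) entry: target variable and a function of the row built so far.
Extend = Tuple[str, Callable[[Row], object]]


def apply_extends(row: Row, extends: List[Extend]) -> Row:
    """Apply each extend in order; later entries see variables bound by earlier ones."""
    out = dict(row)
    for var, compute in extends:
        if var in out:
            raise ValueError(f"?{var} is already bound")
        out[var] = compute(out)
    return out


def count_star(solutions: List[Row]) -> int:
    """Hypothetical stand-in for evaluating A1 = Aggregation(*, count, {}, G) on one group."""
    return len(solutions)


# One group of G = Group(?x, BGP({ ?x :p ?v })), keyed by ?x = :a.
group = [{"x": ":a", "v": 1}, {"x": ":a", "v": 2}]
key: Row = {"x": ":a"}

E: List[Extend] = [
    ("agg_1", lambda r: count_star(group)),  # (A1 AS ?agg_1), placed first
    ("y", lambda r: 1 + r["agg_1"]),         # (1+?agg_1 AS ?y) from the rewritten SELECT
]
result = apply_extends(key, E)  # {"x": ":a", "agg_1": 2, "y": 3}
```

With the two entries swapped, computing ?y fails (a KeyError in this sketch) because ?agg_1 is not yet bound.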

Putting the variable name in Aggregation() is an alternative approach.

Looking over this, I mildly prefer setting E to (Ai AS ?agg_i) over 
passing the variable name to Aggregation(), because the latter means 
passing quoted variables around.

>> ## Otherwise the connection between A1 and ?agg_1 is lost.
>
> This is only clear from 18.5 Evaluation Semantics, where the aggi are
> assigned in the evaluation of AggregateJoin. Section "18.4.1 Aggregate
> Algebra" still lacks definitions for Aggregation and AggregateJoin.

OK - I now see how the agg_i are recreated in the eval of AggregateJoin 
but it does not quite work.  The knowledge of variable choice can't be 
encoded in the "for i 1 to n".

A query with multiple grouped aggregates fails because the counter 
starts from 1 each time. Several "agg_1" variables are possible, and this 
is not covered by "aggi is a temporary variable" because the evaluation 
definition depends on the name format. The decision made back in 
translation is lost.

In the translation, we need a global counter (one per query 
translation) in "step: Aggregation", not a per-group one. Associate the 
allocated variable with the aggregation.
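A minimal sketch contrasting the two naming schemes, assuming an outer query and a subquery that each contain aggregates. Translator, the Python Aggregation class, positional_names and aggregate_join_row are hypothetical. The idea shown is that the variable is chosen once, from a counter shared across the whole query translation, and stored with the aggregation so that evaluating AggregateJoin reads it back instead of recomputing it from the position i.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Aggregation:
    func: str
    arg: str
    var: str  # ?agg_i allocated at translation time, carried through to evaluation


class Translator:
    """Hypothetical per-query translation state with a single agg_i counter."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def aggregation(self, func: str, arg: str) -> Aggregation:
        return Aggregation(func, arg, f"agg_{next(self._counter)}")


def positional_names(aggs: List[Aggregation]) -> List[str]:
    """The "for i 1 to n" naming, restarted for every AggregateJoin."""
    return [f"agg_{i}" for i in range(1, len(aggs) + 1)]


def aggregate_join_row(aggs: List[Aggregation], values: List[object]) -> Dict[str, object]:
    """Bind each aggregate's value to the variable stored with it."""
    return {a.var: v for a, v in zip(aggs, values)}


t = Translator()
inner = [t.aggregation("COUNT", "*")]                             # in a subquery
outer = [t.aggregation("SUM", "?v"), t.aggregation("MAX", "?v")]  # in the outer query
```

Positional naming gives agg_1 to both the subquery's COUNT and the outer SUM; the shared counter yields agg_1, agg_2 and agg_3.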


Putting it all in one place:

1/ Allow aggregates in ORDER BY

2/ Skip if no grouping/aggregation

3/ Make the allocation of i for agg_i global

4/ Ensure that evaluation of AggregateJoin gets the right agg_i, either by:
    4a/ adding (?agg_i, Ai) to E, or
    4b/ adding ?agg_i into Aggregation() to carry it through to the eval step.

5/ Defns of Aggregation, AggregateJoin and Group.

6/ Link (E is then used in 18.2.4.4)

7/ 18.2.4.4 -- setting of E

8/ Editorial: "Note that if eval(D(G), Ai) is an error, it is ignored."

and check the evals are right.
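A minimal sketch of items 1 and 2, following the revised Group step proposed earlier in the thread: group by the GROUP BY expressions if present, fall back to Group((1), P) if SELECT, HAVING or ORDER BY contains an aggregate, and otherwise skip the aggregate step. The function name aggregate_step and the tuple encoding of Group(...) are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical encoding of the algebra operator Group(exprlist, P) as a tuple.
GroupOp = Tuple[str, Tuple[object, ...], str]


def aggregate_step(
    pattern: str,
    group_by: Optional[Sequence[str]] = None,
    select_has_agg: bool = False,
    having_has_agg: bool = False,
    order_by_has_agg: bool = False,
) -> Optional[GroupOp]:
    """Return the Group(...) operator, or None when no grouping is needed."""
    if group_by is not None:
        return ("Group", tuple(group_by), pattern)
    if select_has_agg or having_has_agg or order_by_has_agg:
        # A single implicit group: Group((1), P).
        return ("Group", (1,), pattern)
    # No GROUP BY and no aggregate anywhere: skip the rest of the aggregate step.
    return None
```

The order_by_has_agg check is the part that corresponds to item 1.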


 Andy
Received on Monday, 28 November 2011 12:21:15 UTC