Re: Order of evaluation for aggregates

On 22 November 2011 21:37, Steve Harris <steve.harris@garlik.com> wrote:
> On 22 Nov 2011, at 18:06, Birte Glimm wrote:
>
>> On 22 November 2011 13:50, Steve Harris <steve.harris@garlik.com> wrote:
[snip]

I just noticed that the fix for unaggregated vars in expressions got
lost in my summary code, so I have fixed that below:

>> ------------------------
>> Let A := the empty sequence
>> Let Q := the query level being evaluated
>> Let P := the algebra translation of the GroupGraphPattern of the query level
>> Let E := [], a list of pairs of the form (variable, expression)
>>
>> If Q contains GROUP BY exprlist
>>    Let G := Group(exprlist, P)
>> Else
>>    Let G := Group((1), P)
>>    End
>>
>> Let i := 1
>>
>> For each expression E in SELECT and each HAVING(E) in Q
         If E contains an unaggregated variable V
            Replace V with Sample(V)
>>       End
>>   For each aggregate X(args ; scalarvals) now in E
>>       # note scalarvals may be omitted, then it's equivalent to the empty set
>>       Ai := Aggregation(args, X, scalarvals, G)
>>       Replace X(...) with aggi in Q
>>       i := i + 1
>>       End
>>   End
>>
>> For each variable V appearing outside of an aggregate
>>    Ai := Aggregation(V, Sample, {}, G)
>>    E := E append (V, aggi)
>>    i := i + 1
>>    End
>>
>> A := A1, ..., Ai-1
>> P := AggregateJoin(A)
>>
>> For each HAVING(E) in Q
>>    P := Filter(E, P)
>>    End
>>
>> Note: E is then used in 18.2.4.4 for the processing of select
>> expressions.
--------------------
> OK, that seems pretty reasonable, but "Expression E" etc. probably needs some other symbol to reduce confusion.


Agreed. Maybe exp? X is already used for aggregates.

> I think there's a problem with SELECT though:
>
> SELECT ?x
> WHERE { ?x a <Foo> }
> GROUP BY ?x
>
> "If E does not contain an aggregate" will be true, and that will become SELECT Sample(?x), but the extend E will not be modified.

My idea was to handle those variables in the latter for loop; I had
hoped that simple variables do not count as expressions, but I'm
probably wrong about this. So we could strengthen the condition of the
first loop to, e.g.,
For each (Exp AS Var) in SELECT and each HAVING(Exp) in Q
   If Exp contains an unaggregated variable V
      Replace V with Sample(V)
      End
   For each aggregate X(args ; scalarvals) now in Exp
      # note scalarvals may be omitted, then it's equivalent to the empty set
      Ai := Aggregation(args, X, scalarvals, G)
      Replace X(...) with aggi in Q
      i := i + 1
      End
   End

For each variable V appearing outside of an aggregate in SELECT
   Ai := Aggregation(V, Sample, {}, G)
   E := E append (V, aggi)
   i := i + 1
   End

I also added *in SELECT* to the latter for loop, to clarify that the
loop only applies to selected unaggregated variables (and not
arbitrary variables in Q). Is it clear, though, that this should not
apply to assignment variables such as ?x in (?y AS ?x)? This loop
is only for plain, unaggregated, selected variables as in SELECT ?x
...
Maybe
For each selected unaggregated variable in SELECT?
or
For each selected item SelItem in SELECT
   If SelItem is a variable
      Ai := Aggregation(SelItem, Sample, {}, G)
      E := E append (SelItem, aggi)
      i := i + 1
      End
   End
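
To make the two loops concrete, here is a rough Python sketch of how
they could fit together. It is only an illustration, not spec text:
the classes Var, Agg and Op and the functions rewrite and translate
are names I made up for this sketch, scalarvals are left out, and an
Aggregation is recorded simply as an (argument, function) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:          # hypothetical: a variable such as ?x
    name: str


@dataclass(frozen=True)
class Agg:          # hypothetical: an aggregate call such as Sum(?y)
    func: str
    arg: object


@dataclass(frozen=True)
class Op:           # hypothetical: any other operator or function call
    op: str
    args: tuple


def rewrite(exp, A):
    """First loop body: sample unaggregated variables, then replace
    every aggregate by a fresh agg_i variable and record it in A."""
    if isinstance(exp, Var):
        exp = Agg("Sample", exp)          # Replace V with Sample(V)
    if isinstance(exp, Agg):
        A.append((exp.arg, exp.func))     # Ai := Aggregation(args, X, ...)
        return Var(f"agg{len(A)}")        # Replace X(...) with agg_i
    if isinstance(exp, Op):
        return Op(exp.op, tuple(rewrite(a, A) for a in exp.args))
    return exp                            # constants stay as they are


def translate(select_items, having):
    """select_items holds Var for a plain ?x and (exp, Var) for (exp AS ?v)."""
    A, E = [], []
    select_exprs, having_exprs = [], []
    # First loop: (Exp AS Var) items and HAVING expressions.
    for item in select_items:
        if isinstance(item, tuple):
            exp, var = item
            select_exprs.append((rewrite(exp, A), var))
    for exp in having:
        having_exprs.append(rewrite(exp, A))
    # Second loop: plain selected variables only.
    for item in select_items:
        if isinstance(item, Var):
            A.append((item, "Sample"))
            E.append((item, Var(f"agg{len(A)}")))
    return A, E, select_exprs, having_exprs


# SELECT ?x (SUM(?y) + ?z AS ?total) ... HAVING (COUNT(?y) > 1)
A, E, select_exprs, having_exprs = translate(
    [Var("x"), (Op("+", (Agg("Sum", Var("y")), Var("z"))), Var("total"))],
    [Op(">", (Agg("Count", Var("y")), 1))],
)
```

In this sketch the assignment variable ?total never reaches E; only
the plain selected variable ?x does, which is the distinction the
second loop is meant to draw.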

> "If E does not contain an aggregate, or group variable" may cover it?

Even if the variable is grouped, we have to sample it and introduce
Extend() since grouping by itself does not lead to any solution
mappings being constructed.
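
A small Python sketch of why that is, using your SELECT ?x ... GROUP
BY ?x example. The function names are my own and the semantics are
simplified (no error handling, Sample just takes the first bound
value); it is meant as an illustration under those assumptions only.

```python
def group(key_vars, solutions):
    """Group(exprlist, P): partitions solutions by key. The result is a
    map from keys to groups, not a sequence of solution mappings."""
    groups = {}
    for mu in solutions:
        key = tuple(mu.get(v) for v in key_vars)
        groups.setdefault(key, []).append(mu)
    return groups


def aggregation_sample(var, groups):
    """Aggregation(V, Sample, {}, G): one value per group key.
    Assumption: Sample picks the first bound value in the group."""
    return {
        key: next((mu[var] for mu in members if var in mu), None)
        for key, members in groups.items()
    }


def aggregate_join(aggs):
    """AggregateJoin(A): only here do solution mappings appear again,
    one per group, binding agg1, agg2, ..."""
    return [
        {f"agg{i}": a[key] for i, a in enumerate(aggs, start=1)}
        for key in aggs[0]
    ]


def extend(solutions, var, source):
    """Extend: bind var to the value of the generated variable."""
    return [{**mu, var: mu[source]} for mu in solutions]


# Solutions of { ?x a <Foo> } with one duplicate binding
solutions = [{"x": "a"}, {"x": "b"}, {"x": "a"}]
G = group(["x"], solutions)
P = aggregate_join([aggregation_sample("x", G)])
result = extend(P, "x", "agg1")
```

Here G only holds the partition; ?x becomes bound in the result only
through the Sample aggregation followed by the extend step.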

Birte
[snip]

Received on Wednesday, 23 November 2011 10:36:32 UTC