Re: Order of evaluation for aggregates from Birte Glimm on 2011-11-28 (public-rdf-dawg@w3.org from October to December 2011)

From: Birte Glimm <birte.glimm@uni-ulm.de>
Date: Mon, 28 Nov 2011 11:28:40 +0100
To: Andy Seaborne <andy.seaborne@epimorphics.com>, Steve Harris <steve.harris@garlik.com>
Cc: sparql Working Group <public-rdf-dawg@w3.org>
Message-ID: <CABt65OcAN_PZT4ezXMWzY=4meFvzaQLrtnvDEhRj+1mP_m_q+w@mail.gmail.com>
On 26 November 2011 23:59, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
> Steve,
>
> I'm working through the definitions as they are in rq25 at the moment (Nov
> 26).
>
> I see no problem extending to ORDER BY : it works on ?agg_i and they are
> in-scope.

Agreed.

> ## are comments
> ** are suggestions
>
> Q = SELECT ?x (1+count(*) as ?y) WHERE { ?x :p ?v } GROUP BY ?x
> P = BGP({ ?x :p ?v })
>
>> If Q contains GROUP BY exprlist
>>    Let G := Group(exprlist, P)
>> Else
>>    Let G := Group((1), P)
>>    End
>
> ## What about the case of no GROUP BY and no aggregate?
>   This catchall always groups a query
> ** ---------
> If Q contains GROUP BY exprlist
>   Let G := Group(exprlist, P)
> Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
>   Let G := Group((1), P)
> Else
>   skip the rest of the aggregate step
>   End
> ** ---------

Good point.

> G := Group(?x, BGP({ ?x :p ?v }))
> i:=1
>
>> For each (X AS Var) in SELECT and each HAVING(X) in Q
> so
>  X=1+count(*) Var = ?y
>
>> If X contains an unaggregated variable V
>
> ** s/Var/V/ in the For loop above.

No. We arelooking at (X AS Var), but we now want to know whether X
contains a variable that is not aggregated. Obviously, variables in X
are different from Var. This is meant to handle things like (1+?x AS
?y), where ?x is grouped, but not aggregated. This should become
(1+SAMPLE(?x) AS ?y).

>> For each aggregate R(args ; scalarvals) now in X
> aggregate R = count(*)
> A1 := Aggregation(*, count, {}, Group(?x, BGP({ ?x :p ?v })))
>> Replace R(...) with agg_1 in Q
>
> Q = SELECT ?x (1+?agg_1 as ?y) WHERE { ?x :p ?v } GROUP BY ?x
> ## Did you mean Q?

I think yes. The modified Q is then used later, where we no longer
want to see aggregates, but instead ?aggi. The select expressions are
later turned into extends withh ?aggi variables.

> ** Replace R(...) with agg_1 in X
In this case, one would assume that changes to X are propagated
through to Q. I think replacing in Q is clearer.

> ## but X never gets mentioned again.
> ## Text seems to have lost an "extend" or assignment to E
? The normal select expressions are handled later in 18.2.4.4. The
extends that we construct here are just to avoid errors because of
invalid select expressions. For example SELECT ?x { ... } GROUP BY ?x
should become something like SELECT (SAMPLE(?x) AS ?x) { ... } GROUP
BY ?x, but that wouldn't be valid. Maybe, now that we have
unaggregated variables that do not occur within an assigment in a
separate for loop, we could actually get w=away with rewriting Q again
here, e.g., into SELECT (?aggi AS ?x) ...

> ## This does not do anything with (?y, ?agg_1)
> ** Add E := E append (?y, X)
This happens in 18.2.4.4.

> ## Otherwise the connection between A1 and ?agg_1 is lost.

This is only clear from 18.5 Evaluation Semantics, where the aggi are
assigned in the evaluation of AggregateJoin. Section "18.4.1 Aggregate
Algebra" still misses definitions for Aggregation und AggregateJoin.
In any case, I agree that this is making understanding quite
difficult. I also prefer if aggi were introduced explicitly to each
Aggregate algebra object. This would also make algebra optimisations
easier. You should, ideally, be free to shuffle Agrregate objects
around within an AggregateJoin object, without mesing everything up.

> E := (?y, 1 + ?agg_1)
see 18.2.4.4

> i = 2
>
>> For each variable V appearing outside of an aggregate
> V = ?x
> A2 := Aggregation(?x, Sample, {}, G)
> E := (?y, 1 + ?agg_1) (?x, ?agg_2)
> i := 3
>
> P := AggregateJoin(A1, A2)
>
> ## Note -- we can do ORDER BY as well because ?agg varaibles never go out of
> scope and so continue until projection happens.  ORDER is before
> projection.

Agreed.

> "E is then used in 18.2.4.4"
> ** Link needed.
>
> ----
> At this point:
> P = AggregateJoin(A1, A2)
> A1 = Aggregation(*, count, {}, Group(?x, BGP({ ?x :p ?v })))
> A2 = Aggregation(?x, Sample, {}, G)
> E = (?y, 1 + ?agg_1) (?x, ?agg_2)  # My correction
>
> ## Problem: how does A1 get associated with ?agg_1

see evaluation semantics: AggregateJoin, although I am very much in
favour of a more explicit assignment.

> ## The connection of A1 to ?agg_1 didn't get recorded anywhere.
Just from Ai you know aggi, so A1 means agg1.
> ** add ?agg_i to "Aggregation"
>
> e.g. Aggregation(?agg_1, *, count, {}, Group(?x, BGP({ ?x :p ?v })))
+1

> Example:
> Does not mention agg_i
> There is no E; I think it should be
>   E = [(?sum, ?agg_1) (?count, ?agg_2)]
E in this case is only constructed later :-(

> ------------------------------------------------------------------------
>
> 18.2.4.4 SELECT Expressions
>
> ** Needs noting it can be set earlier.
> Let E := [], a list of pairs of the form (variable, expression)

Yes, this needs to be adapted to be initialised with E from the Aggregates step.

> ------------------------------------------------------------------------
>
> I'll try look at the evaluation next and also try to come up with
> definitions for Group, Aggregation, AggregateJoin to go with the eval defns.
> Also, I think "Aggregation" needs to know the name of the ?agg_i variable it
> is going to set (see above).
>
> One comment for now:
>
> "Note that if eval(D(G), Ai) is an error, it is ignored."
>
> Does "it" mean the error?
> And "ignore" mean the (agg_i, v_i) pair not included in the AggregateJoin?
>
> I think it needs to say that explicitly.
>
>
> Sorry about the layout - not sure how best to write comments and suggestions
> while showing the working.

i think it worked.

Birte

>        Andy
>
>



-- 
Jun. Prof. Dr. Birte Glimm            Tel.:    +49 731 50 24125
Inst. of Artificial Intelligence         Secr:  +49 731 50 24258
University of Ulm                         Fax:   +49 731 50 24188
D-89069 Ulm                               birte.glimm@uni-ulm.de
Germany
Received on Monday, 28 November 2011 10:29:21 UTC