
Re: Order of evaluation for aggregates

From: Steve Harris <steve.harris@garlik.com>
Date: Fri, 25 Nov 2011 13:43:31 +0000
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <36DA5B3D-8C5B-421E-9B37-E6E80E96D913@garlik.com>
To: birte.glimm@uni-ulm.de

Great, thanks.

I think http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml#convertGroupAggSelectExpressions now contains all these changes. It makes sense to me.

I used R for aggregate and X for the expression in the end.

Cheers,
   Steve

On 2011-11-23, at 10:36, Birte Glimm wrote:

> On 22 November 2011 21:37, Steve Harris <steve.harris@garlik.com> wrote:
>> On 22 Nov 2011, at 18:06, Birte Glimm wrote:
>> 
>>> On 22 November 2011 13:50, Steve Harris <steve.harris@garlik.com> wrote:
> [snip]
> 
> I just noticed the fix for unaggregated vars in expressions got lost
> in my summary code, so I fix that below:
> 
>>> ------------------------
>>> Let A := the empty sequence
>>> Let Q := the query level being evaluated
>>> Let P := the algebra translation of the GroupGraphPattern of the query level
>>> Let E := [], a list of pairs of the form (variable, expression)
>>> 
>>> If Q contains GROUP BY exprlist
>>>    Let G := Group(exprlist, P)
>>> Else
>>>    Let G := Group((1), P)
>>>    End
>>> 
>>> Let i := 1
>>> 
>>> For each expression E in SELECT and each HAVING(E) in Q
>         If E contains an unaggregated variable V
>            Replace V with Sample(V)
>>>       End
>>>   For each aggregate X(args ; scalarvals) now in E
>>>       # note scalarvals may be omitted, then it's equivalent to the empty set
>>>       Ai := Aggregation(args, X, scalarvals, G)
>>>       Replace X(...) with aggi in Q
>>>       i := i + 1
>>>       End
>>>   End
>>> 
>>> For each variable V appearing outside of an aggregate
>>>    Ai := Aggregation(V, Sample, {}, G)
>>>    E := E append (V, aggi)
>>>    i := i + 1
>>>    End
>>> 
>>> A := A1, ..., Ai-1
>>> P := AggregateJoin(A)
>>> 
>>> For each HAVING(E) in Q
>>>    P := Filter(E, P)
>>>    End
>>> 
>>> Note: E is then used in 18.2.4.4 for the processing of select
>>> expressions.
> --------------------
>> OK, that seems pretty reasonable, but "Expression E" etc. probably needs some other symbol to reduce confusion.
> 
> 
> Agreed. Maybe exp? X already used for aggregates.
> 
>> I think there's a problem with SELECT though:
>> 
>> SELECT ?x
>> WHERE { ?x a <Foo> }
>> GROUP BY ?x
>> 
>> "If E does not contain an aggregate" will be true, and that will become SELECT Sample(?x), but the extend E will not be modified.
> 
> My idea is to handle those variables in the latter for loop and I
> hoped that simple variables are not expressions, but I'm probably
> wrong about this. So we could strengthen the condition of the first
> loop to, e.g.,
> 
> For each (Exp AS Var) in SELECT and each HAVING(Exp) in Q
>    If Exp contains an unaggregated variable V
>       Replace V with Sample(V)
>       End
>    For each aggregate X(args ; scalarvals) now in Exp
>       # note scalarvals may be omitted, then it's equivalent to the empty set
>       Ai := Aggregation(args, X, scalarvals, G)
>       Replace X(...) with aggi in Q
>       i := i + 1
>       End
>   End
> 
> For each variable V appearing outside of an aggregate in SELECT
>   Ai := Aggregation(V, Sample, {}, G)
>   E := E append (V, aggi)
>   i := i + 1
>   End
> 
> I also added *in SELECT* to the latter for loop, to clarify that the
> loop only applies to selected unaggregated variables (and not
> arbitrary variables in Q). Is it clear though that this should not
> apply to the assignment variables such as ?x in (?y AS ?x)? This loop
> is only for plain, unaggregated, selected variables as in SELECT ?x
> ...
> Maybe
> For each selected unaggregated variable in SELECT?
> or
> For each selected item SelItem in SELECT
>   If SelItem is a variable
>      Ai := Aggregation(SelItem, Sample, {}, G)
>      E := E append (SelItem, aggi)
>      i := i + 1
>      End
>   End
> 
>> "If E does not contain an aggregate, or group variable" may cover it?
> 
> Even if the variable is grouped, we have to sample it and introduce
> Extend() since grouping by itself does not lead to any solution
> mappings being constructed.
> 
> Birte
> [snip]
> 
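
For readers following the thread, the revised loops quoted above can be
sketched in Python roughly as follows. This is only an illustration under
stated assumptions, not the wording that went into the spec: the classes
Var, Agg, Op and AggRef and the function names are hypothetical, the group
G and the scalarvals are left out, and AggRef(i) stands in for the
placeholder agg_i.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:          # a query variable such as ?x (hypothetical representation)
    name: str


@dataclass(frozen=True)
class Agg:          # an aggregate call such as SUM(?y); scalarvals omitted
    func: str
    arg: object


@dataclass(frozen=True)
class Op:           # a binary operator expression such as ?a + ?b
    symbol: str
    left: object
    right: object


@dataclass(frozen=True)
class AggRef:       # stands in for the placeholder agg_i
    index: int


def sample_unaggregated(exp):
    """Replace each variable outside an aggregate with Sample(variable)."""
    if isinstance(exp, Var):
        return Agg("Sample", exp)
    if isinstance(exp, Op):
        return Op(exp.symbol,
                  sample_unaggregated(exp.left),
                  sample_unaggregated(exp.right))
    return exp  # aggregates and constants are left unchanged


def convert(select_items, having):
    """select_items holds Var (plain ?x) or (expression, Var) for (Exp AS Var)."""
    aggregations = []   # A_1, ..., A_{i-1} as (args, aggregate name) pairs
    extend = []         # E, the list of (variable, agg_i) pairs

    def extract(exp):
        # Replace each aggregate with agg_i and record A_i.
        if isinstance(exp, Agg):
            aggregations.append((exp.arg, exp.func))
            return AggRef(len(aggregations))
        if isinstance(exp, Op):
            return Op(exp.symbol, extract(exp.left), extract(exp.right))
        return exp

    # First loop: each (Exp AS Var) in SELECT and each HAVING(Exp).
    new_select = []
    for item in select_items:
        if isinstance(item, tuple):
            exp, var = item
            new_select.append((extract(sample_unaggregated(exp)), var))
        else:
            new_select.append(item)
    new_having = [extract(sample_unaggregated(h)) for h in having]

    # Second loop: each selected item that is a plain variable.
    for item in select_items:
        if isinstance(item, Var):
            aggregations.append((item, "Sample"))
            extend.append((item, AggRef(len(aggregations))))

    return aggregations, extend, new_select, new_having
```

With this sketch, the SELECT ?x ... GROUP BY ?x case from the thread, written
as convert([Var("x")], []), records one Sample aggregation for ?x and one
entry in E, which is the behaviour Birte argues for even when the variable
is grouped.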

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Friday, 25 November 2011 13:44:04 GMT
