Re: another aggregates test case...

On 09/06/2010 1:43 PM, Steve Harris wrote:
> On 2010-06-09, at 10:23, Andy Seaborne wrote:
>>
>> On 09/06/2010 10:08 AM, Steve Harris wrote:
>>>> which leads me to a fairly natural interpretation of
>>>>>
>>>>>   SELECT ?s ?p
>>>>>   {
>>>>>      ?s ?p ?p
>>>>>   } GROUP BY ?s ?p
>>>>>
>>>>>   as "null aggregation"
>>> I don't understand the term "null aggregation".
>>
>> It is a term earlier in the thread to capture the idea that SELECT/GROUP with no aggregators mentioned fitted into the current framework with an implicit aggregator that did nothing.
>
> This is captured in the current draft with:
>
> "Definition: Group
>
> Group evaluates a list of expressions against a solution sequence, producing a set of partial functions from keys to solution sequences.
>
> The behaviour of Group is different when ExprList is empty.
>
> Group((), Ω) = { 1 ->  Ω }

That covers the case of

SELECT (count(*) AS ?C)
{ ?s ?p ?o }

It's the case where there is no count or other aggregator:

SELECT ?s
{ ?s ?p ?o }
GROUP BY ?s

that triggers the idea of "null aggregation".  Here there is no count or 
other aggregation, so AggregateJoin yields the empty set:

AggregateJoin(A) =
    { { aggi → range(Ai) } | dom(Ai) = k, k in set-union(dom(A)) }
      = {}

>
> Group(ExprList, Ω) = { ListEval(ExprList, μ) ->  { μ' | μ' in Ω, ListEval(ExprList, μ) = ListEval(ExprList, μ') } | μ in Ω }"
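
For concreteness, here is a minimal Python sketch of how I read the quoted Group definition. It is not text from the draft: representing solutions as dicts, expressions as callables, and the names list_eval and group are all assumptions made for illustration.

```python
def list_eval(expr_list, mu):
    """Evaluate each grouping expression against one solution mu.

    The resulting tuple plays the role of the key (illustrative only).
    """
    return tuple(expr(mu) for expr in expr_list)


def group(expr_list, omega):
    """Partition the solution sequence omega by key, keeping duplicates."""
    if not expr_list:
        # Group((), Ω) = { 1 -> Ω }: a single partition holding everything.
        return {1: list(omega)}
    partitions = {}
    for mu in omega:
        partitions.setdefault(list_eval(expr_list, mu), []).append(mu)
    return partitions


# Hypothetical solutions for { ?s ?p ?o }, projected to ?s and ?o.
omega = [
    {"s": "a", "o": 1},
    {"s": "a", "o": 2},
    {"s": "b", "o": 3},
]
by_s = group([lambda mu: mu.get("s")], omega)   # GROUP BY ?s
whole = group([], omega)                        # no GROUP BY
```

Each partition is a list, not a set, so the cardinality of the matching solutions is preserved.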

We have the group operation producing the functions from key to multiset 
(cardinality = cardinality of  μ' in Ω). I'm looking for something that 
produces the query solution, given

Maybe we need an algebra operation (group):

   group ( things to group by,
           aggregators to apply,   # AggregateJoin
           pattern to work on )
    -> set of { agg expression -> value }

I chose the name "group" to make it the overall operation. We already 
have Group(ExprList, Ω), which is different, so maybe rename that as the 
partition function, as per the language Chimie used and in the draft 
WD-sparql11-query-20091022/. That draft also has a key() function 
to generate the keys needed to put into the query solution.
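
A rough Python sketch of what such an overall operation might look like, assuming solutions are dicts and grouping is by plain variables only. The name group_op and its signature are hypothetical, not taken from any draft.

```python
def group_op(group_vars, aggregators, omega):
    """Hypothetical overall 'group' operation.

    group_vars  -- variable names to group by
    aggregators -- dict of result name -> function over a list of solutions
    omega       -- the solutions of the pattern, as dicts
    """
    if not group_vars:
        # No GROUP BY: everything falls into one partition.
        partitions = {(): list(omega)}
    else:
        partitions = {}
        for mu in omega:
            key = tuple(mu.get(v) for v in group_vars)
            partitions.setdefault(key, []).append(mu)

    results = []
    for key, members in partitions.items():
        # The key values go back into the solution (the role of key()).
        row = dict(zip(group_vars, key))
        for name, agg in aggregators.items():
            row[name] = agg(members)
        results.append(row)
    return results


omega = [
    {"s": "a", "o": 1},
    {"s": "a", "o": 2},
    {"s": "b", "o": 3},
]
null_agg = group_op(["s"], {}, omega)        # SELECT ?s ... GROUP BY ?s
count_all = group_op([], {"C": len}, omega)  # SELECT (count(*) AS ?C)
```

With an empty aggregator set, the "null aggregation" case falls out naturally: one row per key and nothing else.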

It might be easier to make AggregateJoin assign the aggregation values 
to fresh variables, then assign to their AS names for the case:

AggregateJoin(A) =
   { (?fresh, aggi → range(Ai)) } ...
   # A pair (variable, value)

SELECT ?s (sum(?o) AS ?sum)
{ ?s ?p ?o }
HAVING (count(*) > 10)

The project/select expression is then decoupled from the 
group/aggregation process (and has a filter in the way in this case anyway).

Using the AggregateJoin aggregation as the expression name might also 
work, but it needs general expressions changed to write HAVING (count(*) 
 > 10), as there the reference to the result of count(*) isn't an RDF 
term or a variable, which is what expressions currently work with.  
Variables are a convenient way to deal with the value of an expression.
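
To illustrate the fresh-variable idea, here is a small Python sketch, under the assumption that solutions are dicts and the data is grouped by ?s. The function name and the ".agg0"-style fresh names are invented for the example.

```python
def aggregate_with_fresh_vars(group_vars, agg_exprs, omega):
    """Bind each aggregate's value to a fresh variable (illustrative only)."""
    fresh = [f".agg{i}" for i in range(len(agg_exprs))]
    partitions = {}
    for mu in omega:
        key = tuple(mu.get(v) for v in group_vars)
        partitions.setdefault(key, []).append(mu)
    rows = []
    for key, members in partitions.items():
        row = dict(zip(group_vars, key))
        for name, agg in zip(fresh, agg_exprs):
            row[name] = agg(members)
        rows.append(row)
    return rows, fresh


# Hypothetical data: "a" has 11 solutions, "b" has 2.
omega = [{"s": "a", "o": i} for i in range(11)]
omega += [{"s": "b", "o": 5}, {"s": "b", "o": 7}]


def sum_o(members):
    return sum(m["o"] for m in members)


rows, (v_sum, v_count) = aggregate_with_fresh_vars(["s"], [sum_o, len], omega)

# HAVING (count(*) > 10) is an ordinary filter over a variable.
kept = [r for r in rows if r[v_count] > 10]

# The projection then renames the fresh variable to its AS name.
result = [{"s": r["s"], "sum": r[v_sum]} for r in kept]
```

The filter and the projection only ever see variables, so neither needs to know that an aggregate produced the value.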

Given your formalization of how aggregation happens, it looks like we 
have all the bits and pieces needed.

 Andy

Received on Wednesday, 9 June 2010 14:39:38 UTC