Re: another aggregates test case... from Steve Harris on 2010-06-13 (public-rdf-dawg@w3.org from April to June 2010)

From: Steve Harris <steve.harris@garlik.com>
Date: Sun, 13 Jun 2010 10:41:33 +0100
To: Andy Seaborne <andy.seaborne@talis.com>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, Axel Polleres <axel.polleres@deri.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <599FA898-25B2-40A3-BEC7-2191BD14A16F@garlik.com>

On 2010-06-09, at 15:31, Andy Seaborne wrote:
> On 09/06/2010 1:43 PM, Steve Harris wrote:
>> On 2010-06-09, at 10:23, Andy Seaborne wrote:
>>> 
>>> On 09/06/2010 10:08 AM, Steve Harris wrote:
>>>>> which leads me to a fairly natural interpretation of
>>>>>> 
>>>>>>  SELECT ?s ?p
>>>>>>  {
>>>>>>     ?s ?p ?p
>>>>>>  } GROUP BY ?s ?p
>>>>>> 
>>>>>>  as "null aggregation"
>>>> I don't understand the term "null aggregation".
>>> 
>>> It is a term used earlier in the thread to capture the idea that a SELECT/GROUP BY with no aggregators mentioned fits into the current framework via an implicit aggregator that does nothing.
>> 
>> This is captured in the current draft with:
>> 
>> "Definition: Group
>> 
>> Group evaluates a list of expressions against a solution sequence, producing a set of partial functions from keys to solution sequences.
>> 
>> The behaviour of Group is different when ExprList is empty.
>> 
>> Group((), Ω) = { 1 ->  Ω }
> 
> That covers the case of
> 
> SELECT (count(*) AS ?C)
> { ?s ?p ?o }
> 
> It's the case where there is no count or other aggregator:
> 
> SELECT ?s
> { ?s ?p ?o }
> GROUP BY ?s
> 
> that triggers the idea of "null aggregation".  Here there is no count or other aggregation.
> 
> AggregateJoin(A) =
>   { { agg_i → range(A_i) } | dom(A_i) = k, k in set-union(dom(A)) }
>     = {}

I see, yes.

A simpler approach might be to say that the projection of ?s in this case is implicitly an aggregation, as it is in SQL. Then there's no need for the scalar handling fudge I have now.
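
As a rough illustration of that idea (a sketch only, not text from the draft): the Python below models Group as a partition of Ω by key, and then treats the projected grouping variables as implicit aggregates, giving one row per key. It assumes that each entry in ExprList is a plain variable name, that solutions are dicts, and that an unbound variable evaluates to None; the names `group` and `project_keys` are made up for the example.

```python
from collections import defaultdict


def group(expr_list, omega):
    """Sketch of Group(ExprList, Ω): map each key to the solutions sharing it.

    Assumption: every expression is a bare variable name, so ListEval
    reduces to looking the variable up in the solution.
    """
    if not expr_list:
        # Group((), Ω) = { 1 -> Ω }
        return {1: list(omega)}
    groups = defaultdict(list)
    for mu in omega:
        key = tuple(mu.get(var) for var in expr_list)
        groups[key].append(mu)
    return dict(groups)


def project_keys(expr_list, groups):
    """Treat projected grouping variables as implicit aggregates.

    Each group contributes exactly one row, built from its key, even
    though no explicit aggregator such as count(*) was mentioned.
    """
    if not expr_list:
        # A single implicit group has no key variables to bind.
        return [{} for _ in groups]
    return [dict(zip(expr_list, key)) for key in groups]


# Solutions of { ?s ?p ?o } over a tiny hypothetical dataset.
omega = [
    {"s": "a", "p": "p1", "o": 1},
    {"s": "a", "p": "p2", "o": 2},
    {"s": "b", "p": "p1", "o": 3},
]

# SELECT ?s { ?s ?p ?o } GROUP BY ?s
groups = group(["s"], omega)
rows = project_keys(["s"], groups)
print(rows)
```

With this reading the query needs no special scalar handling: the key of each partition is itself the value carried into the result row.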

- Steve

>> Group(ExprList, Ω) = { ListEval(ExprList, μ) ->  { μ' | μ' in Ω, ListEval(ExprList, μ) = ListEval(ExprList, μ') } | μ in Ω }"
> 
> We have the group operation producing the functions from key to multiset (cardinality = cardinality of  μ' in Ω). I'm looking for something that produces the query solution, given
> 
> Maybe we need an algebra operation (group)
> 
>  group ( things to group by,
>          aggregators to apply,   # AggregateJoin
>          pattern to work on )
>   -> set of { agg expression -> value }
> 
> I chose the name "group" to make it the overall operation -- we already have Group(ExprList, Ω), which is different, so maybe rename that as the partition function, as per the language Chimie used and as in the draft WD-sparql11-query-20091022/. That draft also has a key() function to generate the keys needed to put into the query solution.
> 
> It might be easier to make AggregateJoin assign the aggregation values to fresh variables, then assign to their AS names for the case:
> 
> AggregateJoin(A) =
>  { (?fresh, agg_i → range(A_i)) } ...
>  # A pair (variable, value)
> 
> SELECT ?s (sum(?o) AS ?sum)
> { ?s ?p ?o }
> HAVING (count(*) > 10)
> 
> The project/select expression is then decoupled from the group/aggregation process (and has a filter in the way in this case anyway).
> 
> Using the AggregateJoin aggregation as the expression name might also work, but it needs general expressions to be changed in order to write HAVING (count(*) > 10), since there the reference to the result of count(*) isn't an RDF term or a variable, which is what expressions currently work with.  Variables are a convenient way to deal with the value of an expression.
> 
> Given your formalization of how aggregation happens, it looks like we have all the bits and pieces needed.
> 
>  Andy
> 

-- 
Steve Harris, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Sunday, 13 June 2010 09:42:09 UTC