W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > October to December 2011

Re: Aggregates

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 09 Dec 2011 10:53:21 +0000
Message-ID: <4EE1E8A1.8000003@epimorphics.com>
To: Steve Harris <steve.harris@garlik.com>
CC: public-rdf-dawg@w3.org


On 09/12/11 10:26, Steve Harris wrote:
> On 2011-12-08, at 09:02, Andy Seaborne wrote:
>
>>
>>
>> On 07/12/11 13:57, Steve Harris wrote:
>>> On 2011-12-07, at 13:00, Andy Seaborne wrote:
>>>
>>>>
>>>>
>>>> On 06/12/11 22:40, Steve Harris wrote:
>>>>> Hi all,
>>>>>
>>>>> I've now got the aggregates in a state where I think all the information is carried through from one end of the query to the other… but I've thought that before :)
>>>>>
>>>>> I also think ORDER BY is covered.
>>>>>
>>>>> Here's a sketch of what I think should be happening:
>>>>>
>>>>> Data
>>>>>
>>>>> <a>    <p>    1 .
>>>>> <a>    <p>    2 .
>>>>> <b>    <p>    3 .
>>>>>
>>>>> Query
>>>>>
>>>>> SELECT (MAX(?o) AS ?max) (MIN(?o) AS ?min)
>>>>> WHERE { ?s ?p ?o }
>>>>> GROUP BY ?s
>>>>> ORDER BY AVG(?o)
>>>>>
>>>>>
>>>>> Ω = Sol  ?s   ?p   ?o
>>>>>      μ1<a>     <p>     1
>>>>>      μ2<a>     <p>     2
>>>>>      μ3<b>     <p>     3
>>>>>
>>>>> G = Group((?s), Ω)
>>>>>    = { ((<a>), { μ1, μ2 }), ((<b>), { μ3 }) }
>>>>>
>>>>> Q = SELECT agg1 agg2
>>>>>      WHERE { ?s ?p ?o }
>>>>>      GROUP BY ?s
>>>>>      ORDER BY agg3
>>>>>
>>>>> E = { (?max, agg1), (?min, agg2) }
>>>>>
>>>>> A1 = Aggregation((?o), Max, {}, G)
>>>>> A2 = Aggregation((?o), Min, {}, G)
>>>>> A3 = Aggregation((?o), Avg, {}, G)
>>>>>
>>>>> J = AggregateJoin(A) =
>>>>>    { { (agg1, 2), (agg2, 1), (agg3, 1.5) }
>>>>>      { (agg1, 3), (agg2, 3), (agg3, 3) } }
>>>>
>>>>
>>>> This is the evaluation of AggregateJoin at execution time.
>>>>
>>>> I don't understand this step: how does it know the variables are agg1, agg2, and agg3? There could be other agg_i from other query levels. And why this order not agg3, agg2, agg1?
>>>
>>>  From the A, A has members 1, 2, and 3 in this case. A1 pairs with agg1 for e.g.
>>>
>>> If it were a lower query level it might have members 4, 5, and 6 for e.g.
>>
>> Not quite: i is reset on every SELECT processed
>> """
>>   # Note, i is global for the query, defaults to 1
>>   Let i := 1
>> """
>>
>> The comment might have that intent, but, to me, "Let" introduces a variable each time.
>
> Well, I can remove the word Let if that helps.

Rewording that would be a great help. (I'm not sure what you intend by 
"default" ) Why not

global i := 1   # Initially 1 for each query processed

>> It is workable as a definition but rather unclear to me.  The fact it works relies on scoping features of variables so that the use of agg_1 twice does not fall apart.
>
> But, that's true of all the pseudocode in the document.

Not at all.  Other forms generated pass the variables around e.g. SELECT 
expressions generate "Extend(?, var, expr)" forms so the variable is there.

>> Also, the variable names are regenerated.  How does AggregateJoin know the variable is called "agg1" not "__gen1" because the query really does use ?agg1 in teh user written part?
>
> Well, it's explicit in the text - I don't really understand how this is an issue - users have no way to create a variable called agg<sub>1</sub>, so their can't be any conflict.

This worries me.  Why can't a user accidentally write the name of an 
aggregate?

Either (1) the aggregate code generates a variable named in a way that 
query syntax can't create or (2) the agg variable needs to be named 
based on the rest of the query.

If it's (2), then how does the eval of AggregateJoin know that choice?

>> I think it would have been easier to do it all in the translation and not have AggregateJoin which is really just a form of "extend" assigning Ai to agg_i.
>
> It's more complex than that - it has to collapse the groups into a solution multiset in order to fit with the rest of the algebra.

Could you please point to the text for that because as written currently 
its the Aggregate function, each one, not AggregateJoin, that does the 
collapse

-------
Definition: AggregateJoin

AggregateJoin, a function which takes a set of aggregations, and returns 
a multiset of solutions.
-------
Definition: Evaluation of Aggregation
eval(D(G), Aggregation(exprlist, func, scalarvals, P)) = { (dom(g), F) | 
g in eval(D(G), P) }

P as produced by the Group function
-------


and then:
-------
Definition: Evaluation of Aggregation
eval(D(G), AggregateJoin(A)) = { (agg1, v1), ..., (aggn, vn) | vi such 
that ( k, vi ) in eval(D(G), Ai)
for some k and each 1 <= i <= n }
-------

so AggregateJoin works on the value produced by eval(D(G), Ai)

	Andy
Received on Friday, 9 December 2011 10:53:55 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:47 GMT