
Re: Aggregates

From: Steve Harris <steve.harris@garlik.com>
Date: Fri, 9 Dec 2011 10:26:57 +0000
Cc: public-rdf-dawg@w3.org
Message-Id: <1E7DFC9E-60D9-4C3C-98BB-9F80804FC4E5@garlik.com>
To: Andy Seaborne <andy.seaborne@epimorphics.com>
On 2011-12-08, at 09:02, Andy Seaborne wrote:

> 
> 
> On 07/12/11 13:57, Steve Harris wrote:
>> On 2011-12-07, at 13:00, Andy Seaborne wrote:
>> 
>>> 
>>> 
>>> On 06/12/11 22:40, Steve Harris wrote:
>>>> Hi all,
>>>> 
>>>> I've now got the aggregates in a state where I think all the information is carried through from one end of the query to the other… but I've thought that before :)
>>>> 
>>>> I also think ORDER BY is covered.
>>>> 
>>>> Here's a sketch of what I think should be happening:
>>>> 
>>>> Data
>>>> 
>>>> <a>   <p>   1 .
>>>> <a>   <p>   2 .
>>>> <b>   <p>   3 .
>>>> 
>>>> Query
>>>> 
>>>> SELECT (MAX(?o) AS ?max) (MIN(?o) AS ?min)
>>>> WHERE { ?s ?p ?o }
>>>> GROUP BY ?s
>>>> ORDER BY AVG(?o)
>>>> 
>>>> 
>>>> Ω = Sol  ?s    ?p    ?o
>>>>     μ1   <a>   <p>   1
>>>>     μ2   <a>   <p>   2
>>>>     μ3   <b>   <p>   3
>>>> 
>>>> G = Group((?s), Ω)
>>>>   = { ((<a>), { μ1, μ2 }), ((<b>), { μ3 }) }
>>>> 
>>>> Q = SELECT agg1 agg2
>>>>     WHERE { ?s ?p ?o }
>>>>     GROUP BY ?s
>>>>     ORDER BY agg3
>>>> 
>>>> E = { (?max, agg1), (?min, agg2) }
>>>> 
>>>> A1 = Aggregation((?o), Max, {}, G)
>>>> A2 = Aggregation((?o), Min, {}, G)
>>>> A3 = Aggregation((?o), Avg, {}, G)
>>>> 
>>>> J = AggregateJoin(A) =
>>>>   { { (agg1, 2), (agg2, 1), (agg3, 1.5) },
>>>>     { (agg1, 3), (agg2, 3), (agg3, 3) } }
>>> 
>>> 
>>> This is the evaluation of AggregateJoin at execution time.
>>> 
>>> I don't understand this step: how does it know the variables are agg1, agg2, and agg3? There could be other agg_i from other query levels. And why this order not agg3, agg2, agg1?
>> 
>> From A: A has members 1, 2, and 3 in this case, so A1 pairs with agg1, for example.
>> 
>> If it were a lower query level it might have members 4, 5, and 6, for example.
> 
> Not quite: i is reset on every SELECT processed
> """
>  # Note, i is global for the query, defaults to 1
>  Let i := 1
> """
> 
> The comment might have that intent, but, to me, "Let" introduces a variable each time.

Well, I can remove the word Let if that helps.

> It is workable as a definition but rather unclear to me.  The fact it works relies on scoping features of variables so that the use of agg_1 twice does not fall apart.

But, that's true of all the pseudocode in the document.

> Also, the variable names are regenerated.  How does AggregateJoin know the variable is called "agg1" not "__gen1" because the query really does use ?agg1 in the user written part?

Well, it's explicit in the text - I don't really understand how this is an issue - users have no way to create a variable called agg_1, so there can't be any conflict.
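
The two points above (a counter i that is global for the query, and internal agg_i names that cannot clash with a user's ?agg1) can be sketched roughly as follows. This is only an illustration, not text from the document: the names UserVar, AggVar and AggAllocator are hypothetical, and the sketch assumes that internal variables are kept as a distinct kind of term rather than as strings.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class UserVar:
    """A variable written in the query text, e.g. ?agg1 (hypothetical type)."""
    name: str


@dataclass(frozen=True)
class AggVar:
    """An internal agg_i; never produced by parsing query text (hypothetical type)."""
    index: int


class AggAllocator:
    """Hands out agg_1, agg_2, ... from one counter shared by the whole query."""

    def __init__(self):
        self._next = count(1)  # i defaults to 1 and is never reset per SELECT

    def fresh(self):
        return AggVar(next(self._next))


alloc = AggAllocator()
outer = [alloc.fresh() for _ in range(3)]  # aggregates of the outer SELECT
inner = [alloc.fresh() for _ in range(3)]  # aggregates of a lower query level

# A user-written ?agg1 is a different term from the internal agg_1.
clash = UserVar("agg1") == AggVar(1)
```

With a single allocator per query, the outer level gets agg_1 to agg_3 and a lower level gets agg_4 to agg_6, and `clash` is False because the two kinds of variable never compare equal.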

> I think it would have been easier to do it all in the translation and not have AggregateJoin which is really just a form of "extend" assigning Ai to agg_i.

It's more complex than that - it has to collapse the groups into a solution multiset in order to fit with the rest of the algebra.
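
As a rough illustration of that collapsing step, here is a minimal Python sketch run over the three-triple example from earlier in the thread. It is not the definition in the document: the function names group, aggregation and aggregate_join are stand-ins for Group, Aggregation and AggregateJoin, solutions are plain dicts, the empty scalar-values argument {} is dropped, and Avg is approximated with statistics.mean, ignoring SPARQL's numeric typing and error handling.

```python
from collections import defaultdict
from statistics import mean

# The three solutions from the example (Ω), as plain dicts.
OMEGA = [
    {"s": "<a>", "p": "<p>", "o": 1},
    {"s": "<a>", "p": "<p>", "o": 2},
    {"s": "<b>", "p": "<p>", "o": 3},
]


def group(keys, omega):
    """Stand-in for Group(exprlist, Ω): key tuple -> solutions sharing that key."""
    groups = defaultdict(list)
    for mu in omega:
        groups[tuple(mu[k] for k in keys)].append(mu)
    return dict(groups)


def aggregation(var, func, groups):
    """Stand-in for Aggregation((var), func, {}, G): one value per group key."""
    return {key: func([mu[var] for mu in sols]) for key, sols in groups.items()}


def aggregate_join(aggs):
    """Stand-in for AggregateJoin(A): one solution per group, binding A_i to agg_i."""
    keys = list(aggs[0])  # every A_i was built from the same G, so keys match
    return [
        {f"agg{i}": a[key] for i, a in enumerate(aggs, start=1)}
        for key in keys
    ]


G = group(("s",), OMEGA)
A = [aggregation("o", max, G), aggregation("o", min, G), aggregation("o", mean, G)]
J = aggregate_join(A)
```

Here J comes out as one solution per group, which is the part a plain extend over the ungrouped solutions would not give you: each agg_i has a different value in each group.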

> Just rewriting to
> 
> extend( ?agg1 := Aggregation((?o), Max, {}, G),
>        ?agg2 := Aggregation((?o), Min, {}, G),
>        ?agg3 := Aggregation((?o), Avg, {}, G) )
> 
> would keep the aggregation and the variable together.

Well, everywhere else in the document uses the form Extend(Extend(…), var, expr), and I suspect that written in that way it's even less clear than what we have now.

You can't actually do the above anyway because it doesn't handle the bindings for each group, and what you'd actually need would be a real eyeful.

What I tried to write is sort-of equivalent to:

{ ( ?agg1 -> Aggregation((?o), Max, {}, G),
    ?agg2 -> Aggregation((?o), Min, {}, G),
    ?agg3 -> Aggregation((?o), Avg, {}, G) ) }

That explicitly constructs a solution multiset, but it's complicated because it requires unrolling the groups and expressing the result in generic terms.

> If the reviewers are comfortable with the form in the doc, then I can live with it but I think it works by relying on human reading and associating text.

I think it's explicit in the consistent use of _i across A and agg.

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Friday, 9 December 2011 10:27:47 GMT
