Re: Order of evaluation for aggregates from Steve Harris on 2011-11-15 (public-rdf-dawg@w3.org from October to December 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Tue, 15 Nov 2011 12:44:40 +0000
To: birte.glimm@uni-ulm.de
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>, Andy Seaborne <andy.seaborne@epimorphics.com>
Message-Id: <7007C215-4A6C-4776-A02D-58EAC11CBDF3@garlik.com>

Many thanks Birte.

OK, of these I suspect that changing the substitution to (SAMPLE(?x) AS ?x) will require the fewest changes.

I wanted to convert the plain ?x projection to an aggregate so it was consistent with the rest of the projections, but expressing it explicitly would be equivalent I think.

I will have a run through the aggregation text and see if I can make that change with a relatively small change to the document.

Cheers,
   Steve

On 2011-11-10, at 10:59, Birte Glimm wrote:

> Steve, Andy, all,
> 
> I am trying to address AM-1, which is the question about evaluation
> order for queries with aggregates. There is indeed inconsistency in
> the spec regarding this and I believe we further miss a step in the
> aggregate evaluation.
> 
> We first translate the query pattern, which works ok I believe and
> results in some algebra object for the query pattern, say P. If the
> query has aggregates, the solutions obtained by evaluating P have to
> be grouped. The GROUP BY clause can itself contain expressions, e.g.,
> GROUP BY ((?x + ?y) AS ?z). Thus, we need an Extend algebra object
> before we can group, which we do not consider in the current algebra
> translation.
> 
> Once we have extended the solutions with the group by clause
> expressions, we can group and then form the aggregate. One problem here
> is that when we select a grouped variable, this variable is currently
> replaced by a SAMPLE aggregate without assigning the aggregated value
> back to the variable, so we get ?agg_i variables in the results. E.g.,
> from
> SELECT ?x (AVG(?y) AS ?z) { ... } GROUP BY ?x
> we first get
> P=Group((?x), bgp(...))
> and the rewritten query
> SELECT SAMPLE(?x) (AVG(?y) AS ?z) { ... } GROUP BY ?x
> we then get
> P=AggregateJoin(
>  Aggregation((?x), SAMPLE, {}, P),
>  Aggregation((?y), AVG, {}, P)
> )
> and the rewritten query
> SELECT ?agg_1 (?agg_2 AS ?z) { ... } GROUP BY ?x
> We then get
> Extend(P, ?z, ?agg_2)
> ?agg_1, however, remains as there is no AS ... in the SELECT clause.
> 
> This is not nice and might even lead to problems when I use ?x in the
> HAVING clause, as ?x is no longer mapped by any solution mapping.
> 
> I see two possibilities to fix that:
> 1) Replace a non-aggregated variable ?x with (SAMPLE(?x) AS ?x)
> 2) There is not really a need to aggregate these variables at all, as
> they are unique for each group (since only grouped variables can be
> selected). Thus, AggregateJoin(...) could be extended to construct the
> solution mappings by including each grouped variable with its value
> (as in the key for the aggregate) plus the aggregated values.
> 
> 
> Thus, I believe, we have to have the following order in the algebra translation:
> 1) query pattern translation as normal
> 2) Translate GROUP expressions -> Extend(...)
> 3) Group
> 4) Aggregate (fixed to properly handle non-aggregated variables)
> 5) Extend
> 6) Filter (from HAVING)
> Filter can only work after we have properly assigned the variables in step 5.
> 
> In the current algebra transformation we omit 2), and have 5) and 6) swapped.
> 
> Birte
> 
> 
> -- 
> Jun. Prof. Dr. Birte Glimm            Tel.:    +49 731 50 24125
> Inst. of Artificial Intelligence         Secr:  +49 731 50 24258
> University of Ulm                         Fax:   +49 731 50 24188
> D-89069 Ulm                               birte.glimm@uni-ulm.de
> Germany
> 
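
For illustration only, the six-step order proposed in the quoted message can be sketched over plain Python dictionaries standing in for solution mappings. This is a rough sketch, not the spec's algebra: the helper names (extend, group, aggregate, filter_), the sample data, and the variable names ?a, ?b, ?m are all assumptions made up for the example, and aggregate keeps the grouped variables directly, roughly as in option 2. The query it mimics is approximately SELECT ?z (AVG(?y) AS ?m) { ... } GROUP BY ((?a + ?b) AS ?z) HAVING (?m > 10).

```python
from collections import defaultdict


def extend(solutions, var, fn):
    """Extend: bind var to fn(solution) in every solution mapping."""
    return [{**s, var: fn(s)} for s in solutions]


def group(solutions, keys):
    """Group: partition solutions by the values of the grouping variables."""
    groups = defaultdict(list)
    for s in solutions:
        groups[tuple(s[k] for k in keys)].append(s)
    return groups


def aggregate(groups, keys, aggs):
    """Aggregate: one solution per group, keeping the grouped variables
    next to the aggregated values (hypothetical reading of option 2)."""
    out = []
    for key, members in groups.items():
        row = dict(zip(keys, key))
        for name, (fn, var) in aggs.items():
            row[name] = fn([m[var] for m in members])
        out.append(row)
    return out


def filter_(solutions, pred):
    """Filter: keep the solutions that satisfy the HAVING condition."""
    return [s for s in solutions if pred(s)]


def avg(values):
    return sum(values) / len(values)


# Step 1: result of the query pattern (made-up data).
P = [
    {"a": 1, "b": 1, "y": 10},
    {"a": 0, "b": 2, "y": 20},
    {"a": 3, "b": 3, "y": 5},
]

step2 = extend(P, "z", lambda s: s["a"] + s["b"])        # GROUP BY expression
step3 = group(step2, ["z"])                              # Group
step4 = aggregate(step3, ["z"], {"agg_1": (avg, "y")})   # Aggregate
step5 = extend(step4, "m", lambda s: s["agg_1"])         # (?agg_1 AS ?m)
step6 = filter_(step5, lambda s: s["m"] > 10)            # HAVING
print(step6)
```

In this sketch, applying filter_ to step4 instead of step5 raises a KeyError, because "m" is not yet bound. That mirrors the point that Filter can only work after the Extend in step 5.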

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Tuesday, 15 November 2011 12:45:19 UTC