- From: Birte Glimm <birte.glimm@uni-ulm.de>
- Date: Thu, 10 Nov 2011 11:59:37 +0100
- To: SPARQL Working Group <public-rdf-dawg@w3.org>, Andy Seaborne <andy.seaborne@epimorphics.com>, Steve Harris <steve.harris@garlik.com>
Steve, Andy, all,

I am trying to address AM-1, which is the question about evaluation order for queries with aggregates. There is indeed an inconsistency in the spec regarding this, and I believe we are also missing a step in the aggregate evaluation.

We first translate the query pattern, which works ok I believe and results in some algebra object for the query pattern, say P. If the query has aggregates, the solutions obtained by evaluating P have to be grouped. The GROUP BY clause can itself contain expressions, e.g., GROUP BY ((?x + ?y) AS ?z). Thus, we need an Extend algebra object before we can group, which we do not consider in the current algebra translation. Once we have extended the solutions with the GROUP BY clause expressions, we can group and then form the aggregates.

One problem here is that when we select a grouped variable, this variable is currently replaced by a SAMPLE aggregate without assigning the aggregated value back to the variable, so we get ?agg_i variables in the results. E.g., from

    SELECT ?x (AVG(?y) AS ?z) { ... } GROUP BY ?x

we first get

    P = Group((?x), bgp(...))

and the rewritten query

    SELECT SAMPLE(?x) (AVG(?y) AS ?z) { ... } GROUP BY ?x

we then get

    P = AggregateJoin( Aggregation((?x), SAMPLE, {}, P),
                       Aggregation((?y), AVG, {}, P) )

and the rewritten query

    SELECT ?agg_1 (?agg_2 AS ?z) { ... } GROUP BY ?x

We then get

    Extend(P, ?z, ?agg_2)

?agg_1, however, remains, as there is no AS ... for it in the SELECT clause. This is not nice and might even lead to problems when I use ?x in the HAVING clause, as ?x is no longer mapped by any solution mapping.

I see two possibilities to fix that:

1) Replace a non-aggregated variable ?x with (SAMPLE(?x) AS ?x).
2) There is not really a need to aggregate these variables at all, as they are unique for each group (since only grouped variables can be selected). Thus, AggregateJoin(...) could be extended to construct the solution mappings by including each grouped variable with its value (as in the key for the aggregate) plus the aggregated values.

Thus, I believe, we have to have the following order in the algebra translation:

1) Query pattern translation as normal
2) Translate GROUP expressions -> Extend(...)
3) Group
4) Aggregate (fixed to properly handle non-aggregated variables)
5) Extend
6) Filter (from HAVING)

Filter can only work after we have properly assigned the variables in step 5. In the current algebra transformation we omit 2) and have 5) and 6) swapped.

Birte

--
Jun. Prof. Dr. Birte Glimm            Tel.:  +49 731 50 24125
Inst. of Artificial Intelligence      Secr:  +49 731 50 24258
University of Ulm                     Fax:   +49 731 50 24188
D-89069 Ulm                           birte.glimm@uni-ulm.de
Germany
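
For illustration, the six-step order proposed in the message can be sketched in Python, with plain dictionaries standing in for solution mappings. This is only a rough sketch under assumptions: the helper names (extend, group, aggregate, avg), the sample data, and the example query in the comment are hypothetical and not taken from the spec, and the Aggregate step follows possibility 2) by copying the group key variables into each result. The final projection is added here only to show the output.

```python
from collections import defaultdict

def extend(solutions, var, expr):
    """Extend: bind var to expr(mu) in every solution mapping mu."""
    return [{**mu, var: expr(mu)} for mu in solutions]

def group(solutions, key_vars):
    """Group: partition solutions by the values of the grouping variables."""
    groups = defaultdict(list)
    for mu in solutions:
        groups[tuple(mu[v] for v in key_vars)].append(mu)
    return groups

def aggregate(groups, key_vars, aggregates):
    """Aggregate: one solution per group, carrying the group key variables
    (possibility 2 in the message) plus one agg_i binding per aggregate."""
    out = []
    for key, members in groups.items():
        mu = dict(zip(key_vars, key))
        for agg_var, (func, var) in aggregates.items():
            mu[agg_var] = func([m[var] for m in members])
        out.append(mu)
    return out

def avg(values):
    return sum(values) / len(values)

# Step 1: hypothetical result of evaluating the query pattern P.
solutions = [
    {"x": 1, "y": 2, "v": 10},
    {"x": 2, "y": 1, "v": 20},
    {"x": 0, "y": 5, "v": 1},
]

# Assumed example query:
# SELECT ?z (AVG(?v) AS ?a) { ... }
# GROUP BY ((?x + ?y) AS ?z) HAVING (?a > 5)
step2 = extend(solutions, "z", lambda mu: mu["x"] + mu["y"])  # GROUP BY expression
step3 = group(step2, ["z"])                                   # Group
step4 = aggregate(step3, ["z"], {"agg_1": (avg, "v")})        # Aggregate
step5 = extend(step4, "a", lambda mu: mu["agg_1"])            # Extend (AS ?a)
step6 = [mu for mu in step5 if mu["a"] > 5]                   # Filter (HAVING)

# Projection to the selected variables, for display only.
result = [{"z": mu["z"], "a": mu["a"]} for mu in step6]
print(result)
```

Because the HAVING filter in this sketch refers to ?a, it can only run after step 5 has bound ?a; swapping steps 5 and 6 would raise a KeyError here, which mirrors the ordering problem described in the message.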
Received on Thursday, 10 November 2011 11:00:08 UTC