- From: Steve Harris <steve.harris@garlik.com>
- Date: Tue, 15 Nov 2011 12:44:40 +0000
- To: birte.glimm@uni-ulm.de
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>, Andy Seaborne <andy.seaborne@epimorphics.com>
Many thanks Birte.

OK, of these I suspect that changing the substitution to (SAMPLE(?x) AS ?x) will mean the fewest changes. I wanted to convert the plain ?x projection to an aggregate so it was consistent with the rest of the projections, but expressing it explicitly would be equivalent, I think.

I will have a run through the aggregation text and see if I can make that change with a relatively small change to the document.

Cheers,
Steve

On 2011-11-10, at 10:59, Birte Glimm wrote:

> Steve, Andy, all,
>
> I am trying to address AM-1, which is the question about evaluation
> order for queries with aggregates. There is indeed inconsistency in
> the spec regarding this and I believe we further miss a step in the
> aggregate evaluation.
>
> We first translate the query pattern, which works ok I believe and
> results in some algebra object for the query pattern, say P. If the
> query has aggregates, the solutions obtained by evaluating P have to
> be grouped. The GROUP BY clause can itself contain expressions, e.g.,
> GROUP BY ((?x + ?y) AS ?z). Thus, we need an Extend algebra object
> before we can group, which we do not consider in the current algebra
> translation.
>
> Once we have extended the solutions with the group by clause
> expressions, we can group and then form the aggregate. One problem here
> is that when we select a grouped variable, this variable is currently
> replaced by a SAMPLE aggregate without assigning the aggregated value
> back to the variable, so we get ?agg_i variables in the results. E.g.,
> from
>   SELECT ?x (AVG(?y) AS ?z) { ... } GROUP BY ?x
> we first get
>   P=Group((?x), bgp(...))
> and the rewritten query
>   SELECT SAMPLE(?x) (AVG(?y) AS ?z) { ... } GROUP BY ?x
> we then get
>   P=AggregateJoin(
>     Aggregation((?x), SAMPLE, {}, P),
>     Aggregation((?y), AVG, {}, P)
>   )
> and the rewritten query
>   SELECT ?agg_1 (?agg_2 AS ?z) { ... } GROUP BY ?x
> We then get
>   Extend(P, ?z, ?agg_2)
> ?agg_1, however, remains as there is no AS ... in the SELECT clause.
>
> This is not nice and might even lead to problems when I use ?x in the
> HAVING clause, as ?x is no longer mapped by any solution mapping.
>
> I see two possibilities to fix that:
> 1) Replace a non-aggregated variable ?x with (SAMPLE(?x) AS ?x)
> 2) There is not really a need to aggregate these variables at all, as
> they are unique for each group (since only grouped variables can be
> selected). Thus, AggregateJoin(...) could be extended to construct the
> solution mappings by including each grouped variable with its value
> (as in the key for the aggregate) plus the aggregated values.
>
>
> Thus, I believe, we have to have the following order in the algebra translation:
> 1) query pattern translation as normal
> 2) Translate GROUP expressions -> Extend(...)
> 3) Group
> 4) Aggregate (fixed to properly handle non-aggregated variables)
> 5) Extend
> 6) Filter (from HAVING)
> Filter can only work after we have properly assigned the variables in step 5.
>
> In the current algebra transformation we omit 2), and have 5) and 6) swapped.
>
> Birte
>
>
> --
> Jun. Prof. Dr. Birte Glimm            Tel.:    +49 731 50 24125
> Inst. of Artificial Intelligence      Secr:    +49 731 50 24258
> University of Ulm                     Fax:     +49 731 50 24188
> D-89069 Ulm                           birte.glimm@uni-ulm.de
> Germany
>

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
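
For illustration only, the six-step order proposed in the quoted message could be sketched roughly as below. This is a toy model, not spec text: solution mappings are assumed to be plain Python dicts, and every function name (extend, group, aggregate, filter_solutions, project, run_example) as well as the sample data and the example query are hypothetical. The aggregate step follows option 2 from the message, copying each grouped variable from the group key instead of wrapping it in SAMPLE.

```python
from collections import defaultdict


def extend(solutions, var, expr):
    """Extend: bind var to expr(mu) in every solution mapping."""
    return [{**mu, var: expr(mu)} for mu in solutions]


def group(solutions, keys):
    """Group: partition solution mappings by the values of the key variables."""
    groups = defaultdict(list)
    for mu in solutions:
        groups[tuple(mu.get(k) for k in keys)].append(mu)
    return groups


def aggregate(groups, keys, aggs):
    """Aggregate: one mapping per group, carrying the grouped variables over
    from the group key (option 2) plus one ?agg_i binding per aggregate."""
    out = []
    for key, members in groups.items():
        row = dict(zip(keys, key))
        for name, (func, var) in aggs.items():
            row[name] = func([mu[var] for mu in members if var in mu])
        out.append(row)
    return out


def filter_solutions(solutions, pred):
    """Filter: keep the solution mappings for which the predicate holds."""
    return [mu for mu in solutions if pred(mu)]


def project(solutions, variables):
    """Project: keep only the selected variables."""
    return [{v: mu[v] for v in variables if v in mu} for mu in solutions]


def avg(values):
    return sum(values) / len(values)


def run_example():
    # Hypothetical query:
    #   SELECT ?x (AVG(?y) AS ?z) { ... }
    #   GROUP BY ((?a + ?b) AS ?x) HAVING (?x > 2)
    # 1) query pattern translation: assume it yields these solutions
    p = [
        {"a": 1, "b": 1, "y": 10},
        {"a": 0, "b": 2, "y": 20},
        {"a": 2, "b": 3, "y": 7},
    ]
    # 2) GROUP BY expression -> Extend
    p = extend(p, "x", lambda mu: mu["a"] + mu["b"])
    # 3) Group
    g = group(p, ["x"])
    # 4) Aggregate, keeping the grouped variable ?x bound
    p = aggregate(g, ["x"], {"agg_1": (avg, "y")})
    # 5) Extend: (?agg_1 AS ?z)
    p = extend(p, "z", lambda mu: mu["agg_1"])
    # 6) Filter from HAVING, which can now see ?x
    p = filter_solutions(p, lambda mu: mu["x"] > 2)
    return project(p, ["x", "z"])


if __name__ == "__main__":
    print(run_example())
```

Because step 4 keeps ?x bound, the HAVING filter in step 6 can refer to it directly, which is the problem the message raises with the unassigned ?agg_1.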
Received on Tuesday, 15 November 2011 12:45:19 UTC