- From: Steve Harris <steve.harris@garlik.com>
- Date: Tue, 22 Nov 2011 12:50:00 +0000
- To: birte.glimm@uni-ulm.de
- Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
OK, I think I have a solution, which bridges the gap, and doesn't violate any scope rules: For each expression E in SELECT and each HAVING(E) in Q If E does not contain an aggregate Replace E with Sample(E) in Q End For each variable V appearing outside of an aggregate Ai := Aggregation(V, Sample, {}, G) Replace V with aggi in E Q := Extend(Q, V, aggi) i := i + 1 End For each aggregate X(args ; scalarvals) now in E # note scalarvals may be omitted, then it's equivalent to the empty set Ai := Aggregation(args, X, scalarvals, G) Replace X(...) with aggi in Q i := i + 1 End End So, SELECT ?x {?x :p ?v } GROUP BY ?x Becomes A=AggregateJoin(Aggregation((?x), SAMPLE, {}, A)) rewritten query: SELECT ?agg1 {?x :p ?v } GROUP BY ?x The the result is Extend(Project(A, {?agg1}), ?x, ?agg1) - Steve On 2011-11-21, at 21:26, Birte Glimm wrote: > On 21 November 2011 15:28, Andy Seaborne <andy.seaborne@epimorphics.com> wrote: >> Steve, >> >> Could you walk through the translation of: >> >> SELECT ?x {?x :p ?v } GROUP BY ?x > > Up to aggregation you get: > A=Group((?x), Bgp(?x :p ?v)) > rewritten query: > SELECT SAMPLE(?x) {?x :p ?v } GROUP BY ?x > Then > A=AggregateJoin(Aggregation((?x), SAMPLE, {}, A)) > rewritten query: > SELECT ?agg1 {?x :p ?v } GROUP BY ?x > Finally: > Project(A, {?agg1}) > > Now you evaluate and get results where ?agg1 has some binding, but ?x > is lost and Extend(...) is not used to extend the solutions so that ?x > gets the values that ?agg1 has. > > Birte > > >> This is the case where there is no AS because the variable is part of the >> group key. >> >> Andy >> >> On 17/11/11 09:00, Birte Glimm wrote: >>> >>> On 16 November 2011 19:08, Steve Harris<steve.harris@garlik.com> wrote: >>>> >>>> On 2011-11-16, at 16:58, Birte Glimm wrote: >>>> >>>>> [snip] >>>>>>>> >>>>>>>> As syntax, (SAMPLE(?x) AS ?x) isn't legal because AS has to introduce >>>>>>>> a new >>>>>>>> variable. This happens in SELECT expression processing a few >>>>>>>> subsections. >>>>>>> >>>>>>> Yes, that occured to me as well. Unless it is made legal for >>>>>>> intermediate queries, which I don't like, there seems no way around >>>>>>> creating solutions in the aggregate join that also contain the grouped >>>>>>> variables. >>>>>> >>>>>> Yes, that why I ended up with the messy agg_i thing, to avoid >>>>>> conflating aggregate results with variable names. >>>>>> >>>>>> Reading through this again, I think that the text as written is >>>>>> correct: >>>>>> >>>>>> For each variable V appearing outside of an aggregate >>>>>> Replace V with Sample(V) in Q >>>>>> End >>>>>> >>>>>> ensures that there's only aggregates being projected, then >>>>>> >>>>>> For each aggregate X(args ; scalarvals) now in E >>>>>> # note scalarvals may be omitted, then it's equivalent to the >>>>>> empty set >>>>>> Ai := Aggregation(args, X, scalarvals, G) >>>>>> Replace X(...) with aggi in Q >>>>>> i := i + 1 >>>>>> End >>>>>> >>>>>> Defines A_i/agg_i for the Sample(V) above. I could well have spec >>>>>> blindness though. >>>>> >>>>> That still does not solve the problem that you loose the original >>>>> variable name, so results will contain agg_i and even worse, if you >>>>> have a having clause, the variable used there might no longer exist >>>>> since it was replaced by agg_i. >>>> >>>> I believe that's taken care of by AggregateJoin: >>>> >>>> Write A = (A1, A2, ...) where Ai = Aggregation(exprListi, funci, >>>> scalarvarsi, P) >>>> >>>> eval(D(G), AggregateJoin(A)) = { (agg1, v1), ..., (aggn, vn) | vi such >>>> that ( k, vi ) in eval(D(G), Ai) >>>> for some k and each 1<= i<= n } >>>> >>>> vi is your var below, I think. >>> >>> I don't think. vi is the aggregated value, so you ( k, vi ) with k the >>> key that was used to group the values and vi the aggregated value. >>> I'll do one complete example to make clear where I think we have a >>> problem: >>> >>> Assume data: >>> ex:Birte ex:mark 4, 5 . >>> ex:Steve ex:mark 3, 5 . >>> and query: >>> SELECT ?name (MAX(?mark) AS ?max) (AVG(?mark) AS ?avg) >>> WHERE { ?name ex:mark ?mark } GROUP BY ?name >>> >>> Algebra translation up to group gives: >>> A=Group((?name), Bgp(...)) >>> Replace ?name with SAMPLE(?name) >>> SELECT SAMPLE(?name) (MAX(?mark) AS ?max) (AVG(?mark) AS ?avg) >>> WHERE { ?name ex:mark ?mark } GROUP BY ?name >>> continue transformation: >>> A= AggregateJoin( >>> Aggregation((?name), SAMPLE, {}, A), >>> Aggregation((?mark), MAX, {}, A), >>> Aggregation((?mark), AVG, {}, A) >>> ) >>> plus we rewrite the aggregates with aggi: >>> SELECT ?agg1 (?agg2 AS ?max) (?agg3 AS ?avg) >>> WHERE { ?name ex:mark ?mark } GROUP BY ?name >>> We then translate the expressions: >>> A=Extend(A, ?max, ?agg2) >>> A=Extend(A, ?avg, ?agg3) >>> Finally projection: >>> Projcet(A, {?agg1, ?max, ?avg}) >>> >>> Evaluation the BGP gives 4 solutions: >>> mu1 : ?name -> ex:Birte, ?mark -> 4 >>> mu2 : ?name -> ex:Birte, ?mark -> 5 >>> mu3 : ?name -> ex:Steve, ?mark -> 3 >>> mu4 : ?name -> ex:Steve ?mark -> 5 >>> After grouping we get: >>> { ex:Birte -> {mu1, mu2}, ex:Steve -> {mu3, mu4} } >>> The three aggregates give >>> SAMPLE: { (ex:Birte)->ex:Birte, (ex:Steve)->ex:Steve } >>> MAX: { (ex:Birte)->5, (ex:Steve)->5 } >>> AVG: { (ex:Birte)->4.5, (ex:Steve)->4 } >>> Joining the aggregates gives 2 new solutions: >>> mu_a: ?agg1->ex:Birte, ?agg2->5, ?agg3->4.5 >>> mu_b : ?agg1->ex:Steve, ?agg2->5, ?agg3->4 >>> We extend the solutions: >>> mu_a' = mu_a union ?max->5, ?avg->4.5 >>> mu_b' = mu_a union ?max->5, ?avg->4 >>> Project then throws out ?agg2 and ?agg3 and we are done. >>> >>> We have, however, ?agg1 instead of ?name and if we had used HAVING >>> with some condition on ?name we would have a problem. >>> >>> Do I do something wrong here or do we indeed have ?agg1 in the result >>> and ?name got lost? >>> >>> Birte >>> >>> >>> >>> >>> >>>> >>>> The way the document is structured moves these apart in an unfortunate >>>> way. >>>> >>>> - Steve >>>> >>>>> >>>>> How about having two loops >>>>> For each aggregate (X(args ; scalarvals) AS var) now in E >>>>> # note scalarvals may be omitted, then it's equivalent to the >>>>> empty set >>>>> Ai := Aggregation(args, X, scalarvals, G) >>>>> Replace X(...) with aggi in Q >>>>> i := i + 1 >>>>> End >>>>> For each aggregate X(args ; scalarvals) now in E >>>>> # note scalarvals may be omitted, then it's equivalent to the >>>>> empty set >>>>> Ai := Aggregation(args, X, scalarvals, G) >>>>> Replace X(var; scalarvals) with (aggi AS var) in Q >>>>> i := i + 1 >>>>> End >>>>> >>>>> This way, we never have an illegal syntax form, we guarantee that all >>>>> variables are still available after the aggregation and since AS is >>>>> only processed later all seems to be fine. One could of course think >>>>> about handling both cases in one loop although for the spec having two >>>>> loops seems fine to me. >>>>> >>>>> Birte >>>>> >>>>>> - Steve >>>>>> >>>>>>>> There is no definition of "Aggregation". It's mentioned in 11.2 but >>>>>>>> the >>>>>>>> link goes to "Definition: Evaluation of Aggregation". There should a >>>>>>>> definition (just after group?) in 18.4. >>>>>>> >>>>>>> Yes, I also wondered about that. It is somehow clear how to evaluate, >>>>>>> but it would be much more consistent if there were a definition. >>>>>>> >>>>>>>> I looked because I wondered if we could just have an "?x" as the >>>>>>>> "aggregate". >>>>>>> >>>>>>> Not sure I understand this. >>>>>>> >>>>>>>> But I think, as Birte shows, as because it's done by syntactic >>>>>>>> rewriting, just leaving it as "?x" would work. >>>>>>> >>>>>>> As I don't understand the sentence above. I just want to make my point >>>>>>> again that we need a binding for ?x if ?x is grouped but not in an >>>>>>> aggregate as it can be used in the HAVING clause. If, at the point of >>>>>>> evaluating HAVING, we only have agg_1, we can't filter on ?x. >>>>>>> >>>>>>>>> I wanted to convert the plain ?x projection to an aggregate so it >>>>>>>>> was >>>>>>>>> consistent with the rest of the projections, but expressing it >>>>>>>>> explicitly >>>>>>>>> would be equivalent I think. >>>>>>>>> >>>>>>>>> I will have a run through the aggregation text and see if I can make >>>>>>>>> that >>>>>>>>> change with a relatively small change to the document. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Steve >>>>>>>> >>>>>>>> I also noticed; >>>>>>>> >>>>>>>> [[ >>>>>>>> Definition: Evaluation of AggregateJoin >>>>>>>> ... >>>>>>>> Note that if eval(D(G), Ai) is an error, it is ignored. >>>>>>>> ]] >>>>>>>> >>>>>>>> An error causes an error doesn't it? (AS causes it to be unbound) >>>>>>> >>>>>>> AS is transformed into Extend(), which is evaluated: >>>>>>> Extend(μ, var, expr) = μ ∪ { (var,value) | var not in dom(μ) and value >>>>>>> = eval(expr) } >>>>>>> Extend(μ, var, expr) = μ if var not in dom(μ) and eval(expr) is an >>>>>>> error >>>>>>> >>>>>>> The latter makes the solution just not contain a mapping for the >>>>>>> variable as I understand it. >>>>>>> >>>>>>> But while we are at it, there is a lowercase extend in the Definition >>>>>>> of Extend: >>>>>>> Extend(Ω , var, term) = { extend(μ, var, term) | μ in Ω } >>>>>>> >>>>>>> It is also lowercase in the evaluation semantics: >>>>>>> Definition: Evaluation of Extend >>>>>>> eval(D(G), extend(var, expr, P)) = extend(var, expr , eval(D(G), P)) >>>>>>> Furthermore, here we swap the order. It should be >>>>>>> eval(D(G), Extend(P, var, expr)) = Extend(eval(D(G), P), var, expr) >>>>>>> or the algorithm for translating queries into the algrebra is wrong >>>>>>> and has to be changed. >>>>>>> >>>>>>> Birte >>>>>>>> >>>>>>>> Andy >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Jun. Prof. Dr. Birte Glimm Tel.: +49 731 50 24125 >>>>>>> Inst. of Artificial Intelligence Secr: +49 731 50 24258 >>>>>>> University of Ulm Fax: +49 731 50 24188 >>>>>>> D-89069 Ulm birte.glimm@uni-ulm.de >>>>>>> Germany >>>>>>> >>>>>> >>>>>> -- >>>>>> Steve Harris, CTO, Garlik Limited >>>>>> 1-3 Halford Road, Richmond, TW10 6AW, UK >>>>>> +44 20 8439 8203 http://www.garlik.com/ >>>>>> Registered in England and Wales 535 7233 VAT # 849 0517 11 >>>>>> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 >>>>>> 9AD >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Jun. Prof. Dr. Birte Glimm Tel.: +49 731 50 24125 >>>>> Inst. of Artificial Intelligence Secr: +49 731 50 24258 >>>>> University of Ulm Fax: +49 731 50 24188 >>>>> D-89069 Ulm birte.glimm@uni-ulm.de >>>>> Germany >>>>> >>>> >>>> -- >>>> Steve Harris, CTO, Garlik Limited >>>> 1-3 Halford Road, Richmond, TW10 6AW, UK >>>> +44 20 8439 8203 http://www.garlik.com/ >>>> Registered in England and Wales 535 7233 VAT # 849 0517 11 >>>> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD >>>> >>>> >>> >>> >>> >> > > > > -- > Jun. Prof. Dr. Birte Glimm Tel.: +49 731 50 24125 > Inst. of Artificial Intelligence Secr: +49 731 50 24258 > University of Ulm Fax: +49 731 50 24188 > D-89069 Ulm birte.glimm@uni-ulm.de > Germany > -- Steve Harris, CTO, Garlik Limited 1-3 Halford Road, Richmond, TW10 6AW, UK +44 20 8439 8203 http://www.garlik.com/ Registered in England and Wales 535 7233 VAT # 849 0517 11 Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Tuesday, 22 November 2011 12:50:50 UTC