- From: Steve Harris <steve.harris@garlik.com>
- Date: Mon, 21 Nov 2011 15:08:18 +0000
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: birte.glimm@uni-ulm.de, SPARQL Working Group <public-rdf-dawg@w3.org>
Yeah, I'm trying to swap this stuff in. I believe Birte has demonstrated that it doesn't join up, I just need to find a solution that doesn't violate any of the other rules. - Steve On 2011-11-21, at 14:28, Andy Seaborne wrote: > Steve, > > Could you walk through the translation of: > > SELECT ?x {?x :p ?v } GROUP BY ?x > > This is the case where there is no AS because the variable is part of the group key. > > Andy > > On 17/11/11 09:00, Birte Glimm wrote: >> On 16 November 2011 19:08, Steve Harris<steve.harris@garlik.com> wrote: >>> On 2011-11-16, at 16:58, Birte Glimm wrote: >>> >>>> [snip] >>>>>>> As syntax, (SAMPLE(?x) AS ?x) isn't legal because AS has to introduce a new >>>>>>> variable. This happens in SELECT expression processing a few subsections. >>>>>> >>>>>> Yes, that occured to me as well. Unless it is made legal for >>>>>> intermediate queries, which I don't like, there seems no way around >>>>>> creating solutions in the aggregate join that also contain the grouped >>>>>> variables. >>>>> >>>>> Yes, that why I ended up with the messy agg_i thing, to avoid conflating aggregate results with variable names. >>>>> >>>>> Reading through this again, I think that the text as written is correct: >>>>> >>>>> For each variable V appearing outside of an aggregate >>>>> Replace V with Sample(V) in Q >>>>> End >>>>> >>>>> ensures that there's only aggregates being projected, then >>>>> >>>>> For each aggregate X(args ; scalarvals) now in E >>>>> # note scalarvals may be omitted, then it's equivalent to the empty set >>>>> Ai := Aggregation(args, X, scalarvals, G) >>>>> Replace X(...) with aggi in Q >>>>> i := i + 1 >>>>> End >>>>> >>>>> Defines A_i/agg_i for the Sample(V) above. I could well have spec blindness though. >>>> >>>> That still does not solve the problem that you loose the original >>>> variable name, so results will contain agg_i and even worse, if you >>>> have a having clause, the variable used there might no longer exist >>>> since it was replaced by agg_i. >>> >>> I believe that's taken care of by AggregateJoin: >>> >>> Write A = (A1, A2, ...) where Ai = Aggregation(exprListi, funci, scalarvarsi, P) >>> >>> eval(D(G), AggregateJoin(A)) = { (agg1, v1), ..., (aggn, vn) | vi such that ( k, vi ) in eval(D(G), Ai) >>> for some k and each 1<= i<= n } >>> >>> vi is your var below, I think. >> >> I don't think. vi is the aggregated value, so you ( k, vi ) with k the >> key that was used to group the values and vi the aggregated value. >> I'll do one complete example to make clear where I think we have a >> problem: >> >> Assume data: >> ex:Birte ex:mark 4, 5 . >> ex:Steve ex:mark 3, 5 . >> and query: >> SELECT ?name (MAX(?mark) AS ?max) (AVG(?mark) AS ?avg) >> WHERE { ?name ex:mark ?mark } GROUP BY ?name >> >> Algebra translation up to group gives: >> A=Group((?name), Bgp(...)) >> Replace ?name with SAMPLE(?name) >> SELECT SAMPLE(?name) (MAX(?mark) AS ?max) (AVG(?mark) AS ?avg) >> WHERE { ?name ex:mark ?mark } GROUP BY ?name >> continue transformation: >> A= AggregateJoin( >> Aggregation((?name), SAMPLE, {}, A), >> Aggregation((?mark), MAX, {}, A), >> Aggregation((?mark), AVG, {}, A) >> ) >> plus we rewrite the aggregates with aggi: >> SELECT ?agg1 (?agg2 AS ?max) (?agg3 AS ?avg) >> WHERE { ?name ex:mark ?mark } GROUP BY ?name >> We then translate the expressions: >> A=Extend(A, ?max, ?agg2) >> A=Extend(A, ?avg, ?agg3) >> Finally projection: >> Projcet(A, {?agg1, ?max, ?avg}) >> >> Evaluation the BGP gives 4 solutions: >> mu1 : ?name -> ex:Birte, ?mark -> 4 >> mu2 : ?name -> ex:Birte, ?mark -> 5 >> mu3 : ?name -> ex:Steve, ?mark -> 3 >> mu4 : ?name -> ex:Steve ?mark -> 5 >> After grouping we get: >> { ex:Birte -> {mu1, mu2}, ex:Steve -> {mu3, mu4} } >> The three aggregates give >> SAMPLE: { (ex:Birte)->ex:Birte, (ex:Steve)->ex:Steve } >> MAX: { (ex:Birte)->5, (ex:Steve)->5 } >> AVG: { (ex:Birte)->4.5, (ex:Steve)->4 } >> Joining the aggregates gives 2 new solutions: >> mu_a: ?agg1->ex:Birte, ?agg2->5, ?agg3->4.5 >> mu_b : ?agg1->ex:Steve, ?agg2->5, ?agg3->4 >> We extend the solutions: >> mu_a' = mu_a union ?max->5, ?avg->4.5 >> mu_b' = mu_a union ?max->5, ?avg->4 >> Project then throws out ?agg2 and ?agg3 and we are done. >> >> We have, however, ?agg1 instead of ?name and if we had used HAVING >> with some condition on ?name we would have a problem. >> >> Do I do something wrong here or do we indeed have ?agg1 in the result >> and ?name got lost? >> >> Birte >> >> >> >> >> >>> >>> The way the document is structured moves these apart in an unfortunate way. >>> >>> - Steve >>> >>>> >>>> How about having two loops >>>> For each aggregate (X(args ; scalarvals) AS var) now in E >>>> # note scalarvals may be omitted, then it's equivalent to the empty set >>>> Ai := Aggregation(args, X, scalarvals, G) >>>> Replace X(...) with aggi in Q >>>> i := i + 1 >>>> End >>>> For each aggregate X(args ; scalarvals) now in E >>>> # note scalarvals may be omitted, then it's equivalent to the empty set >>>> Ai := Aggregation(args, X, scalarvals, G) >>>> Replace X(var; scalarvals) with (aggi AS var) in Q >>>> i := i + 1 >>>> End >>>> >>>> This way, we never have an illegal syntax form, we guarantee that all >>>> variables are still available after the aggregation and since AS is >>>> only processed later all seems to be fine. One could of course think >>>> about handling both cases in one loop although for the spec having two >>>> loops seems fine to me. >>>> >>>> Birte >>>> >>>>> - Steve >>>>> >>>>>>> There is no definition of "Aggregation". It's mentioned in 11.2 but the >>>>>>> link goes to "Definition: Evaluation of Aggregation". There should a >>>>>>> definition (just after group?) in 18.4. >>>>>> >>>>>> Yes, I also wondered about that. It is somehow clear how to evaluate, >>>>>> but it would be much more consistent if there were a definition. >>>>>> >>>>>>> I looked because I wondered if we could just have an "?x" as the >>>>>>> "aggregate". >>>>>> >>>>>> Not sure I understand this. >>>>>> >>>>>>> But I think, as Birte shows, as because it's done by syntactic >>>>>>> rewriting, just leaving it as "?x" would work. >>>>>> >>>>>> As I don't understand the sentence above. I just want to make my point >>>>>> again that we need a binding for ?x if ?x is grouped but not in an >>>>>> aggregate as it can be used in the HAVING clause. If, at the point of >>>>>> evaluating HAVING, we only have agg_1, we can't filter on ?x. >>>>>> >>>>>>>> I wanted to convert the plain ?x projection to an aggregate so it was >>>>>>>> consistent with the rest of the projections, but expressing it explicitly >>>>>>>> would be equivalent I think. >>>>>>>> >>>>>>>> I will have a run through the aggregation text and see if I can make that >>>>>>>> change with a relatively small change to the document. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Steve >>>>>>> >>>>>>> I also noticed; >>>>>>> >>>>>>> [[ >>>>>>> Definition: Evaluation of AggregateJoin >>>>>>> ... >>>>>>> Note that if eval(D(G), Ai) is an error, it is ignored. >>>>>>> ]] >>>>>>> >>>>>>> An error causes an error doesn't it? (AS causes it to be unbound) >>>>>> >>>>>> AS is transformed into Extend(), which is evaluated: >>>>>> Extend(μ, var, expr) = μ ∪ { (var,value) | var not in dom(μ) and value >>>>>> = eval(expr) } >>>>>> Extend(μ, var, expr) = μ if var not in dom(μ) and eval(expr) is an error >>>>>> >>>>>> The latter makes the solution just not contain a mapping for the >>>>>> variable as I understand it. >>>>>> >>>>>> But while we are at it, there is a lowercase extend in the Definition of Extend: >>>>>> Extend(Ω , var, term) = { extend(μ, var, term) | μ in Ω } >>>>>> >>>>>> It is also lowercase in the evaluation semantics: >>>>>> Definition: Evaluation of Extend >>>>>> eval(D(G), extend(var, expr, P)) = extend(var, expr , eval(D(G), P)) >>>>>> Furthermore, here we swap the order. It should be >>>>>> eval(D(G), Extend(P, var, expr)) = Extend(eval(D(G), P), var, expr) >>>>>> or the algorithm for translating queries into the algrebra is wrong >>>>>> and has to be changed. >>>>>> >>>>>> Birte >>>>>>> Andy >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Jun. Prof. Dr. Birte Glimm Tel.: +49 731 50 24125 >>>>>> Inst. of Artificial Intelligence Secr: +49 731 50 24258 >>>>>> University of Ulm Fax: +49 731 50 24188 >>>>>> D-89069 Ulm birte.glimm@uni-ulm.de >>>>>> Germany >>>>>> >>>>> >>>>> -- >>>>> Steve Harris, CTO, Garlik Limited >>>>> 1-3 Halford Road, Richmond, TW10 6AW, UK >>>>> +44 20 8439 8203 http://www.garlik.com/ >>>>> Registered in England and Wales 535 7233 VAT # 849 0517 11 >>>>> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Jun. Prof. Dr. Birte Glimm Tel.: +49 731 50 24125 >>>> Inst. of Artificial Intelligence Secr: +49 731 50 24258 >>>> University of Ulm Fax: +49 731 50 24188 >>>> D-89069 Ulm birte.glimm@uni-ulm.de >>>> Germany >>>> >>> >>> -- >>> Steve Harris, CTO, Garlik Limited >>> 1-3 Halford Road, Richmond, TW10 6AW, UK >>> +44 20 8439 8203 http://www.garlik.com/ >>> Registered in England and Wales 535 7233 VAT # 849 0517 11 >>> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD >>> >>> >> >> >> > -- Steve Harris, CTO, Garlik Limited 1-3 Halford Road, Richmond, TW10 6AW, UK +44 20 8439 8203 http://www.garlik.com/ Registered in England and Wales 535 7233 VAT # 849 0517 11 Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Monday, 21 November 2011 15:09:08 UTC