Re: Order of evaluation for aggregates from Steve Harris on 2011-11-16 (public-rdf-dawg@w3.org from October to December 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 16 Nov 2011 18:08:29 +0000
To: birte.glimm@uni-ulm.de
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <0691F288-DD31-4D95-8F60-FB6861D55B35@garlik.com>
On 2011-11-16, at 16:58, Birte Glimm wrote:

> [snip]
>>>> As syntax, (SAMPLE(?x) AS ?x) isn't legal because AS has to introduce a new
>>>> variable. This happens in SELECT expression processing a few subsections.
>>> 
>>> Yes, that occurred to me as well. Unless it is made legal for
>>> intermediate queries, which I don't like, there seems no way around
>>> creating solutions in the aggregate join that also contain the grouped
>>> variables.
>> 
>> Yes, that's why I ended up with the messy agg_i thing, to avoid conflating aggregate results with variable names.
>> 
>> Reading through this again, I think that the text as written is correct:
>> 
>>    For each variable V appearing outside of an aggregate
>>        Replace V with Sample(V) in Q
>>        End
>> 
>> ensures that only aggregates are being projected, then
>> 
>>    For each aggregate X(args ; scalarvals) now in E
>>        # note scalarvals may be omitted, then it's equivalent to the empty set
>>        A_i := Aggregation(args, X, scalarvals, G)
>>        Replace X(...) with agg_i in Q
>>        i := i + 1
>>        End
>> 
>> Defines A_i/agg_i for the Sample(V) above. I could well have spec blindness though.
> 
> That still does not solve the problem that you lose the original
> variable name, so results will contain agg_i and even worse, if you
> have a having clause, the variable used there might no longer exist
> since it was replaced by agg_i.

I believe that's taken care of by AggregateJoin:

Write A = (A_1, A_2, ...) where A_i = Aggregation(exprList_i, func_i, scalarvars_i, P)

eval(D(G), AggregateJoin(A)) = { (agg_1, v_1), ..., (agg_n, v_n) | v_i such that ( k, v_i ) in eval(D(G), A_i)
for some k and each 1 <= i <= n }

v_i is your var below, I think.

The way the document is structured moves these apart in an unfortunate way.
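
For illustration only, a minimal Python sketch of how the AggregateJoin definition above could be read. Everything named here is an assumption rather than spec text: each eval(D(G), A_i) is modelled as a dict from a group key k to the value v_i, an aggregate error is modelled by the hypothetical sentinel ERROR, and aggregate_join is a made-up function name.

```python
# Hypothetical sentinel standing in for an aggregate that evaluated to an error.
ERROR = object()


def aggregate_join(aggregation_results):
    """Sketch of eval(D(G), AggregateJoin(A)).

    aggregation_results[i - 1] models eval(D(G), A_i) as a dict mapping a
    group key k to the value v_i (or to ERROR).
    Returns a dict mapping each key k to a solution {"agg_i": v_i, ...}.
    """
    # Collect the group keys in first-seen order.
    keys = {}
    for result in aggregation_results:
        for k in result:
            keys.setdefault(k, None)

    solutions = {}
    for k in keys:
        mu = {}
        for i, result in enumerate(aggregation_results, start=1):
            v = result.get(k, ERROR)
            if v is not ERROR:  # an error is ignored: agg_i stays unbound
                mu[f"agg_{i}"] = v
        solutions[k] = mu
    return solutions


# Grouping by ?x, with Sample(?x) as A_1 and Count(?y) as A_2.
sample_x = {("a",): "a", ("b",): "b"}
count_y = {("a",): 2, ("b",): ERROR}
result = aggregate_join([sample_x, count_y])
```

In this toy run the value of ?x for each group survives as agg_1. The sketch does not restore the name ?x; it only shows where the value ends up.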

- Steve

> 
> How about having two loops:
> For each aggregate (X(args ; scalarvals) AS var) now in E
>        # note scalarvals may be omitted, then it's equivalent to the empty set
>        A_i := Aggregation(args, X, scalarvals, G)
>        Replace X(...) with agg_i in Q
>        i := i + 1
>        End
> For each aggregate X(args ; scalarvals) now in E
>        # note scalarvals may be omitted, then it's equivalent to the empty set
>        A_i := Aggregation(args, X, scalarvals, G)
>        Replace X(var ; scalarvals) with (agg_i AS var) in Q
>        i := i + 1
>        End
> 
> This way, we never have an illegal syntax form, we guarantee that all
> variables are still available after the aggregation and since AS is
> only processed later all seems to be fine. One could of course think
> about handling both cases in one loop although for the spec having two
> loops seems fine to me.
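
One possible reading of the two-loop proposal, sketched in Python for illustration. The Agg class, the rewrite function and the tuple shapes are all hypothetical; scalarvals and the group pattern G are left out to keep it short, and a bare aggregate is assumed to take a single variable as its argument.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agg:
    """Hypothetical model of one projected aggregate X(arg), optionally AS alias."""
    func: str                    # e.g. "Sample", "Count"
    arg: str                     # e.g. "?x"
    alias: Optional[str] = None  # var in "(X(...) AS var)", None if bare


def rewrite(projection):
    """Return (aggregations, rewritten projection) following the two loops."""
    aggregations = []  # entries (agg_i, func, arg), standing in for A_i
    rewritten = {}     # position in projection -> (agg_i, exposed variable)
    i = 1

    # Loop 1: aggregates that already carry an AS alias keep that alias.
    for pos, item in enumerate(projection):
        if item.alias is not None:
            aggregations.append((f"agg_{i}", item.func, item.arg))
            rewritten[pos] = (f"agg_{i}", item.alias)
            i += 1

    # Loop 2: bare aggregates are re-exposed as (agg_i AS var), var being the argument.
    for pos, item in enumerate(projection):
        if item.alias is None:
            aggregations.append((f"agg_{i}", item.func, item.arg))
            rewritten[pos] = (f"agg_{i}", item.arg)
            i += 1

    return aggregations, [rewritten[pos] for pos in range(len(projection))]


# SELECT ?x (COUNT(?y) AS ?n) ... GROUP BY ?x, after ?x became Sample(?x).
aggregations, rewritten = rewrite([Agg("Sample", "?x"), Agg("Count", "?y", "?n")])
```

With this input, rewritten pairs agg_2 with ?x and agg_1 with ?n, so both names remain available to a later HAVING or AS step under the stated assumptions.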
> 
> Birte
> 
>> - Steve
>> 
>>>> There is no definition of "Aggregation".  It's mentioned in 11.2 but the
>>>> link goes to "Definition: Evaluation of Aggregation".  There should be a
>>>> definition (just after group?) in 18.4.
>>> 
>>> Yes, I also wondered about that. It is somehow clear how to evaluate,
>>> but it would be much more consistent if there were a definition.
>>> 
>>>> I looked because I wondered if we could just have an "?x" as the
>>>> "aggregate".
>>> 
>>> Not sure I understand this.
>>> 
>>>> But I think, as Birte shows, because it's done by syntactic
>>>> rewriting, just leaving it as "?x" would work.
>>> 
>>> As I don't understand the sentence above, I just want to make my point
>>> again that we need a binding for ?x if ?x is grouped but not in an
>>> aggregate as it can be used in the HAVING clause. If, at the point of
>>> evaluating HAVING, we only have agg_1, we can't filter on ?x.
>>> 
>>>>> I wanted to convert the plain ?x projection to an aggregate so it was
>>>>> consistent with the rest of the projections, but expressing it explicitly
>>>>> would be equivalent I think.
>>>>> 
>>>>> I will have a run through the aggregation text and see if I can make that
>>>>> change with a relatively small change to the document.
>>>>> 
>>>>> Cheers,
>>>>>    Steve
>>>> 
>>>> I also noticed:
>>>> 
>>>> [[
>>>> Definition: Evaluation of AggregateJoin
>>>> ...
>>>> Note that if eval(D(G), Ai) is an error, it is ignored.
>>>> ]]
>>>> 
>>>>  An error causes an error doesn't it?  (AS causes it to be unbound)
>>> 
>>> AS is transformed into Extend(), which is evaluated:
>>> Extend(μ, var, expr) = μ ∪ { (var,value) | var not in dom(μ) and value
>>> = eval(expr) }
>>> Extend(μ, var, expr) = μ if var not in dom(μ) and eval(expr) is an error
>>> 
>>> The latter makes the solution just not contain a mapping for the
>>> variable as I understand it.
>>> 
>>> But while we are at it, there is a lowercase extend in the Definition of Extend:
>>> Extend(Ω , var, term) = { extend(μ, var, term) | μ in Ω }
>>> 
>>> It is also lowercase in the evaluation semantics:
>>> Definition: Evaluation of Extend
>>> eval(D(G), extend(var, expr, P)) = extend(var, expr , eval(D(G), P))
>>> Furthermore, here we swap the order. It should be
>>> eval(D(G), Extend(P, var, expr)) = Extend(eval(D(G), P), var, expr)
>>> or the algorithm for translating queries into the algebra is wrong
>>> and has to be changed.
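
A small Python sketch of the Extend semantics quoted above, using the (P, var, expr) argument order. The modelling choices are assumptions: a solution μ is a dict, expr is a Python callable on μ, an evaluation error is a raised exception, and the quoted text does not say what happens when var is already in dom(μ), so the sketch simply raises in that case.

```python
def extend_mapping(mu, var, expr):
    """Sketch of Extend(mu, var, expr) for a single solution mapping."""
    if var in mu:
        # Not covered by the quoted definition; raising is an assumption.
        raise ValueError(f"{var} is already bound")
    try:
        value = expr(mu)
    except Exception:
        # eval(expr) is an error: the solution is kept, var stays unbound.
        return dict(mu)
    return {**mu, var: value}


def extend(omega, var, expr):
    """Sketch of Extend(omega, var, expr) = { Extend(mu, var, expr) | mu in omega }."""
    return [extend_mapping(mu, var, expr) for mu in omega]


# (?a / ?b AS ?q): the second solution divides by zero and keeps no ?q.
omega = [{"?a": 4, "?b": 2}, {"?a": 1, "?b": 0}]
extended = extend(omega, "?q", lambda mu: mu["?a"] / mu["?b"])
```

The second solution comes back unchanged, which matches the reading that an error in AS leaves the variable unbound instead of dropping the solution.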
>>> 
>>> Birte
>>>>        Andy
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Jun. Prof. Dr. Birte Glimm            Tel.:    +49 731 50 24125
>>> Inst. of Artificial Intelligence         Secr:  +49 731 50 24258
>>> University of Ulm                         Fax:   +49 731 50 24188
>>> D-89069 Ulm                               birte.glimm@uni-ulm.de
>>> Germany
>>> 
>> 
>> --
>> Steve Harris, CTO, Garlik Limited
>> 1-3 Halford Road, Richmond, TW10 6AW, UK
>> +44 20 8439 8203  http://www.garlik.com/
>> Registered in England and Wales 535 7233 VAT # 849 0517 11
>> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
>> 
>> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Wednesday, 16 November 2011 18:08:59 UTC