Re: Order of evaluation for aggregates from Birte Glimm on 2011-11-16 (public-rdf-dawg@w3.org from October to December 2011)

From: Birte Glimm <birte.glimm@uni-ulm.de>
Date: Wed, 16 Nov 2011 17:58:03 +0100
To: Steve Harris <steve.harris@garlik.com>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <CABt65Oc2x5B5H7rGf49+t5cPipQHWzVF877+b65Djteeq2cb6g@mail.gmail.com>
[snip]
>>> As syntax, (SAMPLE(?x) AS ?x) isn't legal because AS has to introduce a new
>>> variable. This happens in SELECT expression processing a few subsections.
>>
>> Yes, that occurred to me as well. Unless it is made legal for
>> intermediate queries, which I don't like, there seems no way around
>> creating solutions in the aggregate join that also contain the grouped
>> variables.
>
> Yes, that's why I ended up with the messy agg_i thing, to avoid conflating aggregate results with variable names.
>
> Reading through this again, I think that the text as written is correct:
>
>    For each variable V appearing outside of an aggregate
>        Replace V with Sample(V) in Q
>        End
>
> ensures that there's only aggregates being projected, then
>
>    For each aggregate X(args ; scalarvals) now in E
>        # note scalarvals may be omitted, then it's equivalent to the empty set
>        A_i := Aggregation(args, X, scalarvals, G)
>        Replace X(...) with agg_i in Q
>        i := i + 1
>        End
>
> Defines A_i/agg_i for the Sample(V) above. I could well have spec blindness though.

That still does not solve the problem that you lose the original
variable name, so results will contain agg_i and, even worse, if you
have a HAVING clause, the variable used there might no longer exist
since it was replaced by agg_i.
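
To make the concern concrete, here is a minimal Python sketch. It is not
spec text: modelling a solution as a dict, the helper names
naive_aggregate_join and having, and the fixed SAMPLE/SUM pair are all
assumptions made purely for illustration.

```python
# Hypothetical model: a solution is a dict from variable name to value.
# Query in mind: SELECT ?x (SUM(?y) AS ?total) ... GROUP BY ?x HAVING (?x > 1)

def naive_aggregate_join(groups):
    """Produce one solution per group that binds only agg_i names.

    agg_1 stands for Sample(?x), agg_2 for SUM(?y); ?x itself is gone.
    """
    return [
        {"agg_1": rows[0]["x"], "agg_2": sum(r["y"] for r in rows)}
        for rows in groups.values()
    ]

def having(solutions, var, predicate):
    """Keep solutions where var is bound and the predicate holds.

    An unbound variable makes the filter fail, so the solution is dropped.
    """
    return [s for s in solutions if var in s and predicate(s[var])]

groups = {
    1: [{"x": 1, "y": 10}, {"x": 1, "y": 5}],
    2: [{"x": 2, "y": 7}],
}
solutions = naive_aggregate_join(groups)
lost = having(solutions, "x", lambda v: v > 1)       # ?x is unbound everywhere
found = having(solutions, "agg_1", lambda v: v > 1)  # only works on the internal name
```

Filtering on "x" keeps nothing because no solution binds it any more,
while the same filter on the internal name "agg_1" keeps the second group.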

How about having two loops:
For each aggregate (X(args ; scalarvals) AS var) now in E
        # note scalarvals may be omitted, then it's equivalent to the empty set
        A_i := Aggregation(args, X, scalarvals, G)
        Replace X(...) with agg_i in Q
        i := i + 1
        End
For each aggregate X(args ; scalarvals) now in E
        # note scalarvals may be omitted, then it's equivalent to the empty set
        A_i := Aggregation(args, X, scalarvals, G)
        Replace X(var; scalarvals) with (agg_i AS var) in Q
        i := i + 1
        End

This way, we never have an illegal syntax form, we guarantee that all
variables are still available after the aggregation, and, since AS is
only processed later, all seems to be fine. One could of course
handle both cases in one loop, although for the spec having two
loops seems fine to me.
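
As a rough illustration of the two loops, here is a small Python sketch.
The Item class, the rewrite function and the tuple encodings are
hypothetical names invented for this example, and scalarvals are left out
for brevity; it is only meant to show that every projected item ends up
as (agg_i AS var).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    func: str             # aggregate name, e.g. "SUM" or "SAMPLE"
    arg: str              # argument variable (scalarvals omitted here)
    alias: Optional[str]  # variable introduced by AS, or None if absent

def rewrite(items):
    """Return (aggregations, projection) after the two rewriting loops.

    aggregations[i-1] plays the role of A_i; projection entries are
    (agg_i, var) pairs standing for (agg_i AS var), in the original order.
    """
    aggregations = []
    projection = [None] * len(items)
    i = 1
    # Loop 1: aggregates that already carry AS; only the call is replaced.
    for pos, item in enumerate(items):
        if item.alias is not None:
            aggregations.append((item.func, item.arg))
            projection[pos] = (f"agg_{i}", item.alias)
            i += 1
    # Loop 2: bare aggregates (the Sample(V) forms); re-attach V via AS.
    for pos, item in enumerate(items):
        if item.alias is None:
            aggregations.append((item.func, item.arg))
            projection[pos] = (f"agg_{i}", item.arg)
            i += 1
    return aggregations, projection

# SELECT ?x (SUM(?y) AS ?total), after ?x has been replaced by Sample(?x)
items = [Item("SAMPLE", "x", None), Item("SUM", "y", "total")]
aggregations, projection = rewrite(items)
```

In this sketch the plain ?x comes out as (agg_2 AS x), so the name ?x is
bound again once AS is processed.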

Birte

> - Steve
>
>>> There is no definition of "Aggregation".  It's mentioned in 11.2 but the
>>> link goes to "Definition: Evaluation of Aggregation".  There should be a
>>> definition (just after group?) in 18.4.
>>
>> Yes, I also wondered about that. It is somehow clear how to evaluate,
>> but it would be much more consistent if there were a definition.
>>
>>> I looked because I wondered if we could just have an "?x" as the
>>> "aggregate".
>>
>> Not sure I understand this.
>>
>>> But I think, as Birte shows, because it's done by syntactic
>>> rewriting, just leaving it as "?x" would work.
>>
>> As I don't understand the sentence above, I just want to make my point
>> again: we need a binding for ?x if ?x is grouped but not in an
>> aggregate, as it can be used in the HAVING clause. If, at the point of
>> evaluating HAVING, we only have agg_1, we can't filter on ?x.
>>
>>>> I wanted to convert the plain ?x projection to an aggregate so it was
>>>> consistent with the rest of the projections, but expressing it explicitly
>>>> would be equivalent I think.
>>>>
>>>> I will have a run through the aggregation text and see if I can make that
>>>> change with a relatively small change to the document.
>>>>
>>>> Cheers,
>>>>    Steve
>>>
>>> I also noticed:
>>>
>>> [[
>>> Definition: Evaluation of AggregateJoin
>>> ...
>>> Note that if eval(D(G), Ai) is an error, it is ignored.
>>> ]]
>>>
>>>  An error causes an error, doesn't it?  (AS causes it to be unbound)
>>
>> AS is transformed into Extend(), which is evaluated:
>> Extend(μ, var, expr) = μ ∪ { (var,value) | var not in dom(μ) and value = eval(expr) }
>> Extend(μ, var, expr) = μ if var not in dom(μ) and eval(expr) is an error
>>
>> The latter makes the solution just not contain a mapping for the
>> variable as I understand it.
>>
>> But while we are at it, there is a lowercase extend in the Definition of Extend:
>> Extend(Ω , var, term) = { extend(μ, var, term) | μ in Ω }
>>
>> It is also lowercase in the evaluation semantics:
>> Definition: Evaluation of Extend
>> eval(D(G), extend(var, expr, P)) = extend(var, expr , eval(D(G), P))
>> Furthermore, here we swap the order. It should be
>> eval(D(G), Extend(P, var, expr)) = Extend(eval(D(G), P), var, expr)
>> or the algorithm for translating queries into the algebra is wrong
>> and has to be changed.
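
For what it's worth, the two Extend cases quoted above can be sketched in
Python as follows. This is an illustration only: solutions are modelled
as dicts, expressions as callables, and EvalError, extend_mapping and
extend are made-up names. Raising ValueError when var is already bound is
my own assumption, since the two quoted lines do not cover that case.

```python
class EvalError(Exception):
    """Stands in for an expression evaluation error (hypothetical)."""

def extend_mapping(mu, var, expr):
    """Extend(mu, var, expr) for a single solution mapping mu."""
    if var in mu:
        # Not covered by the two quoted cases; assumption: reject it.
        raise ValueError(f"{var} is already bound")
    try:
        value = expr(mu)
    except EvalError:
        return dict(mu)        # error case: mu is returned without var
    return {**mu, var: value}  # normal case: mu plus the new binding

def extend(omega, var, expr):
    """Extend(Omega, var, expr): apply the mapping-level Extend to each mu."""
    return [extend_mapping(mu, var, expr) for mu in omega]

def ratio(mu):
    """Example expression ?a / ?b that errors on division by zero."""
    if mu["b"] == 0:
        raise EvalError("division by zero")
    return mu["a"] / mu["b"]

result = extend([{"a": 4, "b": 2}, {"a": 1, "b": 0}], "r", ratio)
```

The second solution is kept but simply carries no binding for "r", which
is the behaviour the second quoted line describes.
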
>>
>> Birte
>>>        Andy
>>>
>>
>
> --
> Steve Harris, CTO, Garlik Limited
> 1-3 Halford Road, Richmond, TW10 6AW, UK
> +44 20 8439 8203  http://www.garlik.com/
> Registered in England and Wales 535 7233 VAT # 849 0517 11
> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
>
>



-- 
Jun. Prof. Dr. Birte Glimm            Tel.:    +49 731 50 24125
Inst. of Artificial Intelligence         Secr:  +49 731 50 24258
University of Ulm                         Fax:   +49 731 50 24188
D-89069 Ulm                               birte.glimm@uni-ulm.de
Germany
Received on Wednesday, 16 November 2011 16:58:45 UTC