Re: Nested Aggregate Expressions from Andy Seaborne on 2012-06-05 (public-rdf-dawg@w3.org from April to June 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 05 Jun 2012 19:45:43 +0100
To: Steve Harris <steve.harris@garlik.com>
CC: birte.glimm@uni-ulm.de, public-rdf-dawg@w3.org
Message-ID: <4FCE53D7.9030104@epimorphics.com>
On 05/06/12 19:25, Steve Harris wrote:
> On 5 Jun 2012, at 08:00, Birte Glimm wrote:
>
>> [snip]
>>>>> One line of argument is that the expression inside the aggregate is
>>>>> applied to each row, so only row variables should be considered in-scope.
>>>>>   The aggregate AVG(max(?x)+1) is violating that as max(?x) is not a per-row
>>>>> expression.
>>>
>>> (Birte) - yes this needs clarifying if we wish to rule it out, and possibly
>>> even if we don't.
>>>
>>> As the spec stands, I *think* it says it's not allowed:
>>>
>>> [[
>>> Definition Group:
>>>
>>> Group evaluates a list of expressions against a solution sequence
>>> ...
>>> ]]
>>>
>>> and the solution sequence is the grouped patterns, not after aggregation or
>>> select expressions.
>>>
>>> [[
>>> Definition: Aggregation
>>> ]]
>>> talks about applying the aggregate to the solution sequences collected into
>>> a map of key to multiset.
>>>
>>> i.e. the aggregate is evaluated over the pattern, not over other aggregates
>>> and not over select expressions.
>>
>> I think the definition just cannot handle the current case, but it is
>> not forbidden, just undefined. IMO, either the definition has to be
>> extended or the current case has to be forbidden. Maybe it is illegal
>> due to some hidden constraints, but that should be made explicit.
>
> Well, another option is to make it (explicitly) undefined. ANSI C does that a lot.


(details matter here :-)

Birte - Could you explain? - I don't see how it can be undefined, even 
if it's unclear, because the definitions look complete to me.

No new solution sequence is constructed until AggregateJoin is done and 
that's after the calculation of all the aggregates.  Therefore, the only 
solution sequences available to the aggregation functions are the 
grouped rows.

The definition of 'Aggregation'

[[
let { key1→Ω1, ..., keym→Ωm } be a multiset of partial functions from 
keys to solution sequences as produced by the grouping step.
]]
so Ωi is a multiset of bindings from the grouping step.


then:
[[
Aggregation(exprlist, func, scalarvals, { key1→Ω1, ..., keym→Ωm } )
    = { (key, F(Ω)) | key → Ω in { key1→Ω1, ..., keym→Ωm } }
]]

and the aggregate function (F) is called on the group partitions.
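To make the argument concrete, here is a minimal Python sketch of the two definitions quoted above, under simplifying assumptions: a solution mapping is a plain dict from variable name to value, a multiset is a list, and unbound variables and expression errors are not modelled. The names `group`, `aggregation` and `max_of` are hypothetical, not taken from the spec or from ARQ. The point it illustrates is that F is only ever handed one group partition Ω, and the per-row expression inside an aggregate only ever sees a single mapping, so there is no multiset in scope for an inner max(?x).

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# Simplified model (assumption): a solution mapping is variable name -> value.
Solution = Dict[str, object]
Key = Tuple[Hashable, ...]


def group(exprlist: List[Callable[[Solution], Hashable]],
          omega: List[Solution]) -> Dict[Key, List[Solution]]:
    """Partition a solution sequence into a map of key -> multiset of rows."""
    groups: Dict[Key, List[Solution]] = defaultdict(list)
    for mu in omega:
        key = tuple(expr(mu) for expr in exprlist)
        groups[key].append(mu)
    return dict(groups)


def aggregation(func: Callable[[List[Solution]], object],
                groups: Dict[Key, List[Solution]]) -> Dict[Key, object]:
    """{ (key, F(Omega)) | key -> Omega in groups }: F sees one partition only."""
    return {key: func(part) for key, part in groups.items()}


def max_of(var: str) -> Callable[[List[Solution]], object]:
    # The per-row expression is just mu[var]; it is evaluated once per row.
    return lambda part: max(mu[var] for mu in part)
```

For example, grouping `[{"g": "a", "x": 1}, {"g": "a", "x": 5}, {"g": "b", "x": 2}]` on `?g` and applying `max_of("x")` gives `{("a",): 5, ("b",): 2}`. No new solution sequence exists at this stage, which is the step before AggregateJoin.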

(there is a gotcha, though: avg(X) is defined as sum(X)/count(X))
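The gotcha mentioned above is that avg is not a primitive: its definition calls two other aggregate functions. A small Python sketch of that composition follows, assuming the group's values have already been computed per row. Returning 0 for an empty group is an assumption of this sketch, and the helper names are hypothetical. Both sum and count are applied to the same multiset M, so this composition is different in kind from avg(max(?x)+1), where one aggregate would be fed the result of another.

```python
from fractions import Fraction
from typing import List


def sum_(values: List[int]) -> int:
    # Aggregate over the multiset of per-row values for one group.
    return sum(values)


def count(values: List[int]) -> int:
    return len(values)


def avg(values: List[int]) -> Fraction:
    # avg(M) expressed via two other aggregates over the *same* multiset M.
    if count(values) == 0:
        return Fraction(0)  # assumption: empty group yields 0
    return Fraction(sum_(values), count(values))
```

Using `Fraction` keeps the division exact, so `avg([1, 2])` is `Fraction(3, 2)` rather than a float.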

 Andy


>
> - Steve
>
>>>>>
>>>>> What ARQ does is to calculate the aggregates of a group as the group
>>>>> streams past; it does not wait until the end of evaluation of the whole
>>>>> block when all the elements of all the groups are known.
>>>>>
>>>>>
>>>>> Related to this is the interaction with select expressions:
>>>>>
>>>>> SELECT (max(?x) As ?M) (avg(?M+1) AS ?A)
>>>>>
>>>>> because the select expression rules say you can use ?M inside AVG().
>>>>>
>>>>> If we wish to forbid this, we can do it quite easily by having a parser
>>>>> rule that aggregates can't appear in the expression of an aggregate, which is
>>>>> a simple static check.
>>>>
>>>>
>>>> Oh boy, it's certainly wacky.
>>>>
>>>> That parse rule wouldn't rule out the use of ?M above though anyway, would
>>>> it?
>>>
>>>
>>> Complicated :-)
>>>
>>> As I read the spec, the ?Ms are different.
>>>
>>> (max(?x) As ?M) -- select expression
>>>
>>> avg(?M+1) -- undefined variable in the grouped pattern that is never
>>> mentioned or bound.
>>>
>>> Like writing
>>>
>>> avg(?noSuchVariable+1)
>>>
>>> -----
>>>
>>> Turning this round:
>>>
>>> Does any one have a use case that suggests it should be legal?
>>>
>>>         Andy
>>>
>>>>
>>>> - Steve
>>>>
>>>
>>
>>
>>
>> --
>> Jun. Prof. Dr. Birte Glimm            Tel.:    +49 731 50 24125
>> Inst. of Artificial Intelligence         Secr:  +49 731 50 24258
>> University of Ulm                         Fax:   +49 731 50 24188
>> D-89069 Ulm                               birte.glimm@uni-ulm.de
>> Germany
>>
>
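The parser rule proposed in the quoted discussion above (aggregates may not appear inside the expression of an aggregate) is a purely syntactic check. A minimal Python sketch over a hypothetical expression tree follows; the node classes `Var`, `Const`, `Call` and `Aggregate` are illustrative assumptions, not the AST of any real SPARQL parser.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Call:
    # A non-aggregate operator or function, e.g. "+".
    op: str
    args: Tuple["Expr", ...]


@dataclass(frozen=True)
class Aggregate:
    # An aggregate such as AVG(expr) or MAX(expr).
    name: str
    arg: "Expr"


Expr = Union[Var, Const, Call, Aggregate]


def contains_aggregate(e: Expr) -> bool:
    """True if any aggregate occurs anywhere in e."""
    if isinstance(e, Aggregate):
        return True
    if isinstance(e, Call):
        return any(contains_aggregate(a) for a in e.args)
    return False


def has_nested_aggregate(e: Expr) -> bool:
    """True if some aggregate's argument itself contains an aggregate."""
    if isinstance(e, Aggregate):
        return contains_aggregate(e.arg)
    if isinstance(e, Call):
        return any(has_nested_aggregate(a) for a in e.args)
    return False
```

The check rejects `AVG(MAX(?x)+1)` but accepts `AVG(?M+1)`, matching Steve's observation that such a rule would not rule out the use of `?M`. Under Andy's reading, that `?M` is simply a variable never bound in the grouped pattern.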
Received on Tuesday, 5 June 2012 18:48:46 UTC