Evaluation of aggregates (was: Re: isNumeric) from Andy Seaborne on 2010-09-20 (public-rdf-dawg@w3.org from July to September 2010)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 20 Sep 2010 11:09:11 +0100
To: Axel Polleres <axel.polleres@deri.org>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4C9732C7.5000604@epimorphics.com>

On 20/09/10 09:55, Axel Polleres wrote:
> On 20 Sep 2010, at 10:26, Andy Seaborne wrote:
>> On 19/09/10 21:13, Axel Polleres wrote:
>>> Hi all,
>>>
>>> Don't we need something like a function isNumeric to test for a numeric argument?
>>> Seems to be handy, useful for instance for numeric aggregates, or no?
>>>
>>> It seems COALESCE together with a cast would work as well, but
>>> something like
>>>     SUM(IF(isNumeric(?X), ?X, 0))
>>> looks better - at least to me - than:
>>>       SUM(COALESCE(xs:double(?X) , 0))
>>>
>>> Opinions?
>>>
>>> Axel
>>
>> I believe we eventually agreed that sum() would skip any evaluation
>> errors of the summation as being more consistent in style for SPARQL.
>
> Hmmm, I read up in the F2F minutes
> http://www.w3.org/2009/sparql/meeting/2010-03-26#SUM
> and couldn't find it there... according to the current definition SUM just delegates to op:numeric-add
> which would error on non-numeric values, wouldn't it?
>
> Note that Resolution http://www.w3.org/2009/sparql/meeting/2010-03-26#resolution_2
> doesn't cover that, since the argument passed is not an error, but just
> the wrong datatype.
>
> We might have that resolved otherwise somewhere, but frankly I can't find that.
>
>> It
>> makes it more consistent since you can always do:
>>
>> sum(?X+0)
>>
>> to make the check happen before the aggregate is called.
>
> What does that change? if X is a string, then ?X + 0 gives an error, doesn't it?
> (at least it did when I tried it with XQuery)

It changes where the error occurs.  For "+0" the error occurs before the
aggregation step and that one error is skipped by EvaListE.

Otherwise it passes to op:numeric-add where it's a propagating error and 
the whole group is an error.

I find that inconsistent.
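
To illustrate the difference, here is a rough Python sketch (not spec text). EvalError, numeric_add, sum_strict and sum_plus_zero are names made up for this illustration, and the fold in sum_strict is a simplification of the draft's recursive definition. It only models the point above: with "+0" the bad value is dropped per row, otherwise one bad value makes the whole group an error.

```python
class EvalError(Exception):
    """Stands in for a SPARQL expression evaluation error (illustrative only)."""


def is_numeric(v):
    # bool is excluded because Python treats it as a subclass of int
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def numeric_add(a, b):
    # Rough analogue of op:numeric-add: non-numeric operands are an error
    if not (is_numeric(a) and is_numeric(b)):
        raise EvalError(f"cannot add {a!r} and {b!r}")
    return a + b


def sum_strict(values):
    # Values go straight to numeric_add, so one bad value
    # propagates and the whole group is an error
    total = 0
    for v in values:
        total = numeric_add(total, v)
    return total


def sum_plus_zero(values):
    # Models sum(?X+0): the expression is evaluated per row first,
    # and rows where it errors are skipped before the summation
    evaluated = []
    for v in values:
        try:
            evaluated.append(numeric_add(v, 0))
        except EvalError:
            continue
    return sum_strict(evaluated)


group = [1, "string", 2.5]
print(sum_plus_zero(group))  # the string row is skipped
try:
    sum_strict(group)
except EvalError as e:
    print("group error:", e)
```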

>> We do not
>> define any situation that causes a runtime evaluation error in SPARQL
>> (systems may issue warnings or reject the query, of course).
>
>
> I thought we throw away the whole row, i.e. leave ?P unbound for
> (Sum(?Pr) AS ?P) if the flattened op:numeric-add returns an error, at least
> that's how I'd read the current text for Definition: Extend along with the default behavior of
> op:numeric-add. At least, we'd need to throw away non-numerics along with Flatten, or something like that, yes?

If it's an error, only that variable binding is left unset; the
row remains.  It's clearer to explicitly define the error case in sum(),
and making it behave like ?x+0 seems natural to me.

Just define:
Sum(S) = op:numeric-add(S0, Sum(S1..n)) when |S| > 1 if S0 is a number
Sum(S) = op:numeric-add(0, Sum(S1..n)) when |S| > 1 if S0 is not a number

BTW: sum({"string"}) is defined as "string" currently :-) because:

Sum(S) = S0 when |S| = 1
   instead:
Sum(S) = S0 when |S| = 1 if S0 is a number
Sum(S) = 0 when |S| = 1 if S0 is not a number
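
The two sets of equations can be written out as a small Python sketch. sum_current follows the draft text as quoted above, sum_proposed follows the suggested replacement. The function names and the is_number test are assumptions for illustration, and the empty-S case is left as an error here because these equations do not address it.

```python
def is_number(v):
    # Assumption: Python int/float stand in for SPARQL numeric literals
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def numeric_add(a, b):
    # Rough analogue of op:numeric-add: errors on non-numeric operands
    if not (is_number(a) and is_number(b)):
        raise TypeError(f"cannot add {a!r} and {b!r}")
    return a + b


def sum_current(s):
    # Sum(S) = S0                              when |S| = 1
    # Sum(S) = op:numeric-add(S0, Sum(S1..n))  when |S| > 1
    if not s:
        raise ValueError("empty S is not covered by these equations")
    if len(s) == 1:
        return s[0]
    return numeric_add(s[0], sum_current(s[1:]))


def sum_proposed(s):
    # Same recursion, but a non-number S0 is treated as 0
    if not s:
        raise ValueError("empty S is not covered by these equations")
    head = s[0] if is_number(s[0]) else 0
    if len(s) == 1:
        return head
    return numeric_add(head, sum_proposed(s[1:]))


print(sum_current(["string"]))          # the string comes back unchanged
print(sum_proposed(["string"]))         # 0
print(sum_proposed([1, "string", 2]))   # 3
```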


There are also two cases where aggregation of an empty set occurs, which
the min, max and sample aggregates don't cover:

SELECT (AGG(?s) AS ?a) { FILTER(false) }

SELECT (AGG(?s) AS ?a)
{
    ?x :a :type
    OPTIONAL { ?x :q ?s . FILTER(false) }
} GROUP BY ?x

I have implemented:

min -> unbound
max -> unbound
sample -> unbound
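
As a sketch of that behaviour in Python, with None standing in for "unbound" (agg_min, agg_max and agg_sample are illustrative names, not taken from any implementation, and the values are assumed to be mutually comparable):

```python
def agg_min(values):
    # Empty group -> unbound (None) instead of an error
    return min(values) if values else None


def agg_max(values):
    # Empty group -> unbound (None) instead of an error
    return max(values) if values else None


def agg_sample(values):
    # Any member of the group is acceptable here; the first is
    # picked for simplicity. Empty group -> unbound (None).
    return values[0] if values else None


empty_group = []  # e.g. what { FILTER(false) } leaves to aggregate over
print(agg_min(empty_group), agg_max(empty_group), agg_sample(empty_group))
```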

	Andy

Received on Monday, 20 September 2010 10:09:47 UTC