# Re: SUM aggregate operator and non-numeric literals

From: Steve Harris <steve.harris@garlik.com>
Date: Mon, 27 Jun 2011 13:20:20 +0100

Message-Id: <3375BFD7-3A1D-4274-8035-4F52CCEC436A@garlik.com>
To: Andy Seaborne <andy.seaborne@epimorphics.com>
```On 2011-06-27, at 12:38, Andy Seaborne wrote:

> Looks fine - good point that SUM can be extended rather than requiring it's formulation to be changed.
>
>
> One observation (does not the reply being sent):
>
> [[
> > The result will be always a type error.
>
> Correct.
> ]]
>
> I'm not sure what Jeen means here - the query does not produce a type error.  One of the two groups works, the other leads to an error in SUM but which caught in the SELECT expression and leads to an unbound variable.

I thought he was referring to the result of the Sum() set function, but I can see it's open to interpretation.

- Steve

> On 27/06/11 11:36, Steve Harris wrote:
>>
>> - Steve
>>
>> On 2011-06-27, at 11:02, Steve Harris wrote:
>>
>>> On 2011-06-26, at 16:43, Andy Seaborne wrote:
>>>
>>>> (can we have one x.y.z. section per aggregate like the functions so each has it's own doc link or links to the defintion of each?  The defs don't have anchors).
>>>
>>> Yes, good idea, I've added it to my todo list.
>>>
>>>> There are 3 points where customization can be done:
>>>> 1/ The expression to be SUM'ed
>>>> 2/ The aggregate called
>>>> 3/ Whether SUM uses + or op:numeric-add
>>>>
>>>> 1 and 2 can already be done.
>>>>
>>>>
>>>> For 1/:
>>>>
>>>> The effect Jeen uses as an example can be achieved by ensuring the multiset passed to SUM are all numbers:
>>>>
>>>> SUM(xsd:integer(?val))
>>>
>>> Right.
>>>
>>>> assuming that
>>>>
>>>> :a rdf:value "rst" is still to be an error for the group it's in.
>>>>
>>>> SUM(COALESCE(xsd:integer(?val),0))
>>>>
>>>> would give the effect of skip any error.
>>>>
>>>> Now, if the application does want an error, it is able to choose the expression.
>>>>
>>>> Another way, which copes with a wider range of numbers but without over-promotion of types:
>>>>
>>>> SUM(IF(isNumeric(?val), ?val , COALESCE(xsd:double(?val),0))
>>>>
>>>> For 2/:
>>>>
>>>> We do allow extension of aggregates via custom aggregates called by URI.  A different aggregation function operation would call a different operation.
>>>>
>>>> For 3/:
>>>>
>>>> 3/ A change to SUM so it uses "+"
>>>>
>>>> Doesn't that constrain how "+" can be extended?  I presume it must be commutative and associative so order of the SUM aggregation does not matter.
>>>>
>>>> There's nothing to stop "+" being extended so that xsd:dataTime + xsd:duration ->  xsd:dateTime
>>>>
>>>>
>>>> SUM would need to say
>>>> sum(durations and at most DT) ->  DT
>>>> The result type of SUM is changed.
>>>
>>> Good point, though that's more a good example of why overloading is a bad idea, IMHO. :)
>>>
>>>
>>> Quite.
>>>
>>> OK, I feel I have enough information to start drafting a response to Jeen now, I'll circulate it later today.
>>>
>>> - Steve
>>>
>>>> An error in an aggregation function in SELECT does not cause a row to be skipped - it becomes an unbound because of the SELECT expression.  An error in FILTER eliminates the row.
>>>>
>>>> On 26/06/11 09:00, Steve Harris wrote:
>>>>> I think it was mostly just that at the time I wrote the def'n there was no (obvious, explicit) function for +, makes sense to change it to me.
>>>>
>>>> True - there isn't because dispatch by operator table isn't a function.  It could be - but it currently isn't.
>>>>
>>>> It does make the conditions for extending "+" a bit tricky though.
>>>> What conditions are needed?
>>>>
>>>>   Andy
>>>>
>>>>>
>>>>> - Steve
>>>>>
>>>>> On 2011-06-25, at 16:18, Lee Feigenbaum wrote:
>>>>>
>>>>>> On the surface, Jeen's reasoning makes sense to me.
>>>>>>
>>>>>>
>>>>>> Lee
>>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: SUM aggregate operator and non-numeric literals
>>>>>> Resent-Date: Thu, 23 Jun 2011 01:05:51 +0000
>>>>>> Date: Thu, 23 Jun 2011 13:05:10 +1200
>>>>>> From: Jeen Broekstra<jeen.broekstra@gmail.com>
>>>>>>
>>>>>>
>>>>>> Hi DAWG,
>>>>>>
>>>>>> The current definition of SUM (section 18.4) is as follows :
>>>>>>
>>>>>> ==begin quote==
>>>>>> Definition: Sum
>>>>>> numeric Sum(multiset M)
>>>>>>
>>>>>> The Sum set function is used by the SUM aggregate in the syntax.
>>>>>>
>>>>>> Sum(M) = Sum(ToList(Flatten(M))).
>>>>>>
>>>>>> Sum(S) = op:numeric-add(S1, Sum(S2..n)) when card[S]>   1
>>>>>> Sum(S) = op:numeric-add(S1, 0) when card[S] = 1
>>>>>> Sum(S) = 0 when card[S] = 0
>>>>>>
>>>>>> ==end quote==
>>>>>>
>>>>>> Given that the definition of SUM is directly in terms of the
>>>>>> op:numeric-add XPath function, it follows that it can only be applied on
>>>>>> numeric literals. Therefore, any SUM that aggregates over a set of
>>>>>> values that contains a non-numeric type will result in a type error. Not
>>>>>> even an extension of the SPARQL operator table in section 17.3 will
>>>>>> help, as SUM is not defined in terms of those operators.
>>>>>>
>>>>>> In other words, if we have the following data:
>>>>>>
>>>>>> :a rdf:value "1" .
>>>>>> :a rdf:value "2"^^xsd:integer .
>>>>>> :b rdf:value "3"^^xsd:integer .
>>>>>>
>>>>>> And the following query:
>>>>>>
>>>>>> SELECT (SUM(?val) as ?value)
>>>>>> WHERE {
>>>>>>  ?a rdf:value ?val .
>>>>>> } GROUP BY ?a
>>>>>>
>>>>>> The result will be always a type error.
>>>>>>
>>>>>> I would argue that having the same extensibility mechanisms available
>>>>>> for SUM as we have for, for example, the + operator would be preferable.
>>>>>> That way, implementations wanting to offer a more forgiving version of
>>>>>> the SUM operator (one which silently ignores the non-numerics, for
>>>>>> example), could do so while staying spec-compliant.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Jeen
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Steve Harris, CTO, Garlik Limited
>>> 1-3 Halford Road, Richmond, TW10 6AW, UK
>>> +44 20 8439 8203  http://www.garlik.com/
>>> Registered in England and Wales 535 7233 VAT # 849 0517 11
>>>
>>>
>>
>

--
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11