Re: SUM aggregate operator and non-numeric literals

Looks fine - good point that SUM can be extended rather than requiring 
its formulation to be changed.


One observation (does not affect the reply being sent):

[[
 > The result will be always a type error.

Correct.
]]

I'm not sure what Jeen means here - the query does not produce a type 
error.  One of the two groups works; the other leads to an error in SUM, 
but that error is caught in the SELECT expression and leads to an unbound variable.
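
To make that concrete, here is a rough Python model of the evaluation. It is only a sketch under assumptions: plain literals are modelled as Python strings, typed integers as Python ints, and the names SparqlTypeError, numeric_add, sparql_sum, to_integer, coalesce and select_sum are made up for illustration, not taken from the spec or any implementation. It follows the quoted Sum definition and treats an error in the SELECT expression as an unbound variable (None).

```python
from collections import defaultdict


class SparqlTypeError(Exception):
    """Stands in for a SPARQL expression evaluation error (hypothetical name)."""


def is_numeric(v):
    # Python ints/floats model numeric literals; bool is excluded on purpose.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def numeric_add(a, b):
    # Model of op:numeric-add: only defined for numeric operands.
    if not (is_numeric(a) and is_numeric(b)):
        raise SparqlTypeError(f"non-numeric operand: {a!r}, {b!r}")
    return a + b


def sparql_sum(values):
    # Right fold, mirroring Sum(S) = op:numeric-add(S1, Sum(S2..n)),
    # with the Sum of an empty list being 0.
    total = 0
    for v in reversed(values):
        total = numeric_add(v, total)
    return total


def to_integer(v):
    # Very rough stand-in for the xsd:integer(?val) cast.
    try:
        return int(v)
    except (TypeError, ValueError):
        raise SparqlTypeError(f"cannot cast {v!r}") from None


def coalesce(*thunks):
    # COALESCE: the first argument that evaluates without an error.
    for thunk in thunks:
        try:
            return thunk()
        except SparqlTypeError:
            pass
    raise SparqlTypeError("all COALESCE arguments failed")


def select_sum(rows, expr=lambda v: v):
    # Models SELECT (SUM(expr(?val)) AS ?value) ... GROUP BY ?a
    groups = defaultdict(list)
    for key, val in rows:
        groups[key].append(val)
    out = {}
    for key, vals in groups.items():
        try:
            out[key] = sparql_sum([expr(v) for v in vals])
        except SparqlTypeError:
            out[key] = None  # error in the SELECT expression: ?value is unbound
    return out


JEEN = [(":a", "1"), (":a", 2), (":b", 3)]        # Jeen's data
WITH_RST = [(":a", "rst"), (":a", 2), (":b", 3)]  # the "rst" variant

print(select_sum(JEEN))                  # {':a': None, ':b': 3}
print(select_sum(JEEN, to_integer))      # {':a': 3, ':b': 3}
print(select_sum(WITH_RST, to_integer))  # {':a': None, ':b': 3}
print(select_sum(WITH_RST, lambda v: coalesce(lambda: to_integer(v), lambda: 0)))
# {':a': 2, ':b': 3}
```

The first line of output is the case in question: group :b still yields 3, and only group :a ends up unbound. The other three lines correspond to SUM(xsd:integer(?val)) and SUM(COALESCE(xsd:integer(?val),0)).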

	Andy


On 27/06/11 11:36, Steve Harris wrote:
> I've added http://www.w3.org/2009/sparql/wiki/CommentResponse:JB-6
>
> - Steve
>
> On 2011-06-27, at 11:02, Steve Harris wrote:
>
>> On 2011-06-26, at 16:43, Andy Seaborne wrote:
>>
>>> (can we have one x.y.z. section per aggregate, like the functions, so each has its own doc link or links to the definition of each?  The defs don't have anchors).
>>
>> Yes, good idea, I've added it to my todo list.
>>
>>> There are 3 points where customization can be done:
>>> 1/ The expression to be SUM'ed
>>> 2/ The aggregate called
>>> 3/ Whether SUM uses + or op:numeric-add
>>>
>>> 1 and 2 can already be done.
>>>
>>>
>>> For 1/:
>>>
>>> The effect Jeen uses as an example can be achieved by ensuring the values in the multiset passed to SUM are all numbers:
>>>
>>> SUM(xsd:integer(?val))
>>
>> Right.
>>
>>> assuming that
>>>
>>> :a rdf:value "rst" is still to be an error for the group it's in.
>>>
>>> SUM(COALESCE(xsd:integer(?val),0))
>>>
>>> would give the effect of skipping any error.
>>>
>>> Now, if the application does want an error, it is able to choose the expression.
>>>
>>> Another way, which copes with a wider range of numbers but without over-promotion of types:
>>>
>>> SUM(IF(isNumeric(?val), ?val, COALESCE(xsd:double(?val),0)))
>>>
>>> For 2/:
>>>
>>> We do allow extension of aggregates via custom aggregates called by URI.  A different aggregation function would call a different operation.
>>>
>>> For 3/:
>>>
>>> 3/ A change to SUM so it uses "+"
>>>
>>> Doesn't that constrain how "+" can be extended?  I presume it must be commutative and associative so order of the SUM aggregation does not matter.
>>>
>>> There's nothing to stop "+" being extended so that xsd:dateTime + xsd:duration ->  xsd:dateTime
>>>
>>> http://www.w3.org/TR/xmlschema-2/#adding-durations-to-dateTimes
>>>
>>> SUM would need to say
>>> sum(durations and at most one DT) ->  DT
>>> The result type of SUM is changed.
>>
>> Good point, though that's more a good example of why overloading is a bad idea, IMHO. :)
>>
>>> See also string + string.
>>
>> Quite.
>>
>> OK, I feel I have enough information to start drafting a response to Jeen now, I'll circulate it later today.
>>
>> - Steve
>>
>>> An error in an aggregation function in SELECT does not cause a row to be skipped - the variable becomes unbound because of the SELECT expression.  An error in FILTER eliminates the row.
>>>
>>> On 26/06/11 09:00, Steve Harris wrote:
>>>> I think it was mostly just that at the time I wrote the def'n there was no (obvious, explicit) function for +; changing it makes sense to me.
>>>
>>> True - there isn't because dispatch by operator table isn't a function.  It could be - but it currently isn't.
>>>
>>> It does make the conditions for extending "+" a bit tricky though.
>>> What conditions are needed?
>>>
>>>    Andy
>>>
>>>>
>>>> - Steve
>>>>
>>>> On 2011-06-25, at 16:18, Lee Feigenbaum wrote:
>>>>
>>>>> On the surface, Jeen's reasoning makes sense to me.
>>>>>
>>>>> Steve, did we/you consider defining SUM in terms of "+" instead of in terms of op:numeric-add?
>>>>>
>>>>> Lee
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: SUM aggregate operator and non-numeric literals
>>>>> Resent-Date: Thu, 23 Jun 2011 01:05:51 +0000
>>>>> Resent-From: public-rdf-dawg-comments@w3.org
>>>>> Date: Thu, 23 Jun 2011 13:05:10 +1200
>>>>> From: Jeen Broekstra<jeen.broekstra@gmail.com>
>>>>> To: public-rdf-dawg-comments@w3.org
>>>>>
>>>>>
>>>>> Hi DAWG,
>>>>>
>>>>> The current definition of SUM (section 18.4) is as follows :
>>>>>
>>>>> ==begin quote==
>>>>> Definition: Sum
>>>>> numeric Sum(multiset M)
>>>>>
>>>>> The Sum set function is used by the SUM aggregate in the syntax.
>>>>>
>>>>> Sum(M) = Sum(ToList(Flatten(M))).
>>>>>
>>>>> Sum(S) = op:numeric-add(S1, Sum(S2..n)) when card[S] > 1
>>>>> Sum(S) = op:numeric-add(S1, 0) when card[S] = 1
>>>>> Sum(S) = 0 when card[S] = 0
>>>>>
>>>>> In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2,
>>>>> op:numeric-add(3, 0))).
>>>>> ==end quote==
>>>>>
>>>>> Given that the definition of SUM is directly in terms of the
>>>>> op:numeric-add XPath function, it follows that it can only be applied on
>>>>> numeric literals. Therefore, any SUM that aggregates over a set of
>>>>> values that contains a non-numeric type will result in a type error. Not
>>>>> even an extension of the SPARQL operator table in section 17.3 will
>>>>> help, as SUM is not defined in terms of those operators.
>>>>>
>>>>> In other words, if we have the following data:
>>>>>
>>>>> :a rdf:value "1" .
>>>>> :a rdf:value "2"^^xsd:integer .
>>>>> :b rdf:value "3"^^xsd:integer .
>>>>>
>>>>> And the following query:
>>>>>
>>>>> SELECT (SUM(?val) as ?value)
>>>>> WHERE {
>>>>>   ?a rdf:value ?val .
>>>>> } GROUP BY ?a
>>>>>
>>>>> The result will be always a type error.
>>>>>
>>>>> I would argue that having the same extensibility mechanisms available
>>>>> for SUM as we have for, for example, the + operator would be preferable.
>>>>> That way, implementations wanting to offer a more forgiving version of
>>>>> the SUM operator (one which silently ignores the non-numerics, for
>>>>> example), could do so while staying spec-compliant.
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Jeen
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>> --
>> Steve Harris, CTO, Garlik Limited
>> 1-3 Halford Road, Richmond, TW10 6AW, UK
>> +44 20 8439 8203  http://www.garlik.com/
>> Registered in England and Wales 535 7233 VAT # 849 0517 11
>> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
>>
>>
>

Received on Monday, 27 June 2011 11:39:06 UTC