Re: SUM aggregate operator and non-numeric literals

I've added http://www.w3.org/2009/sparql/wiki/CommentResponse:JB-6

- Steve

On 2011-06-27, at 11:02, Steve Harris wrote:

> On 2011-06-26, at 16:43, Andy Seaborne wrote:
> 
>> (can we have one x.y.z. section per aggregate like the functions so each has it's own doc link or links to the defintion of each?  The defs don't have anchors).
> 
> Yes, good idea, I've added it to my todo list.
> 
>> There are 3 points where customization can be done:
>> 1/ The expression to be SUM'ed
>> 2/ The aggregate called
>> 3/ Whether SUM uses + or op:numeric-add
>> 
>> 1 and 2 can already be done.
>> 
>> 
>> For 1/:
>> 
>> The effect Jeen uses as an example can be achieved by ensuring the multiset passed to SUM are all numbers:
>> 
>> SUM(xsd:integer(?val))
> 
> Right.
> 
>> assuming that
>> 
>> :a rdf:value "rst" is still to be an error for the group it's in.
>> 
>> SUM(COALESCE(xsd:integer(?val),0))
>> 
>> would give the effect of skip any error.
>> 
>> Now, if the application does want an error, it is able to choose the expression.
>> 
>> Another way, which copes with a wider range of numbers but without over-promotion of types:
>> 
>> SUM(IF(isNumeric(?val), ?val , COALESCE(xsd:double(?val),0))
>> 
>> For 2/:
>> 
>> We do allow extension of aggregates via custom aggregates called by URI.  A different aggregation function operation would call a different operation.
>> 
>> For 3/:
>> 
>> 3/ A change to SUM so it uses "+"
>> 
>> Doesn't that constrain how "+" can be extended?  I presume it must be commutative and associative so order of the SUM aggregation does not matter.
>> 
>> There's nothing to stop "+" being extended so that xsd:dataTime + xsd:duration -> xsd:dateTime
>> 
>> http://www.w3.org/TR/xmlschema-2/#adding-durations-to-dateTimes
>> 
>> SUM would need to say
>> sum(durations and at most DT) -> DT
>> The result type of SUM is changed.
> 
> Good point, though that's more a good example of why overloading is a bad idea, IMHO. :)
> 
>> See also string + string.
> 
> Quite.
> 
> OK, I feel I have enough information to start drafting a response to Jeen now, I'll circulate it later today.
> 
> - Steve
> 
>> An error in an aggregation function in SELECT does not cause a row to be skipped - it becomes an unbound because of the SELECT expression.  An error in FILTER eliminates the row.
>> 
>> On 26/06/11 09:00, Steve Harris wrote:
>>> I think it was mostly just that at the time I wrote the def'n there was no (obvious, explicit) function for +, makes sense to change it to me.
>> 
>> True - there isn't because dispatch by operator table isn't a function.  It could be - but it currently isn't.
>> 
>> It does make the conditions for extending "+" a bit tricky though.
>> What conditions are needed?
>> 
>>   Andy
>> 
>>> 
>>> - Steve
>>> 
>>> On 2011-06-25, at 16:18, Lee Feigenbaum wrote:
>>> 
>>>> On the surface, Jeen's reasoning makes sense to me.
>>>> 
>>>> Steve, did we/you consider defining SUM instead of "+" instead of in terms of op:numeric-add?
>>>> 
>>>> Lee
>>>> 
>>>> -------- Original Message --------
>>>> Subject: SUM aggregate operator and non-numeric literals
>>>> Resent-Date: Thu, 23 Jun 2011 01:05:51 +0000
>>>> Resent-From: public-rdf-dawg-comments@w3.org
>>>> Date: Thu, 23 Jun 2011 13:05:10 +1200
>>>> From: Jeen Broekstra<jeen.broekstra@gmail.com>
>>>> To: public-rdf-dawg-comments@w3.org
>>>> 
>>>> 
>>>> Hi DAWG,
>>>> 
>>>> The current definition of SUM (section 18.4) is as follows :
>>>> 
>>>> ==begin quote==
>>>> Definition: Sum
>>>> numeric Sum(multiset M)
>>>> 
>>>> The Sum set function is used by the SUM aggregate in the syntax.
>>>> 
>>>> Sum(M) = Sum(ToList(Flatten(M))).
>>>> 
>>>> Sum(S) = op:numeric-add(S1, Sum(S2..n)) when card[S]>  1
>>>> Sum(S) = op:numeric-add(S1, 0) when card[S] = 1
>>>> Sum(S) = 0 when card[S] = 0
>>>> 
>>>> In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2,
>>>> op:numeric-add(3, 0))).
>>>> ==end quote==
>>>> 
>>>> Given that the definition of SUM is directly in terms of the
>>>> op:numeric-add XPath function, it follows that it can only be applied on
>>>> numeric literals. Therefore, any SUM that aggregates over a set of
>>>> values that contains a non-numeric type will result in a type error. Not
>>>> even an extension of the SPARQL operator table in section 17.3 will
>>>> help, as SUM is not defined in terms of those operators.
>>>> 
>>>> In other words, if we have the following data:
>>>> 
>>>> :a rdf:value "1" .
>>>> :a rdf:value "2"^^xsd:integer .
>>>> :b rdf:value "3"^^xsd:integer .
>>>> 
>>>> And the following query:
>>>> 
>>>> SELECT (SUM(?val) as ?value)
>>>> WHERE {
>>>>  ?a rdf:value ?val .
>>>> } GROUP BY ?a
>>>> 
>>>> The result will be always a type error.
>>>> 
>>>> I would argue that having the same extensibility mechanisms available
>>>> for SUM as we have for, for example, the + operator would be preferable.
>>>> That way, implementations wanting to offer a more forgiving version of
>>>> the SUM operator (one which silently ignores the non-numerics, for
>>>> example), could do so while staying spec-compliant.
>>>> 
>>>> 
>>>> Regards,
>>>> 
>>>> Jeen
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
> -- 
> Steve Harris, CTO, Garlik Limited
> 1-3 Halford Road, Richmond, TW10 6AW, UK
> +44 20 8439 8203  http://www.garlik.com/
> Registered in England and Wales 535 7233 VAT # 849 0517 11
> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
> 
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Monday, 27 June 2011 10:37:01 UTC