Re: Semantics of SUM

On 12/11/2009 22:54, Steve Harris wrote:
> On 12 Nov 2009, at 20:10, Andy Seaborne wrote:
>>>> However, it does not quite follow the semantics of "+" because it
>>>> changes the datatype:
>>>>
>>>> SUM(?x | ?x{1,2,3} ) ==> "6"^^xsd:integer
>>
>> Notation was meant to be SUM(?x) over a group with ?x=1, ?x=2, ?x=3
>> It was a bit terse - I had been writing comprehensions just beforehand
>> so it was obvious to me, at the time.
>
> Hm, yeah, I guessed that, but the rest I didn't follow. Makes sense now
> though.
>
>>>> but
>>>>
>>>> SUM(COALESCE(xsd:double(?x), 0) | ?x{1,2,3} ) => "6"^^xsd:double
>>
>> Using
>> SUM(COALESCE(xsd:double(?x), 0))
>> for group members ?x=1,2,3 (xsd:integers) results in xsd:double, not
>> xsd:integer as it would for 1+2+3
>
> No, sure, but COALESCE(?x, 0) would in this case.

But then we'd lose the strings->integers conversion.
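
To make the difference concrete, here is a rough Python sketch of the two
expressions. It is only a model under assumptions: coalesce, to_double and
var are hypothetical stand-ins for COALESCE, the xsd:double cast and
variable evaluation as they are described in this thread, not any engine's
actual implementation.

```python
class EvalError(Exception):
    """Stands in for a SPARQL expression evaluation error."""


def var(value):
    """Evaluate a variable; None models an unbound variable (assumption)."""
    if value is None:
        raise EvalError("unbound variable")
    return value


def to_double(value):
    """Hypothetical stand-in for the xsd:double(...) cast."""
    try:
        return float(var(value))
    except (TypeError, ValueError) as exc:
        raise EvalError(str(exc))


def coalesce(*thunks):
    """Return the first argument that evaluates without an error."""
    for thunk in thunks:
        try:
            return thunk()
        except EvalError:
            continue
    raise EvalError("no argument evaluated without error")


def cast_then_default(x):
    # Models COALESCE(xsd:double(?x), 0)
    return coalesce(lambda: to_double(x), lambda: 0)


def default_only(x):
    # Models COALESCE(?x, 0)
    return coalesce(lambda: var(x), lambda: 0)


for x in (1, "2", "abc", None):
    print(repr(x), repr(cast_then_default(x)), repr(default_only(x)))
```

In this model the cast form turns the string "2" into a number but also
turns the integer 1 into a double, while the plain form keeps 1 as an
integer and leaves "2" as a string.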

In SQL, the standard aggregates ignore nulls, so my assumption has been
that SPARQL would be similar, which I took to mean ignoring unbound
values.  Because evaluating ?x, when ?x is unbound, is an error, I
assumed that ignoring errors was the consistent paradigm.

>
>>>> Using xsd:integer does not work e.g. 1, 1.5, 3 !=> "5.5"^^xsd:decimal
>>
>> Encoding the full XSD hierarchy to minimise the promotion as + used
>> natively does, and cope with errors/non-numbers might be:
>>
>> COALESCE(xsd:integer(?x), xsd:decimal(?x), xsd:double(?x), 0)
>>
>> which I don't think is a serious contender on practical grounds.
>
> No, but I don't imagine there are too many situations where users would
> want the exact semantics, do you have good examples?

Maybe not perfect semantics but adding integers should produce integers.
My experience is that users don't always see that it's the value that
matters.

Promoting to doubles loses precision eventually.  Finance applications
may care.
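
A small Python illustration of the precision point, assuming Python's int,
float and decimal.Decimal as rough analogues of xsd:integer, xsd:double
and xsd:decimal (the values are made up for the example):

```python
from decimal import Decimal

# Doubles represent every integer up to 2**53 exactly, but not 2**53 + 1.
big = 2**53

# Integer addition stays exact.
integer_total = big + 1

# The same addition after promotion to double is rounded.
double_total = float(big) + float(1)

# Decimal addition keeps exact cents; binary doubles do not.
decimal_total = Decimal("0.10") + Decimal("0.20")
double_cents = 0.10 + 0.20

print(integer_total)                      # 9007199254740993
print(int(double_total))                  # 9007199254740992
print(decimal_total == Decimal("0.30"))   # True
print(double_cents == 0.30)               # False
```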

>> I think ending up with xsd:double, not xsd:integer, in the potential
>> presence of errors, is significant so maybe a way to indicate that
>> errors be excluded might be better. However, a plethora of options
>> is also bad design. Hmm.
>
> I think you could do that with COALESCE(?x): if every value passed to
> COALESCE is a type error or unbound, you get an unbound value
> returned, so it will be dropped. I think.

Dropped? But the semantics are to use + and "unbound+anything" is an 
error, which then invalidates the whole SUM.

Unbound and error need to be treated the same way because eval(unbound 
variable) is an error.  (BOUND(?x) is special - it only applies to 
variables).

If errors are just dropped in SUM, it all works out.  Nulls are dropped
in SQL SUM.
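
As a rough sketch of the two readings in Python: this is a model under
assumptions, not a proposed definition. promote_add, sum_strict and
sum_dropping_errors are hypothetical names, None stands for unbound,
int/Decimal/float stand for xsd:integer/xsd:decimal/xsd:double, and an
empty group summing to 0 is my assumption.

```python
from decimal import Decimal


class EvalError(Exception):
    """Stands in for a SPARQL expression evaluation error."""


def promote_add(a, b):
    """Model of "+" with integer -> decimal -> double promotion."""
    for v in (a, b):
        # Unbound (None), strings and booleans are not numeric operands.
        if isinstance(v, bool) or not isinstance(v, (int, Decimal, float)):
            raise EvalError(f"not a number: {v!r}")
    if isinstance(a, float) or isinstance(b, float):
        return float(a) + float(b)
    return a + b  # int + int -> int, int + Decimal -> Decimal


def sum_strict(values):
    """SUM as repeated "+": one error invalidates the whole group."""
    total = 0
    for v in values:
        total = promote_add(total, v)
    return total


def sum_dropping_errors(values):
    """SUM that skips members whose addition raises an error."""
    total = 0
    for v in values:
        try:
            total = promote_add(total, v)
        except EvalError:
            continue
    return total


print(repr(sum_dropping_errors([1, 2, 3])))               # 6
print(repr(sum_dropping_errors([1, Decimal("1.5"), 3])))  # Decimal('5.5')
print(repr(sum_dropping_errors([1, None, "abc", 2])))     # 3
```

With errors dropped, integers still sum to an integer and 1, 1.5, 3 gives
a decimal, whereas sum_strict raises on the group containing an unbound
member.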

	Andy

>
> - Steve
>

Received on Friday, 13 November 2009 11:47:15 UTC