
Re: SUM aggregate operator and non-numeric literals

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Sun, 26 Jun 2011 16:43:38 +0100
Message-ID: <4E0753AA.6020204@epimorphics.com>
To: public-rdf-dawg@w3.org
(can we have one x.y.z. section per aggregate like the functions so each 
has its own doc link or links to the definition of each?  The defs don't 
have anchors).


There are 3 points where customization can be done:
1/ The expression to be SUM'ed
2/ The aggregate called
3/ Whether SUM uses + or op:numeric-add

1 and 2 can already be done.


For 1/:

The effect Jeen uses as an example can be achieved by ensuring that the 
elements of the multiset passed to SUM are all numbers:

   SUM(xsd:integer(?val))

assuming that

:a rdf:value "rst" is still to be an error for the group it's in.

   SUM(COALESCE(xsd:integer(?val),0))

would have the effect of skipping any error.

Now, if the application does want an error, it is able to choose the 
expression.

Another way, which copes with a wider range of numbers but without 
over-promotion of types:

   SUM(IF(isNumeric(?val), ?val, COALESCE(xsd:double(?val),0)))
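
To make the difference between the three expressions above concrete, here is a rough Python model of them. It is only a sketch: the helper names (xsd_integer, xsd_double, coalesce, sum_cast, ...) are made up for illustration, Python int/float stand in for numeric literals, str stands in for a plain literal, and an exception stands in for a SPARQL evaluation error.

```python
class SparqlTypeError(Exception):
    """Stands in for a SPARQL expression evaluation error (assumption)."""


def is_numeric(v):
    # bool is excluded because Python treats it as a subtype of int.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def xsd_integer(v):
    # Rough analogue of the xsd:integer cast: fails on "rst".
    try:
        return int(v)
    except (TypeError, ValueError):
        raise SparqlTypeError(v)


def xsd_double(v):
    # Rough analogue of the xsd:double cast.
    try:
        return float(v)
    except (TypeError, ValueError):
        raise SparqlTypeError(v)


def coalesce(*thunks):
    # Returns the first argument that evaluates without error.
    for thunk in thunks:
        try:
            return thunk()
        except SparqlTypeError:
            continue
    raise SparqlTypeError("no argument evaluated without error")


def sum_cast(vals):
    # SUM(xsd:integer(?val)): a bad value is still an error for the group.
    return sum(xsd_integer(v) for v in vals)


def sum_skip_errors(vals):
    # SUM(COALESCE(xsd:integer(?val),0)): a bad value contributes 0.
    return sum(coalesce(lambda v=v: xsd_integer(v), lambda: 0) for v in vals)


def sum_keep_types(vals):
    # SUM(IF(isNumeric(?val), ?val, COALESCE(xsd:double(?val),0))):
    # numeric values pass through uncast, everything else is tried as a double.
    return sum(
        v if is_numeric(v) else coalesce(lambda v=v: xsd_double(v), lambda: 0)
        for v in vals
    )
```

In this model only sum_cast fails on a group containing "rst"; the other two treat it as 0, and sum_keep_types leaves an all-integer group as an integer instead of promoting it.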

For 2/:

We do allow extension of aggregates via custom aggregates called by URI. 
  A different aggregate function would invoke a different operation.

For 3/:

A change to SUM so it uses "+"

Doesn't that constrain how "+" can be extended?  I presume it must be 
commutative and associative so order of the SUM aggregation does not matter.

There's nothing to stop "+" being extended so that xsd:dateTime + 
xsd:duration -> xsd:dateTime

http://www.w3.org/TR/xmlschema-2/#adding-durations-to-dateTimes

SUM would need to say
   sum(durations and at most one DT) -> DT
The result type of SUM is changed.

See also string + string.
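
As a rough illustration of the order-independence and result-type points above, here is a Python sketch that uses datetime and timedelta as stand-ins for xsd:dateTime and a day-time duration. That mapping is an assumption, and Python's "+" is only an analogy for an extended SPARQL "+". The function name sum_with_plus is made up.

```python
from datetime import datetime, timedelta
from functools import reduce
from itertools import permutations
import operator


def sum_with_plus(values):
    # Fold the values with "+". There is no numeric 0 seed here, because
    # 0 + timedelta is not defined in Python; an empty input is not handled.
    return reduce(operator.add, values)


values = [
    timedelta(hours=1),
    datetime(2011, 6, 26, 16, 43, 38),
    timedelta(minutes=30),
]

# Collect the result for every ordering of the same multiset.
results = {sum_with_plus(p) for p in permutations(values)}
```

Every ordering gives the same datetime here, and the result type is no longer numeric. With two datetimes in the input the fold raises TypeError, which is the "at most one DT" condition.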

An error in an aggregation function in SELECT does not cause a row to be 
skipped - the variable is left unbound because of the SELECT expression.  An 
error in FILTER eliminates the row.
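
A small Python sketch of that behaviour, applied to the data in Jeen's message quoted below. It follows the recursive Sum definition quoted there; the names numeric_add, sum_list and select_sum_by_group are made up, None stands for unbound, and an exception stands for an evaluation error.

```python
class SparqlTypeError(Exception):
    """Stands in for a SPARQL expression evaluation error (assumption)."""


def numeric_add(a, b):
    # Model of op:numeric-add: only defined on numeric operands.
    for x in (a, b):
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            raise SparqlTypeError(x)
    return a + b


def sum_list(s):
    # The three cases of the quoted definition, keyed on card[S].
    if len(s) == 0:
        return 0
    if len(s) == 1:
        return numeric_add(s[0], 0)
    return numeric_add(s[0], sum_list(s[1:]))


def select_sum_by_group(rows):
    # Model of SELECT (SUM(?val) AS ?value) ... GROUP BY ?a.
    groups = {}
    for key, val in rows:
        groups.setdefault(key, []).append(val)
    out = {}
    for key, vals in groups.items():
        try:
            out[key] = sum_list(vals)
        except SparqlTypeError:
            # The row for the group is kept; ?value is left unbound.
            out[key] = None
    return out


# "1" is a plain literal, so it is modelled as a str, not a number.
rows = [(":a", "1"), (":a", 2), (":b", 3)]
result = select_sum_by_group(rows)
```

In this model the group for :a still produces a row, just with no value for the sum, while the group for :b sums normally.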

On 26/06/11 09:00, Steve Harris wrote:
> I think it was mostly just that at the time I wrote the def'n there was no (obvious, explicit) function for +, makes sense to change it to me.

True - there isn't because dispatch by operator table isn't a function. 
  It could be - but it currently isn't.

It does make the conditions for extending "+" a bit tricky though.
What conditions are needed?

     Andy

>
> - Steve
>
> On 2011-06-25, at 16:18, Lee Feigenbaum wrote:
>
>> On the surface, Jeen's reasoning makes sense to me.
>>
>> Steve, did we/you consider defining SUM in terms of "+" instead of in terms of op:numeric-add?
>>
>> Lee
>>
>> -------- Original Message --------
>> Subject: SUM aggregate operator and non-numeric literals
>> Resent-Date: Thu, 23 Jun 2011 01:05:51 +0000
>> Resent-From: public-rdf-dawg-comments@w3.org
>> Date: Thu, 23 Jun 2011 13:05:10 +1200
>> From: Jeen Broekstra<jeen.broekstra@gmail.com>
>> To: public-rdf-dawg-comments@w3.org
>>
>>
>> Hi DAWG,
>>
>> The current definition of SUM (section 18.4) is as follows :
>>
>> ==begin quote==
>> Definition: Sum
>> numeric Sum(multiset M)
>>
>> The Sum set function is used by the SUM aggregate in the syntax.
>>
>> Sum(M) = Sum(ToList(Flatten(M))).
>>
>> Sum(S) = op:numeric-add(S1, Sum(S2..n)) when card[S] > 1
>> Sum(S) = op:numeric-add(S1, 0) when card[S] = 1
>> Sum(S) = 0 when card[S] = 0
>>
>> In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2,
>> op:numeric-add(3, 0))).
>> ==end quote==
>>
>> Given that the definition of SUM is directly in terms of the
>> op:numeric-add XPath function, it follows that it can only be applied on
>> numeric literals. Therefore, any SUM that aggregates over a set of
>> values that contains a non-numeric type will result in a type error. Not
>> even an extension of the SPARQL operator table in section 17.3 will
>> help, as SUM is not defined in terms of those operators.
>>
>> In other words, if we have the following data:
>>
>> :a rdf:value "1" .
>> :a rdf:value "2"^^xsd:integer .
>> :b rdf:value "3"^^xsd:integer .
>>
>> And the following query:
>>
>> SELECT (SUM(?val) as ?value)
>> WHERE {
>>    ?a rdf:value ?val .
>> } GROUP BY ?a
>>
>> The result will always be a type error.
>>
>> I would argue that having the same extensibility mechanisms available
>> for SUM as we have for, for example, the + operator would be preferable.
>> That way, implementations wanting to offer a more forgiving version of
>> the SUM operator (one which silently ignores the non-numerics, for
>> example), could do so while staying spec-compliant.
>>
>>
>> Regards,
>>
>> Jeen
>>
>>
>>
>>
>
Received on Sunday, 26 June 2011 15:44:09 GMT
