SUM aggregate operator and non-numeric literals

Hi DAWG,

The current definition of SUM (section 18.4) is as follows:

==begin quote==
Definition: Sum
numeric Sum(multiset M)

The Sum set function is used by the SUM aggregate in the syntax.

Sum(M) = Sum(ToList(Flatten(M))).

Sum(S) = op:numeric-add(S1, Sum(S2..n)) when card[S] > 1
Sum(S) = op:numeric-add(S1, 0) when card[S] = 1
Sum(S) = 0 when card[S] = 0

In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2, 
op:numeric-add(3, 0))).
==end quote==

Given that SUM is defined directly in terms of the op:numeric-add XPath 
function, it follows that it can only be applied to numeric literals. 
Therefore, any SUM that aggregates over a set of values containing a 
non-numeric value will result in a type error. Not even an extension of 
the SPARQL operator table in section 17.3 will help, as SUM is not 
defined in terms of those operators.

In other words, if we have the following data:

:a rdf:value "1" .
:a rdf:value "2"^^xsd:integer .
:b rdf:value "3"^^xsd:integer .

And the following query:

SELECT (SUM(?val) as ?value)
WHERE {
    ?a rdf:value ?val .
} GROUP BY ?a

The result for :a will always be a type error, since "1" is a plain 
literal rather than a numeric one.
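
To make the failure concrete, here is a rough Python sketch of the 
recursive definition quoted above. It is only a model, not spec text: 
numeric_add and spec_sum are hypothetical names, a Python str stands in 
for the plain literal "1", and a Python int stands in for an xsd:integer.

```python
from numbers import Number


def numeric_add(a, b):
    """Stand-in for op:numeric-add: both operands must be numeric."""
    for operand in (a, b):
        # bool is excluded because Python treats it as a subclass of int.
        if isinstance(operand, bool) or not isinstance(operand, Number):
            raise TypeError(f"op:numeric-add: non-numeric operand {operand!r}")
    return a + b


def spec_sum(values):
    """Mirror of the quoted definition:
    Sum(S) = numeric-add(S1, Sum(S2..n)), and Sum of the empty list is 0.
    """
    if not values:
        return 0
    return numeric_add(values[0], spec_sum(values[1:]))


# The example from the definition: Sum({1, 2, 3}).
print(spec_sum([1, 2, 3]))

# The values bound to ?val for :a in the data above: "1" and 2.
try:
    spec_sum(["1", 2])
except TypeError as err:
    print("type error:", err)
```

Under these assumptions a single non-numeric value is enough to make 
the whole aggregate fail, because every step goes through numeric_add.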

I would argue that SUM should have the same extensibility mechanisms 
available as, for example, the + operator. That way, implementations 
wanting to offer a more forgiving version of the SUM operator (one that 
silently ignores non-numeric values, for example) could do so while 
staying spec-compliant.
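
As an illustration of what such a forgiving variant might look like, 
here is a minimal Python sketch. The names is_numeric and lenient_sum 
are hypothetical, and the behaviour shown is just one possible 
extension, not something the current draft defines or permits. As 
before, a Python str stands in for a plain literal and an int for an 
xsd:integer.

```python
from numbers import Number


def is_numeric(value):
    """Assumed test for a numeric literal; bool is excluded on purpose."""
    return isinstance(value, Number) and not isinstance(value, bool)


def lenient_sum(values):
    """Sum only the numeric values and silently skip everything else."""
    total = 0
    for value in values:
        if is_numeric(value):
            total += value
    return total


# The values for :a from the example data: the plain literal is skipped.
print(lenient_sum(["1", 2]))
```

With the current definition, an implementation behaving like this 
would be deviating from the spec rather than extending it.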


Regards,

Jeen

Received on Thursday, 23 June 2011 01:05:50 UTC