Re: SUM aggregate operator and non-numeric literals

From: Steve Harris <steve.harris@garlik.com>
Date: Mon, 27 Jun 2011 11:02:06 +0100
Cc: public-rdf-dawg@w3.org
Message-Id: <5C11512C-4D9E-4C91-B34A-E52ACBFB037B@garlik.com>
To: Andy Seaborne <andy.seaborne@epimorphics.com>
On 2011-06-26, at 16:43, Andy Seaborne wrote:

> (Can we have one x.y.z section per aggregate, like the functions, so each has its own doc link, or links to the definition of each? The defs don't have anchors.)

Yes, good idea, I've added it to my todo list.

> There are 3 points where customization can be done:
> 1/ The expression to be SUM'ed
> 2/ The aggregate called
> 3/ Whether SUM uses + or op:numeric-add
> 
> 1 and 2 can already be done.
> 
> 
> For 1/:
> 
> The effect Jeen uses as an example can be achieved by ensuring that the values in the multiset passed to SUM are all numbers:
> 
>  SUM(xsd:integer(?val))

Right.

> assuming that
> 
> :a rdf:value "rst" is still meant to be an error for the group it's in.
> 
>  SUM(COALESCE(xsd:integer(?val),0))
> 
> would give the effect of skipping any error.
> 
> Now, if the application does want an error, it is able to choose the expression.
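A rough Python model of the two expressions above, for illustration only. Assumptions: RDF terms are plain Python values (int for xsd:integer, str for a simple literal); SparqlError, xsd_integer, coalesce and sum_of are hypothetical names; and the cast only handles ints and integer-looking strings, whereas real xsd:integer casting covers more types.

```python
import re


class SparqlError(Exception):
    """Hypothetical stand-in for a SPARQL expression evaluation error."""


def xsd_integer(term):
    """Simplified xsd:integer() cast: ints pass through, strings in integer
    lexical form are parsed, anything else (e.g. "rst") is an error."""
    if isinstance(term, int) and not isinstance(term, bool):
        return term
    if isinstance(term, str) and re.fullmatch(r"[+-]?[0-9]+", term.strip()):
        return int(term)
    raise SparqlError(f"cannot cast {term!r} to xsd:integer")


def coalesce(*args):
    """COALESCE: the first argument that evaluates without error.
    Arguments are zero-argument callables so evaluation is lazy."""
    for arg in args:
        try:
            return arg()
        except SparqlError:
            continue
    raise SparqlError("every COALESCE argument raised an error")


def sum_of(values, expr):
    """SUM(expr(?val)) over one group: an error from expr fails the whole SUM."""
    total = 0
    for v in values:
        total += expr(v)
    return total


def strict(v):
    # SUM(xsd:integer(?val))
    return xsd_integer(v)


def lenient(v):
    # SUM(COALESCE(xsd:integer(?val), 0))
    return coalesce(lambda: xsd_integer(v), lambda: 0)


group = ["1", 2, "rst"]
```

In this model sum_of(group, lenient) gives 3 because "rst" contributes 0, whereas sum_of(group, strict) raises SparqlError; the application picks the behaviour by picking the expression.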
> 
> Another way, which copes with a wider range of numbers but without over-promotion of types:
> 
> SUM(IF(isNumeric(?val), ?val, COALESCE(xsd:double(?val), 0)))
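A similar Python sketch of this IF/isNumeric variant. It assumes int/float model numeric literals and str models simple literals; xsd_double, lenient_term and lenient_sum are hypothetical names, and Python's float() parser only approximates the xsd:double lexical space. Numeric values keep their own type and only non-numerics are cast to double, which is what avoids over-promotion:

```python
class SparqlError(Exception):
    """Hypothetical stand-in for a SPARQL expression evaluation error."""


def is_numeric(term):
    # Assumption: numeric literals are int or float (bool excluded).
    return isinstance(term, (int, float)) and not isinstance(term, bool)


def xsd_double(term):
    """Approximate xsd:double() cast; non-numeric strings are an error."""
    if is_numeric(term):
        return float(term)
    if isinstance(term, str):
        try:
            return float(term)
        except ValueError:
            pass
    raise SparqlError(f"cannot cast {term!r} to xsd:double")


def lenient_term(val):
    """IF(isNumeric(?val), ?val, COALESCE(xsd:double(?val), 0))"""
    if is_numeric(val):
        return val  # numerics are passed through unchanged: no promotion
    try:
        return xsd_double(val)
    except SparqlError:
        return 0


def lenient_sum(values):
    return sum(lenient_term(v) for v in values)
```

For example, lenient_sum([1, 2]) stays an int (3), while lenient_sum([1, "2.5", "rst"]) is 3.5.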
> 
> For 2/:
> 
> We do allow extension of aggregates via custom aggregates called by URI. A different aggregation operation would simply be a call to a different aggregate.
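As a sketch of that extension point, assume a hypothetical engine-side registry (the dict, register, call_aggregate and the example.org IRI are illustrative, not any engine's API). A forgiving sum then lives under its own IRI instead of changing SUM, and a query would call it by IRI, something like (<http://example.org/agg#lenientSum>(?val) AS ?total).

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping aggregate IRIs to Python functions.
Aggregate = Callable[[List[object]], object]
AGGREGATES: Dict[str, Aggregate] = {}


def register(iri: str):
    """Decorator that records a function as the aggregate named by iri."""
    def wrap(fn: Aggregate) -> Aggregate:
        AGGREGATES[iri] = fn
        return fn
    return wrap


@register("http://example.org/agg#lenientSum")
def lenient_sum(values: List[object]) -> object:
    """Adds ints and floats and silently skips everything else."""
    return sum(v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool))


def call_aggregate(iri: str, values) -> object:
    """Look up an aggregate by IRI and apply it to one group's values."""
    fn = AGGREGATES.get(iri)
    if fn is None:
        raise LookupError(f"unknown aggregate <{iri}>")
    return fn(list(values))
```

The built-in SUM keeps its strict definition; the lenient behaviour is opt-in per query.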
> 
> For 3/:
> 
> 3/ A change to SUM so it uses "+"
> 
> Doesn't that constrain how "+" can be extended? I presume it must be commutative and associative so that the order of the SUM aggregation does not matter.
> 
> There's nothing to stop "+" being extended so that xsd:dateTime + xsd:duration -> xsd:dateTime
> 
> http://www.w3.org/TR/xmlschema-2/#adding-durations-to-dateTimes
> 
> SUM would need to say
>  sum(durations and at most one DT) -> DT
> so the result type of SUM is changed.
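A Python sketch of that hypothetical extended SUM, using datetime for xsd:dateTime and timedelta for xsd:duration (an approximation: timedelta has no month or year components). The function name is hypothetical; the point is that the result type depends on whether a dateTime is present:

```python
from datetime import datetime, timedelta


def extended_sum(values):
    """Hypothetical SUM over durations plus at most one dateTime."""
    values = list(values)
    stamps = [v for v in values if isinstance(v, datetime)]
    durations = [v for v in values if isinstance(v, timedelta)]
    if len(stamps) + len(durations) != len(values):
        raise TypeError("this sketch only handles durations and dateTimes")
    if len(stamps) > 1:
        raise TypeError("dateTime + dateTime is not defined")
    total = sum(durations, timedelta(0))
    # With no dateTime the result is a duration; with one it is a dateTime.
    return stamps[0] + total if stamps else total
```

Summing only durations returns a timedelta, but adding a single datetime to the same multiset turns the result into a datetime.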

Good point, though that's more a good example of why overloading is a bad idea, IMHO. :)

> See also string + string.
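To illustrate why an extended "+" would need to be commutative and associative, this small Python sketch folds a multiset the way the quoted Sum definition does, plus(S1, plus(S2, ... plus(Sn, zero))), over every ordering. fold_sum, order_independent and add are hypothetical helper names. Numeric addition gives one result; string concatenation, one plausible meaning of string + string, gives several:

```python
from functools import reduce
from itertools import permutations


def fold_sum(values, plus, zero):
    """Right fold: plus(v1, plus(v2, ... plus(vn, zero)))."""
    return reduce(lambda acc, v: plus(v, acc), reversed(list(values)), zero)


def order_independent(values, plus, zero):
    """True if every ordering of the multiset folds to the same result."""
    results = {fold_sum(p, plus, zero) for p in permutations(values)}
    return len(results) == 1


def add(a, b):
    return a + b
```

Here order_independent([1, 2, 3], add, 0) is True, but order_independent(["a", "b"], add, "") is False ("ab" versus "ba"), so a SUM built on such a "+" would depend on the arbitrary order of the multiset.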

Quite.

OK, I feel I have enough information to start drafting a response to Jeen now; I'll circulate it later today.

- Steve

> An error in an aggregate function in SELECT does not cause a row to be skipped; the result is left unbound, as with any error in a SELECT expression. An error in FILTER eliminates the row.
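A minimal Python sketch of that difference. It assumes rows are dicts and None marks an unbound variable; SparqlError is a hypothetical stand-in for an evaluation error, project, filter_rows and checked are illustrative names, and bool() only approximates SPARQL's effective boolean value:

```python
class SparqlError(Exception):
    """Hypothetical stand-in for a SPARQL expression evaluation error."""


def project(rows, var, expr):
    """SELECT (expr AS ?var): on error ?var is left unbound (None); row kept."""
    out = []
    for row in rows:
        try:
            value = expr(row)
        except SparqlError:
            value = None
        out.append({**row, var: value})
    return out


def filter_rows(rows, expr):
    """FILTER(expr): an error is treated like false, so the row is dropped."""
    out = []
    for row in rows:
        try:
            keep = bool(expr(row))
        except SparqlError:
            keep = False
        if keep:
            out.append(row)
    return out


def checked(row):
    """Example expression: errors unless ?val is an int."""
    if not isinstance(row["val"], int):
        raise SparqlError(f"not an integer: {row['val']!r}")
    return row["val"]
```

With rows [{"val": 1}, {"val": "x"}], project keeps both rows and leaves "out" unbound for the second, while filter_rows returns only the first.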
> 
> On 26/06/11 09:00, Steve Harris wrote:
>> I think it was mostly just that, at the time I wrote the def'n, there was no (obvious, explicit) function for +. Changing it makes sense to me.
> 
> True; there isn't, because dispatch by the operator table isn't a function. It could be, but it currently isn't.
> 
> It does make the conditions for extending "+" a bit tricky though.
> What conditions are needed?
> 
>    Andy
> 
>> 
>> - Steve
>> 
>> On 2011-06-25, at 16:18, Lee Feigenbaum wrote:
>> 
>>> On the surface, Jeen's reasoning makes sense to me.
>>> 
>>> Steve, did we/you consider defining SUM in terms of "+" instead of in terms of op:numeric-add?
>>> 
>>> Lee
>>> 
>>> -------- Original Message --------
>>> Subject: SUM aggregate operator and non-numeric literals
>>> Resent-Date: Thu, 23 Jun 2011 01:05:51 +0000
>>> Resent-From: public-rdf-dawg-comments@w3.org
>>> Date: Thu, 23 Jun 2011 13:05:10 +1200
>>> From: Jeen Broekstra <jeen.broekstra@gmail.com>
>>> To: public-rdf-dawg-comments@w3.org
>>> 
>>> 
>>> Hi DAWG,
>>> 
>>> The current definition of SUM (section 18.4) is as follows:
>>> 
>>> ==begin quote==
>>> Definition: Sum
>>> numeric Sum(multiset M)
>>> 
>>> The Sum set function is used by the SUM aggregate in the syntax.
>>> 
>>> Sum(M) = Sum(ToList(Flatten(M))).
>>> 
>>> Sum(S) = op:numeric-add(S1, Sum(S2..n)) when card[S] > 1
>>> Sum(S) = op:numeric-add(S1, 0) when card[S] = 1
>>> Sum(S) = 0 when card[S] = 0
>>> 
>>> In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2,
>>> op:numeric-add(3, 0))).
>>> ==end quote==
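To make the recursion concrete, here is a minimal Python sketch of the quoted definition. It is an illustration, not normative text: numeric literals are assumed to be Python int/float, a simple literal such as "1" is a Python str, and SparqlTypeError, numeric_add and spec_sum are hypothetical names.

```python
class SparqlTypeError(Exception):
    """Hypothetical stand-in for a SPARQL type error."""


def is_numeric(term):
    # Assumption: numeric literals are int or float; bool is excluded
    # because Python treats it as a subclass of int.
    return isinstance(term, (int, float)) and not isinstance(term, bool)


def numeric_add(a, b):
    """Stand-in for op:numeric-add: both operands must be numeric."""
    if not (is_numeric(a) and is_numeric(b)):
        raise SparqlTypeError(f"op:numeric-add({a!r}, {b!r})")
    return a + b


def spec_sum(values):
    """Sum(S) following the quoted recursion case by case."""
    s = list(values)  # ToList(Flatten(M)); the multiset has no inherent order
    if len(s) == 0:
        return 0
    if len(s) == 1:
        return numeric_add(s[0], 0)
    return numeric_add(s[0], spec_sum(s[1:]))
```

With this model, spec_sum([1, 2, 3]) returns 6, while spec_sum(["1", 2]) raises SparqlTypeError because "1" is a simple literal rather than a number.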
>>> 
>>> Given that the definition of SUM is directly in terms of the
>>> op:numeric-add XPath function, it follows that it can only be applied to
>>> numeric literals. Therefore, any SUM that aggregates over a set of
>>> values that contains a non-numeric type will result in a type error. Not
>>> even an extension of the SPARQL operator table in section 17.3 will
>>> help, as SUM is not defined in terms of those operators.
>>> 
>>> In other words, if we have the following data:
>>> 
>>> :a rdf:value "1" .
>>> :a rdf:value "2"^^xsd:integer .
>>> :b rdf:value "3"^^xsd:integer .
>>> 
>>> And the following query:
>>> 
>>> SELECT (SUM(?val) as ?value)
>>> WHERE {
>>>   ?a rdf:value ?val .
>>> } GROUP BY ?a
>>> 
>>> The result will always be a type error.
>>> 
>>> I would argue that having the same extensibility mechanisms available
>>> for SUM as we have for, for example, the + operator would be preferable.
>>> That way, implementations wanting to offer a more forgiving version of
>>> the SUM operator (one which silently ignores the non-numerics, for
>>> example), could do so while staying spec-compliant.
>>> 
>>> 
>>> Regards,
>>> 
>>> Jeen
>>> 
>>> 
>>> 
>>> 
>> 
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Monday, 27 June 2011 10:02:37 GMT
