- From: Steve Harris <steve.harris@garlik.com>
- Date: Mon, 27 Jun 2011 13:20:20 +0100
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: public-rdf-dawg@w3.org
On 2011-06-27, at 12:38, Andy Seaborne wrote: > Looks fine - good point that SUM can be extended rather than requiring it's formulation to be changed. > > > One observation (does not the reply being sent): > > [[ > > The result will be always a type error. > > Correct. > ]] > > I'm not sure what Jeen means here - the query does not produce a type error. One of the two groups works, the other leads to an error in SUM but which caught in the SELECT expression and leads to an unbound variable. I thought he was referring to the result of the Sum() set function, but I can see it's open to interpretation. - Steve > On 27/06/11 11:36, Steve Harris wrote: >> I've added http://www.w3.org/2009/sparql/wiki/CommentResponse:JB-6 >> >> - Steve >> >> On 2011-06-27, at 11:02, Steve Harris wrote: >> >>> On 2011-06-26, at 16:43, Andy Seaborne wrote: >>> >>>> (can we have one x.y.z. section per aggregate like the functions so each has it's own doc link or links to the defintion of each? The defs don't have anchors). >>> >>> Yes, good idea, I've added it to my todo list. >>> >>>> There are 3 points where customization can be done: >>>> 1/ The expression to be SUM'ed >>>> 2/ The aggregate called >>>> 3/ Whether SUM uses + or op:numeric-add >>>> >>>> 1 and 2 can already be done. >>>> >>>> >>>> For 1/: >>>> >>>> The effect Jeen uses as an example can be achieved by ensuring the multiset passed to SUM are all numbers: >>>> >>>> SUM(xsd:integer(?val)) >>> >>> Right. >>> >>>> assuming that >>>> >>>> :a rdf:value "rst" is still to be an error for the group it's in. >>>> >>>> SUM(COALESCE(xsd:integer(?val),0)) >>>> >>>> would give the effect of skip any error. >>>> >>>> Now, if the application does want an error, it is able to choose the expression. >>>> >>>> Another way, which copes with a wider range of numbers but without over-promotion of types: >>>> >>>> SUM(IF(isNumeric(?val), ?val , COALESCE(xsd:double(?val),0)) >>>> >>>> For 2/: >>>> >>>> We do allow extension of aggregates via custom aggregates called by URI. A different aggregation function operation would call a different operation. >>>> >>>> For 3/: >>>> >>>> 3/ A change to SUM so it uses "+" >>>> >>>> Doesn't that constrain how "+" can be extended? I presume it must be commutative and associative so order of the SUM aggregation does not matter. >>>> >>>> There's nothing to stop "+" being extended so that xsd:dataTime + xsd:duration -> xsd:dateTime >>>> >>>> http://www.w3.org/TR/xmlschema-2/#adding-durations-to-dateTimes >>>> >>>> SUM would need to say >>>> sum(durations and at most DT) -> DT >>>> The result type of SUM is changed. >>> >>> Good point, though that's more a good example of why overloading is a bad idea, IMHO. :) >>> >>>> See also string + string. >>> >>> Quite. >>> >>> OK, I feel I have enough information to start drafting a response to Jeen now, I'll circulate it later today. >>> >>> - Steve >>> >>>> An error in an aggregation function in SELECT does not cause a row to be skipped - it becomes an unbound because of the SELECT expression. An error in FILTER eliminates the row. >>>> >>>> On 26/06/11 09:00, Steve Harris wrote: >>>>> I think it was mostly just that at the time I wrote the def'n there was no (obvious, explicit) function for +, makes sense to change it to me. >>>> >>>> True - there isn't because dispatch by operator table isn't a function. It could be - but it currently isn't. >>>> >>>> It does make the conditions for extending "+" a bit tricky though. >>>> What conditions are needed? >>>> >>>> Andy >>>> >>>>> >>>>> - Steve >>>>> >>>>> On 2011-06-25, at 16:18, Lee Feigenbaum wrote: >>>>> >>>>>> On the surface, Jeen's reasoning makes sense to me. >>>>>> >>>>>> Steve, did we/you consider defining SUM instead of "+" instead of in terms of op:numeric-add? >>>>>> >>>>>> Lee >>>>>> >>>>>> -------- Original Message -------- >>>>>> Subject: SUM aggregate operator and non-numeric literals >>>>>> Resent-Date: Thu, 23 Jun 2011 01:05:51 +0000 >>>>>> Resent-From: public-rdf-dawg-comments@w3.org >>>>>> Date: Thu, 23 Jun 2011 13:05:10 +1200 >>>>>> From: Jeen Broekstra<jeen.broekstra@gmail.com> >>>>>> To: public-rdf-dawg-comments@w3.org >>>>>> >>>>>> >>>>>> Hi DAWG, >>>>>> >>>>>> The current definition of SUM (section 18.4) is as follows : >>>>>> >>>>>> ==begin quote== >>>>>> Definition: Sum >>>>>> numeric Sum(multiset M) >>>>>> >>>>>> The Sum set function is used by the SUM aggregate in the syntax. >>>>>> >>>>>> Sum(M) = Sum(ToList(Flatten(M))). >>>>>> >>>>>> Sum(S) = op:numeric-add(S1, Sum(S2..n)) when card[S]> 1 >>>>>> Sum(S) = op:numeric-add(S1, 0) when card[S] = 1 >>>>>> Sum(S) = 0 when card[S] = 0 >>>>>> >>>>>> In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2, >>>>>> op:numeric-add(3, 0))). >>>>>> ==end quote== >>>>>> >>>>>> Given that the definition of SUM is directly in terms of the >>>>>> op:numeric-add XPath function, it follows that it can only be applied on >>>>>> numeric literals. Therefore, any SUM that aggregates over a set of >>>>>> values that contains a non-numeric type will result in a type error. Not >>>>>> even an extension of the SPARQL operator table in section 17.3 will >>>>>> help, as SUM is not defined in terms of those operators. >>>>>> >>>>>> In other words, if we have the following data: >>>>>> >>>>>> :a rdf:value "1" . >>>>>> :a rdf:value "2"^^xsd:integer . >>>>>> :b rdf:value "3"^^xsd:integer . >>>>>> >>>>>> And the following query: >>>>>> >>>>>> SELECT (SUM(?val) as ?value) >>>>>> WHERE { >>>>>> ?a rdf:value ?val . >>>>>> } GROUP BY ?a >>>>>> >>>>>> The result will be always a type error. >>>>>> >>>>>> I would argue that having the same extensibility mechanisms available >>>>>> for SUM as we have for, for example, the + operator would be preferable. >>>>>> That way, implementations wanting to offer a more forgiving version of >>>>>> the SUM operator (one which silently ignores the non-numerics, for >>>>>> example), could do so while staying spec-compliant. >>>>>> >>>>>> >>>>>> Regards, >>>>>> >>>>>> Jeen >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >>> -- >>> Steve Harris, CTO, Garlik Limited >>> 1-3 Halford Road, Richmond, TW10 6AW, UK >>> +44 20 8439 8203 http://www.garlik.com/ >>> Registered in England and Wales 535 7233 VAT # 849 0517 11 >>> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD >>> >>> >> > -- Steve Harris, CTO, Garlik Limited 1-3 Halford Road, Richmond, TW10 6AW, UK +44 20 8439 8203 http://www.garlik.com/ Registered in England and Wales 535 7233 VAT # 849 0517 11 Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Monday, 27 June 2011 12:21:02 UTC