Semantics of aggregates from Seaborne, Andy on 2009-08-31 (public-rdf-dawg@w3.org from July to September 2009)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 31 Aug 2009 22:25:52 +0000
To: Steve Harris <steve.harris@garlik.com>, "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
Message-ID: <B6CF1054FDC8B845BF93A6645D19BEA3693CEC252D@GVW1118EXC.americas.hpqcorp.net>

> -----Original Message-----
> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
> On Behalf Of Steve Harris
> Sent: 31 August 2009 22:34
> To: public-rdf-dawg@w3.org Group
> Subject: Re: Syntax for custom aggregates
> 
> On 31 Aug 2009, at 18:58, Seaborne, Andy wrote:
> 
> > In the grammar [1], I didn’t put in syntax for custom aggregates.
> > I'm assuming that the ability to be able to specify a URI for an
> > aggregate function is a useful extension point.
> >
> > An aggregate in SPARQL is a function that takes a set of query
> > solutions and produces one or more query solutions, which
> > include the group by variables and any aggregate variables/values.
> > It's "or more" for the case of MIN() returning an answer for the MIN
> > number, the MIN string, MIN dateTime - it would be one row for each
> > possibility for each group.
> 
> The "or more" thing concerns me. I remember the group discussing this,
> but I don't believe that we came to a consensus.

Sorry - I worded it too strongly.  There are alternative possibilities to be considered.
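
To make the "or more" reading concrete, here is a rough sketch in Python (not SPARQL, and not a proposal for the spec) of a MIN that yields one row per group per kind of value. The names type_class and min_per_type, the bucketing into "numeric" / "string" / "dateTime", and the choice to skip values of unknown type are all assumptions made for illustration only; the group has not agreed on any of this.

```python
from collections import defaultdict
from datetime import datetime
from numbers import Number


def type_class(value):
    """Hypothetical bucketing of values into mutually comparable classes."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, Number):
        return "numeric"
    if isinstance(value, datetime):
        return "dateTime"
    if isinstance(value, str):
        return "string"
    return "unknown"


def min_per_type(solutions, group_var, agg_var):
    """Return one row per (group, type class), holding the MIN for that class."""
    buckets = defaultdict(list)
    for row in solutions:
        if agg_var not in row:
            continue  # variable unbound in this solution
        value = row[agg_var]
        buckets[(row.get(group_var), type_class(value))].append(value)

    rows = []
    for (group, cls), values in buckets.items():
        if cls == "unknown":
            continue  # assumption: values of unknown type contribute no row
        rows.append({group_var: group, "type": cls, "min": min(values)})
    return rows


solutions = [
    {"g": "a", "x": 3},
    {"g": "a", "x": "pear"},
    {"g": "a", "x": 1.5},
    {"g": "a", "x": "apple"},
    {"g": "b", "x": datetime(2009, 8, 31)},
    {"g": "b", "x": datetime(2009, 7, 1)},
]

for row in min_per_type(solutions, "g", "x"):
    print(row)
```

Group "a" gives two rows here (one numeric, one string) and group "b" gives one. With several aggregates in the same SELECT, as in Steve's example below, it is not obvious how the per-type rows of each aggregate should be combined, which is part of why the alternatives need to be considered.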

> 
> What would be the expected behaviour given
>    SELECT min(?x) min(?y) min(?z) { ... }
> where x, y, and z each take some subset of numbers, dates etc? Also
> unknown datatypes pose a problem.

I don't know (for several of the designs).

We have to have a design that copes with the situation of mixed sets of numbers, although I hope it's more of a corner case to be dealt with than a driver for the overall design.  (FWIW, I value consistency of results across implementations, and also not having errors during query evaluation abort the overall query.)

Unknown datatypes are certainly a problem as well.

 Andy

Received on Monday, 31 August 2009 22:26:56 UTC