Re: Prioritised list of open issues (query, my bits) from Steve Harris on 2010-02-09 (public-rdf-dawg@w3.org from January to March 2010)

From: Steve Harris <steve.harris@garlik.com>
Date: Tue, 9 Feb 2010 17:23:56 +0000
To: Andy Seaborne <andy.seaborne@talis.com>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
Message-Id: <7507D9C1-DBA7-4180-975D-CD9EBEA4D876@garlik.com>

On 9 Feb 2010, at 13:53, Andy Seaborne wrote:
> On 09/02/2010 1:34 PM, Steve Harris wrote:
...
>> if M = your solution multiset above.
>> M' = M(fn:floor((?x+1)/2))) { 1, 1, 1 }
>> M'' = DISTINCT M' { 1 }
>> result = Count(M'') 1
>>
>> This is how aggregates are defined in SQL, and I can't think of any
>> pressing reason to depart from that.
>>
>>> which is applying the DISTINCT after the implicit projection (case  
>>> 1)
>>> and after expression evaluation (case 2).
>>>
>>> I thought Steve and I were mostly agreeing, except over whether one
>>> can name the DISTINCT and non-DISTINCT versions with URIs.
>>
>> That's the essence I think. But maybe you're proposing something  
>> different?
>
> An aggregate function takes the partition as a multiset and also  
> it's and its expressions.  It decides on DISTINCT or not - this is  
> moving the ExprMultiset machinery inside the aggregate.

I don't find that very tempting, it moves a lot of unnecessary  
complexity into the aggregate, rather than it being a simple set  
function.

> It has a very similar effect but allows the aggregate to decide how  
> to handle unbounds, expression errors etc (in SQL, it is the  
> aggregate function that decides although all the standard ones do  
> the same).

Not in ANSI SQL.

COUNT(*) is a special case.

Otherwise, you eliminate NULL values that result in the application of  
the expression to the table T, giving TX.
If DISTINCT is specified, then let TXA be the result of eliminating  
redundant duplicate values from TX. Otherwise, let TXA be TX.
Then you apply the "set function" (aggregate) to TXA.

There's a discussion of this at http://www.dbazine.com/ofinterest/oi-articles/celko14

> Different URIs name different aggregators so can have different  
> DISTINCT effects.
>
> I found this the most concise description:
> http://www.postgresql.org/docs/8.4/static/sql-expressions.html#SYNTAX-AGGREGATES

Note that Postgres does not require you to define separate aggregates  
for the DISTINCT and ALL behaviours either, even for user-defined  
aggregates.

The SQL spec in this area is unusually readable, and quite clear. The  
only obvious way that Postgres differs is that it allows user-defined  
aggregates to regard NULL values, but that syntax page doesn't really  
say very much.

- Steve

-- 
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10  
9AD

Received on Tuesday, 9 February 2010 17:24:29 UTC