Re: another aggregates test case... from Steve Harris on 2010-06-08 (public-rdf-dawg@w3.org from April to June 2010)

From: Steve Harris <steve.harris@garlik.com>
Date: Tue, 8 Jun 2010 17:05:02 +0100
To: Andy Seaborne <andy.seaborne@talis.com>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, Axel Polleres <axel.polleres@deri.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <65300A7F-C0E4-43AE-9D34-CBE6CD929BE6@garlik.com>

On 2010-06-08, at 16:47, Andy Seaborne wrote:
> 
> On 08/06/2010 3:12 PM, Lee Feigenbaum wrote:
>> On 6/8/2010 10:04 AM, Andy Seaborne wrote:
>>> I don't see why it needs to be an error - with no aggregation GROUP BY
>>> can be considered to be a partial sort. Cardinality same as without
>>> GROUP BY. This also happens to be a requirement in some apps - results
>>> clustered by key but the same number of rows as without grouping.
>>> Sorting can make it so, but sorting is potentially more expensive.
>> 
>> This sounds like a pretty different model of aggregation than we have
>> now. (Actually sounds similar to the model that was proposed on the
>> comments list a few months ago.) If we went this way, why not do this
>> all the time, and just repeat the values for the aggregate calculations?
>> 
>> I prefer to keep the existing aggregate model.
>> 
>> Lee
> 
> I'm not happy with the error case when GROUP BY is used and no aggregate is explicitly mentioned.

Well, the rule is roughly that you can only project an expression if it is exactly a variable and matches the GROUP BY expression. Otherwise it has to be an aggregate expression.
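The projection rule can be sketched as a small Python check. This is a toy model, not code from any SPARQL engine: `projection_ok` and `is_aggregate` are hypothetical helpers, SELECT items and GROUP BY expressions are plain strings, and aggregates are recognised by name only.

```python
# Toy model (hypothetical helpers, not from any SPARQL engine).
# A SELECT item is either a bare variable like "?p" or an aggregate call
# like "COUNT(?o)"; GROUP BY conditions are strings like "?p" or "STR(?p)".
AGGREGATES = ("COUNT(", "SUM(", "MIN(", "MAX(", "AVG(", "SAMPLE(", "GROUP_CONCAT(")


def is_aggregate(item: str) -> bool:
    """Crude string test: does the item start with a known aggregate name?"""
    return item.upper().startswith(AGGREGATES)


def projection_ok(select_items: list[str], group_by: list[str]) -> bool:
    """Each projected item must be an aggregate, or exactly a variable
    that is itself one of the GROUP BY conditions."""
    for item in select_items:
        if is_aggregate(item):
            continue
        if item.startswith("?") and item in group_by:
            continue
        return False
    return True


print(projection_ok(["?p", "COUNT(?o)"], ["?p"]))  # True
print(projection_ok(["?s"], ["?p"]))               # False: ?s is not a key
print(projection_ok(["?p"], ["STR(?p)"]))          # False: key is an expression
```

Under this reading, grouping on `STR(?p)` does not make `?p` itself projectable, because only an exact match with a GROUP BY condition counts.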

> To keep as close as possible to the model as it is currently defined, I would be happy with the "null aggregation" case (reduced to a table of keys, no aggregate column added, keys are projectable).

Currently

SELECT (COUNT(?o) AS ?c)
WHERE {
  ?s ?p ?o .
}

is effectively the same as

SELECT (COUNT(?o) AS ?c)
WHERE {
  ?s ?p ?o .
}
GROUP BY (1)

i.e. it's one group containing all solutions.
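A rough Python model of that equivalence: with no GROUP BY, every solution gets the same constant key, so COUNT runs over a single group. The `group` and `count_var` helpers and the sample data are hypothetical, and the sketch ignores corner cases such as empty input.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Solution = Dict[str, str]  # variable name (without "?") -> bound value


def group(solutions: List[Solution],
          key: Callable[[Solution], Hashable]) -> Dict[Hashable, List[Solution]]:
    """Partition solutions by the value of the key function (hypothetical helper)."""
    groups: Dict[Hashable, List[Solution]] = defaultdict(list)
    for sol in solutions:
        groups[key(sol)].append(sol)
    return dict(groups)


def count_var(rows: List[Solution], var: str) -> int:
    """COUNT(?var): number of rows in the group where var is bound."""
    return sum(1 for row in rows if var in row)


solutions = [
    {"s": "ex:a", "p": "ex:name", "o": "Alice"},
    {"s": "ex:a", "p": "ex:age", "o": "30"},
    {"s": "ex:b", "p": "ex:name", "o": "Bob"},
]

# No GROUP BY: behave like GROUP BY (1), i.e. one constant key for every row.
implicit = group(solutions, key=lambda sol: 1)
result = [{"c": count_var(rows, "o")} for rows in implicit.values()]
print(result)  # [{'c': 3}]

# For contrast, GROUP BY ?s yields one group per subject.
per_subject = [{"s": k, "c": count_var(rows, "o")}
               for k, rows in group(solutions, key=lambda sol: sol["s"]).items()]
print(per_subject)  # [{'s': 'ex:a', 'c': 2}, {'s': 'ex:b', 'c': 1}]
```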

However, Axel's example used an explicit GROUP BY expression, so the projection rules apply.

> Seems useful in developing queries and makes aggregation reasonably orthogonal to grouping.
> 
> SELECT * means all the keys (i.e. variables in scope after grouping)

That seems fairly rational and sensible, but it is a significant departure from the meaning of * in non-aggregated queries.
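The two readings of GROUP BY without aggregates raised in this thread can be contrasted in a short Python sketch: the "null aggregation" reading (a table of distinct keys, so SELECT * would return only the key variables) and the earlier "partial sort" reading (rows with equal keys clustered together, same number of rows as without GROUP BY). The function names and data are hypothetical; this models the proposals as described, not any agreed semantics.

```python
from typing import Dict, List, Optional, Tuple

Solution = Dict[str, str]  # variable name (without "?") -> bound value


def null_aggregation(solutions: List[Solution], keys: List[str]) -> List[Solution]:
    """Hypothetical 'null aggregation': one row per distinct key tuple,
    keeping only the key variables (so SELECT * sees just the keys)."""
    seen = set()
    rows: List[Solution] = []
    for sol in solutions:
        key: Tuple[Optional[str], ...] = tuple(sol.get(k) for k in keys)
        if key not in seen:
            seen.add(key)
            rows.append({k: v for k, v in zip(keys, key) if v is not None})
    return rows


def partial_sort(solutions: List[Solution], keys: List[str]) -> List[Solution]:
    """Hypothetical 'partial sort': rows with equal keys become adjacent
    (clusters in order of first appearance), but no row is dropped."""
    clusters: Dict[Tuple[Optional[str], ...], List[Solution]] = {}
    for sol in solutions:
        clusters.setdefault(tuple(sol.get(k) for k in keys), []).append(sol)
    return [row for cluster in clusters.values() for row in cluster]


solutions = [
    {"s": "ex:a", "p": "ex:name"},
    {"s": "ex:b", "p": "ex:age"},
    {"s": "ex:c", "p": "ex:name"},
]

print(null_aggregation(solutions, ["p"]))  # [{'p': 'ex:name'}, {'p': 'ex:age'}]
print([r["s"] for r in partial_sort(solutions, ["p"])])  # ['ex:a', 'ex:c', 'ex:b']
```

The first reading changes cardinality and hides non-key variables; the second keeps every row and every variable, which is why it is closer to the non-aggregated meaning of *.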

- Steve

-- 
Steve Harris, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Tuesday, 8 June 2010 16:05:31 UTC