Re: another aggregates test case... from Andy Seaborne on 2010-06-08 (public-rdf-dawg@w3.org from April to June 2010)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Tue, 08 Jun 2010 17:59:35 +0100
To: Steve Harris <steve.harris@garlik.com>
CC: Lee Feigenbaum <lee@thefigtrees.net>, Axel Polleres <axel.polleres@deri.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4C0E76F7.7000801@talis.com>

On 08/06/2010 5:05 PM, Steve Harris wrote:
> On 2010-06-08, at 16:47, Andy Seaborne wrote:
>>
>> On 08/06/2010 3:12 PM, Lee Feigenbaum wrote:
>>> On 6/8/2010 10:04 AM, Andy Seaborne wrote:
>>>> I don't see why it needs to be an error - with no aggregation GROUP BY
>>>> can be considered to be a partial sort. Cardinality same as without
>>>> GROUP BY. This also happens to be a requirement in some apps - results
>>>> clustered by key but the same number of rows as without grouping.
>>>> Sorting can make it so, but sorting is potentially more expensive.
>>>
>>> This sounds like a pretty different model of aggregation than we have
>>> now. (Actually sounds similar to the model that was proposed on the
>>> comments list a few months ago.) If we went this way, why not do this
>>> all the time, and just repeat the values for the aggregate calculations?
>>>
>>> I prefer to keep the existing aggregate model.
>>>
>>> Lee
>>
>> I'm not happy with the error case when GROUP BY is used and no aggregate is explicitly mentioned.
>
> Well, the rule is something like you can only project expressions if they're exactly a variable, and match the GROUP BY expression. Otherwise it has to be an aggregate expression.

In Lee's example:

SELECT ?v1 ?v2 ?v3
{ ... }
GROUP BY ?v1 ?v2 ?v3

projects exactly variables, and they match the GROUP BY expressions, don't they?

[[ rq25.xml#aggregateExample
In aggregate queries and sub-queries only expressions which have been
used as GROUP BY expressions, or aggregated expressions (i.e.
expressions where all variables appear inside an aggregate) can be
projected. In order to project arbitrary expressions the SAMPLE
aggregate may be used.
]]

The example was

GROUP BY ?v1 ?v2 ?v3

which can expose ?v1 ?v2 ?v3, can't it?

The text seems to allow even this:

SELECT (?x+?y AS ?Z)
{ ... }
GROUP BY (?x+?y)

because it only mentions "expressions", which seems quite generous.

What about "SELECT (?y+?x AS ?Z)"?

I'd like to know if I'm reading the text correctly, but I'm happy with 
the current text on projecting expressions from GROUP BY - I can see why 
restricting to variables may be preferred.


What's wrong with these queries:

# Group by ?s and project ?s with count.
SELECT ?s (Count(*) AS ?C)
{
    ?s ?p ?p
} GROUP BY ?s


# Group by ?s and ?p but only project ?s and the group(key ?s ?p) count
SELECT ?s (Count(*) AS ?C)
{
    ?s ?p ?p
} GROUP BY ?s ?p


# Group by ?s and ?p and project ?s and ?p for each group,
# as well as the count
SELECT ?s ?p (Count(*) AS ?C)
{
    ?s ?p ?p
} GROUP BY ?s ?p

# Group by ?s and ?p and project ?s and ?p for each group,
# as well as the count
# Not of much use, but it seems legal by the text and is well-definable.
SELECT ?s ?p (Count(*) AS ?C)
{
    ?s ?p ?p
} GROUP BY ?s ?p
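The first three queries can be worked through by hand with a small Python sketch. The triples, the row encoding and the helpers group_count and project are hypothetical illustrations, not taken from any engine; grouping is modelled as one output row per distinct key, with COUNT(*) as the group size.

```python
from collections import Counter

# Illustrative (hypothetical) graph. The pattern { ?s ?p ?p } only matches
# triples whose object equals their predicate.
triples = [
    ("a", "p1", "p1"),
    ("a", "p2", "p2"),
    ("a", "p3", "x"),  # object differs from predicate: no match
    ("b", "p1", "p1"),
]
rows = [{"s": s, "p": p} for (s, p, o) in triples if p == o]

def group_count(rows, keys):
    """One output row per distinct key, with (COUNT(*) AS ?C) as the group size."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [dict(zip(keys, key), C=n) for key, n in counts.items()]

def project(rows, names):
    # Keep only the named columns, in the given order.
    return [{n: r[n] for n in names} for r in rows]

# SELECT ?s (COUNT(*) AS ?C) ... GROUP BY ?s
q1 = project(group_count(rows, ["s"]), ["s", "C"])
# SELECT ?s (COUNT(*) AS ?C) ... GROUP BY ?s ?p : ?s repeats, once per (?s, ?p) group
q2 = project(group_count(rows, ["s", "p"]), ["s", "C"])
# SELECT ?s ?p (COUNT(*) AS ?C) ... GROUP BY ?s ?p
q3 = project(group_count(rows, ["s", "p"]), ["s", "p", "C"])

print(q1)  # [{'s': 'a', 'C': 2}, {'s': 'b', 'C': 1}]
print(q2)  # [{'s': 'a', 'C': 1}, {'s': 'a', 'C': 1}, {'s': 'b', 'C': 1}]
```

Since a graph holds each triple only once, every (?s, ?p) group for this particular pattern has a count of 1.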


which leads me to a fairly natural interpretation of

SELECT ?s ?p
{
    ?s ?p ?p
} GROUP BY ?s ?p

as "null aggregation"


>> Seems useful in developing queries and makes aggregation reasonably orthogonal to grouping.
>>
>> SELECT * means all the keys (i.e. variables in scope after grouping)
>
> That seems fairly rational/sensible, but a significant departure from the meaning of * in non-aggregated queries.

For me, the natural meaning of "*" in "SELECT *" is all visible 
variables. That covers SPARQL 1.0, subqueries and grouping. It is also 
the same algorithm as the scoping syntax check for GROUP BY and SELECT 
above, but maybe I don't understand that properly.
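The "one algorithm" point can be sketched as a single Python function used both to expand "*" and to check plain projected variables. This is a hypothetical model that assumes GROUP BY lists only plain variables; visible_vars, expand_star and check_projection are illustrative names, not part of any draft or engine.

```python
# Hypothetical model: GROUP BY lists plain variables only.

def visible_vars(pattern_vars, group_by=None):
    """Variables visible to SELECT: every pattern variable without grouping,
    only the grouping keys after GROUP BY."""
    if group_by is None:
        return list(pattern_vars)
    return list(group_by)

def expand_star(pattern_vars, group_by=None):
    # SELECT * = every visible variable.
    return visible_vars(pattern_vars, group_by)

def check_projection(selected, pattern_vars, group_by=None):
    # The same scope rule: each plain projected variable must be visible.
    visible = set(visible_vars(pattern_vars, group_by))
    return all(v in visible for v in selected)


print(expand_star(["s", "p"]))                                   # ['s', 'p']
print(expand_star(["s", "p"], group_by=["s"]))                   # ['s']
print(check_projection(["s", "p"], ["s", "p"], group_by=["s"]))  # False
```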

	Andy

Received on Tuesday, 8 June 2010 18:05:34 UTC