- From: Steve Harris <steve.harris@garlik.com>
- Date: Wed, 9 Jun 2010 10:08:26 +0100
- To: Andy Seaborne <andy.seaborne@talis.com>
- Cc: Lee Feigenbaum <lee@thefigtrees.net>, Axel Polleres <axel.polleres@deri.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
On 2010-06-08, at 17:59, Andy Seaborne wrote:
> On 08/06/2010 5:05 PM, Steve Harris wrote:
>> On 2010-06-08, at 16:47, Andy Seaborne wrote:
>>>
>>> On 08/06/2010 3:12 PM, Lee Feigenbaum wrote:
>>>> On 6/8/2010 10:04 AM, Andy Seaborne wrote:
>>>>> I don't see why it needs to be an error - with no aggregation GROUP BY
>>>>> can be considered to be a partial sort. Cardinality same as without
>>>>> GROUP BY. This also happens to be a requirement in some apps - results
>>>>> clustered by key but the same number of rows as without grouping.
>>>>> Sorting can make it so, but sorting is potentially more expensive.
>>>>
>>>> This sounds like a pretty different model of aggregation than we have
>>>> now. (Actually sounds similar to the model that was proposed on the
>>>> comments list a few months ago.) If we went this way, why not do this
>>>> all the time, and just repeat the values for the aggregate calculations?
>>>>
>>>> I prefer to keep the existing aggregate model.
>>>>
>>>> Lee
>>>
>>> I'm not happy with the error case when GROUP BY is used and no aggregate is explicitly mentioned.
>>
>> Well, the rule is something like you can only project expressions if they're exactly a variable, and match the GROUP BY expression. Otherwise it has to be an aggregate expression.
>
> In Lee's example:
>
> SELECT ?v1 ?v2 ?v3
> { ... }
> GROUP BY ?v1 ?v2 ?v3
>
> is exactly variables and match the GROUP BY expression isn't it?
Yes, true, it was a long day!
> [[ rq25.xml#aggregateExample
> In aggregate queries and sub-queries only expressions which have been
> used as GROUP BY expressions, or aggregated expressions (i.e.
> expressions where all variables appear inside an aggregate) can be
> projected. In order to project arbitrary expressions the SAMPLE
> aggregate may be used.
> ]]
>
> The example was
>
> GROUP BY ?v1 ?v2 ?v3
>
> which can expose ?v1 ?v2 ?v3 can't it?
Yes, or any expression using a subset of those.
> The text seems to allow even this:
>
> SELECT (?x+?y AS ?Z)
> { ... }
> GROUP BY (?x+?y)
>
> because it only mentions "expressions" which seems quite generous.
It was meant to convey expressions in the project expressions only, so that needs some rewording.
> What about "SELECT (?y+?x AS ?Z)"?
>
> I'd like to know if I'm reading the text correctly but I'm happy with the current text on projecting expressions from GROUP BY - I can see why restricting to variables may be preferred.
Otherwise we end up with an implicit SAMPLE() which the group was not keen on.
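
As a rough illustration of the projection rule discussed above (a sketch only, not text from the draft): a projected item is accepted when every variable it uses outside an aggregate is one of the GROUP BY variables. The function name and the input shape (each SELECT item paired with its non-aggregated variables) are assumptions made for this example, and GROUP BY keys that are expressions rather than plain variables are deliberately not modelled.

```python
def illegal_projections(select_items, group_by_vars):
    """Return the SELECT items that would need an implicit SAMPLE().

    select_items: list of (text, free_vars) pairs, where free_vars is the
        set of variables used outside any aggregate in that item
        (hypothetical input shape; no real SPARQL parsing is done here).
    group_by_vars: the variables listed in GROUP BY.
    """
    keys = set(group_by_vars)
    # An item is fine if all of its non-aggregated variables are grouping keys.
    return [text for text, free_vars in select_items
            if not set(free_vars) <= keys]


# SELECT ?s (Count(*) AS ?C) ... GROUP BY ?s ?p  -> nothing rejected
ok = illegal_projections(
    [("?s", {"?s"}), ("(Count(*) AS ?C)", set())], ["?s", "?p"])

# SELECT ?s ?o ... GROUP BY ?s  -> ?o is neither a key nor aggregated
bad = illegal_projections(
    [("?s", {"?s"}), ("?o", {"?o"})], ["?s"])
```

Under this reading ?o would have to be written as (SAMPLE(?o) AS ?o) to be projected.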
> What's wrong with these queries:
>
> # Group by ?s and project ?s with count.
> SELECT ?s (Count(*) AS ?C)
> {
> ?s ?p ?p
> } GROUP BY ?s
Nothing, ?C is produced with an aggregate, so it's legitimate.
> # Group by ?s and ?p but only project ?s and the group(key ?s ?p) count
> SELECT ?s (Count(*) AS ?C)
> {
> ?s ?p ?p
> } GROUP BY ?s ?p
Again, that's legit.
> # Group by ?s and ?p and project ?s and ?p for each group
> # as well as the count
> SELECT ?s ?p (Count(*) AS ?C)
> {
> ?s ?p ?p
> } GROUP BY ?s ?p
Ditto.
> # Group by ?s and ?p and project ?s and ?p for each group
> # as well as the count
> # Not so much use but seems legal by the text and is well-definable.
> SELECT ?s ?p (Count(*) AS ?C)
> {
> ?s ?p ?p
> } GROUP BY ?s ?p
There are uses for that, and it's legit.
> which leads me to a fairly natural interpretation of
>
> SELECT ?s ?p
> {
> ?s ?p ?p
> } GROUP BY ?s ?p
>
> as "null aggregation"
I don't understand the term "null aggregation".
>>> Seems useful in developing queries and makes aggregation reasonably orthogonal to grouping.
>>>
>>> SELECT * means all the keys (i.e. variables in scope after grouping)
>>
>> That seems fairly rational/sensible, but a significant departure from the meaning of * in non-aggregated queries.
>
> For me, the natural meaning of "*" in "SELECT *" is all visible variables. That covers SPARQL 1.0, subqueries and grouping. It is also the same algorithm for a syntax check of scoping GROUP BY and SELECT as above but maybe I don't understand that properly.
Sure, I think it's rational, it's just that * is defined differently in SPARQL 1.0: "SELECT * is an abbreviation that selects all of the variables in a query".
We can define it in a way where it has the same behaviour as it did in 1.0 though, I'm sure.
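
To make the "all visible variables" reading of * concrete, here is a minimal sketch, assuming that without grouping every variable in the query pattern is visible and that after grouping only the GROUP BY variables are. The function name and arguments are hypothetical, and expression keys in GROUP BY are ignored.

```python
def expand_star(pattern_vars, group_by_vars=None):
    """Expand SELECT * to a list of variable names (illustrative only).

    pattern_vars: variables appearing in the query pattern, in order.
    group_by_vars: variables in GROUP BY, or None if the query has no grouping.
    """
    visible = pattern_vars if group_by_vars is None else group_by_vars
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(visible))


# No GROUP BY: same outcome as the 1.0 wording (all variables in the query).
plain = expand_star(["?s", "?p", "?o"])

# GROUP BY ?s ?p: only the keys remain in scope.
grouped = expand_star(["?s", "?p", "?o"], ["?s", "?p"])
```

With no GROUP BY the two definitions coincide, which is why the 1.0 behaviour can be preserved.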
- Steve
--
Steve Harris, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Wednesday, 9 June 2010 09:08:59 UTC