- From: Steve Harris <steve.harris@garlik.com>
- Date: Wed, 9 Jun 2010 10:08:26 +0100
- To: Andy Seaborne <andy.seaborne@talis.com>
- Cc: Lee Feigenbaum <lee@thefigtrees.net>, Axel Polleres <axel.polleres@deri.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
On 2010-06-08, at 17:59, Andy Seaborne wrote:

> On 08/06/2010 5:05 PM, Steve Harris wrote:
>> On 2010-06-08, at 16:47, Andy Seaborne wrote:
>>>
>>> On 08/06/2010 3:12 PM, Lee Feigenbaum wrote:
>>>> On 6/8/2010 10:04 AM, Andy Seaborne wrote:
>>>>> I don't see why it needs to be an error - with no aggregation GROUP BY
>>>>> can be considered to be a partial sort. Cardinality same as without
>>>>> GROUP BY. This also happens to be a requirement in some apps - results
>>>>> clustered by key but the same number of rows as without grouping.
>>>>> Sorting can make it so, but sorting is potentially more expensive.
>>>>
>>>> This sounds like a pretty different model of aggregation than we have
>>>> now. (Actually sounds similar to the model that was proposed on the
>>>> comments list a few months ago.) If we went this way, why not do this
>>>> all the time, and just repeat the values for the aggregate calculations?
>>>>
>>>> I prefer to keep the existing aggregate model.
>>>>
>>>> Lee
>>>
>>> I'm not happy with the error case when GROUP BY is used and no aggregate is explicitly mentioned.
>>
>> Well, the rule is something like you can only project expressions if they're exactly a variable, and match the GROUP BY expression. Otherwise it has to be an aggregate expression.
>
> In Lee's example:
>
> SELECT ?v1 ?v2 ?v3
> { ... }
> GROUP BY ?v1 ?v2 ?v3
>
> is exactly variables and match the GROUP BY expression isn't it?

Yes, true, it was a long day!

> [[ rq25.xml#aggregateExample
> In aggregate queries and sub-queries only expressions which have been
> used as GROUP BY expressions, or aggregated expressions (i.e.
> expressions where all variables appear inside an aggregate) can be
> projected. In order to project arbitrary expressions the SAMPLE
> aggregate may be used.
> ]]
>
> The example was
>
> GROUP BY ?v1 ?v2 ?v3
>
> which can expose ?v1 ?v2 ?v3 can't it?

Yes, or any expression using a subset of those.

> The text seems to allow even this:
>
> SELECT (?x+?y AS ?Z)
> { ... }
> GROUP BY (?x+?y)
>
> because it only mentions "expressions" which seems quite generous.

It was meant to convey expressions in the project expressions only, so that needs some rewording.

> What about "SELECT (?y+?x AS ?Z)"?
>
> I'd like to know if I'm reading the text correctly but I'm happy with the current text on projecting expressions from GROUP BY - I can see why restricting to variables may be preferred.

Otherwise we end up with an implicit SAMPLE() which the group was not keen on.

> What's wrong with these queries:
>
> # Group by ?s and project ?s with count.
> SELECT ?s (Count(*) AS ?C)
> {
> ?s ?p ?p
> } GROUP BY ?s

Nothing, ?C is produced with an aggregate, so it's legitimate.

> # Group by ?s and ?p but only project ?s and the group(key ?s ?p) count
> SELECT ?s (Count(*) AS ?C)
> {
> ?s ?p ?p
> } GROUP BY ?s ?p

Again, that's legit.

> # Group by ?s and ?p and project ?s and ?p for each group
> # as well as the count
> SELECT ?s ?p (Count(*) AS ?C)
> {
> ?s ?p ?p
> } GROUP BY ?s ?p

Ditto.

> # Group by ?s and ?p and project ?s and ?p for each group
> # as well as the count
> # Not so much use but seems legal by the text and is well-definable.
> SELECT ?s ?p (Count(*) AS ?C)
> {
> ?s ?p ?p
> } GROUP BY ?s ?p

There are uses for that, and it's legit.

> which leads me to a fairly natural interpretation of
>
> SELECT ?s ?p
> {
> ?s ?p ?p
> } GROUP BY ?s ?p
>
> as "null aggregation"

I don't understand the term "null aggregation".

>>> Seems useful in developing queries and makes aggregation reasonably orthogonal to grouping.
>>>
>>> SELECT * means all the keys (i.e. variables in scope after grouping)
>>
>> That seems fairly rational/sensible, but a significant departure from the meaning of * in non-aggregated queries.
>
> For me, it's the natural meaning of "*" in "SELECT *" is all visible variables. That covers SPARQL 1.0, subqueries and grouping. It is also the same algorithm for a syntax check of scoping GROUP BY and SELECT as above but maybe I don't understand that properly.

Sure, I think it's rational, it's just that * is defined differently in SPARQL 1.0: "SELECT * is an abbreviation that selects all of the variables in a query". We can define it in a way where it has the same behaviour as it did in 1.0 though, I'm sure.

- Steve

-- 
Steve Harris, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Wednesday, 9 June 2010 09:08:59 UTC
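
Editorial illustration (not part of the original message): the projection rule discussed in this thread can be sketched as a small check. This is a minimal sketch under stated assumptions, not the working group's algorithm and not a SPARQL parser. The names `Projection` and `illegal_projections` are hypothetical, GROUP BY keys are modelled as plain variables only (so the open `GROUP BY (?x+?y)` question is deliberately left out), and each projected item is assumed to arrive already annotated with the variables it uses outside any aggregate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    """One item in a SELECT clause (hypothetical model, not a parser)."""
    text: str                           # source text, used only for reporting
    free_vars: frozenset = frozenset()  # variables used outside any aggregate


def illegal_projections(projections, group_keys):
    """Return the text of every projection that breaks the rule.

    Assumption: a projection is accepted when every variable it uses
    outside an aggregate is one of the GROUP BY variables. An aggregate
    such as Count(*) has no free variables, so it always passes.
    """
    keys = frozenset(group_keys)
    return [p.text for p in projections if not p.free_vars <= keys]


# The queries from the thread, reduced to their SELECT and GROUP BY parts.
count = Projection("(Count(*) AS ?C)")
s = Projection("?s", frozenset({"?s"}))
p = Projection("?p", frozenset({"?p"}))

print(illegal_projections([s, count], ["?s"]))           # GROUP BY ?s
print(illegal_projections([s, count], ["?s", "?p"]))     # GROUP BY ?s ?p
print(illegal_projections([s, p, count], ["?s", "?p"]))  # GROUP BY ?s ?p
print(illegal_projections([s, p], ["?s", "?p"]))         # no aggregate at all
print(illegal_projections([s, p], ["?s"]))               # ?p is not a key
```

Under these assumptions the first four calls report nothing, which matches the replies above ("legit"), and only the last call flags `?p`. Using a subset test rather than an exact-variable match follows the remark that a group may expose "any expression using a subset of those" keys.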