- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Mon, 25 May 2009 20:24:59 -0400
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
This email discharges my action http://www.w3.org/2009/sparql/track/actions/23 in reference to this issue: http://www.w3.org/2009/sparql/track/issues/11.

When we surveyed existing approaches to aggregates at the F2F[1], the majority of implementations required that groups be explicitly defined. These implementations behave as follows:

1) If there is a GROUP BY clause, the solution set is split into disjoint partitions (groups) based on distinct combinations of values of the GROUP BY variables.
2) If there is no GROUP BY clause, the entire solution set functions as a single group.
3) In the projection (SELECT clause), you may project anything that has a well-defined value for each group. This means one of the following:
   a) A scalar expression involving only constants and variables mentioned in the GROUP BY clause (the simplest case of this is projecting a GROUP BY variable itself)
   b) An aggregate expression
4) In the projection, it is an error to project out anything else, in particular a scalar expression involving any variable not explicitly listed in the GROUP BY clause.

It was mentioned that Virtuoso has implicit grouping. From the Virtuoso documentation[2], this design means:

1) There is no GROUP BY clause. Groups are always determined implicitly.
2) The grouping variables are determined by looking at the projected expressions. All variables mentioned in the projection (SELECT clause) that are _not_ part of an aggregate expression are considered part of the grouping key.
3) If all projected expressions are aggregates, then the entire solution set functions as a single group.
4) Because the projection implicitly determines the groups, there are no error conditions.

I believe that standard SQL is similar to the explicit case described above.
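To make the difference between the two designs concrete, here is a minimal Python sketch. It is not taken from any of the surveyed implementations. The solution and projection representations, and the names evaluate, explicit_group and implicit_group, are all hypothetical, and scalar expressions are simplified to plain variables.

```python
from collections import defaultdict

# Hypothetical model, for illustration only:
#   solution   = dict mapping variable name -> value
#   projection = list of ("var", name) or ("agg", function, name) items
AGGREGATES = {"COUNT": len, "SUM": sum, "MIN": min, "MAX": max}


def evaluate(solutions, projection, keys):
    """Partition solutions on `keys`, then emit one row per group."""
    groups = defaultdict(list)
    for s in solutions:
        groups[tuple(s[k] for k in keys)].append(s)
    rows = []
    for key_values, members in groups.items():
        bound = dict(zip(keys, key_values))
        row = []
        for item in projection:
            if item[0] == "var":
                # A grouping variable has one well-defined value per group.
                row.append(bound[item[1]])
            else:
                _, func, var = item
                row.append(AGGREGATES[func]([m[var] for m in members]))
        rows.append(tuple(row))
    return rows


def explicit_group(solutions, projection, group_by=()):
    """Explicit design: projecting a non-GROUP BY variable is an error."""
    for item in projection:
        if item[0] == "var" and item[1] not in group_by:
            raise ValueError(f"?{item[1]} is not listed in GROUP BY")
    # With no GROUP BY, the key is empty and everything is one group.
    return evaluate(solutions, projection, list(group_by))


def implicit_group(solutions, projection):
    """Implicit design: non-aggregated projected variables form the key."""
    keys = list(dict.fromkeys(i[1] for i in projection if i[0] == "var"))
    return evaluate(solutions, projection, keys)


solutions = [
    {"author": "alice", "price": 10},
    {"author": "alice", "price": 5},
    {"author": "bob", "price": 7},
]
projection = [("var", "author"), ("agg", "SUM", "price")]

print(explicit_group(solutions, projection, group_by=("author",)))
print(implicit_group(solutions, projection))
```

Both calls print the same rows for this data. The designs only diverge when the GROUP BY clause is dropped: explicit_group(solutions, projection) raises ValueError, whereas implicit_group has no way to fail and simply groups by ?author.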
The MySQL documentation[3] states that MySQL extends the standard SQL behavior so that projected variables that are not part of GROUP BY are _not_ an error, but instead behave similarly to the SAMPLE aggregate[4] that we've briefly discussed in the past.

Based on the balance of current implementation experience and on the SQL precedent, I'm going to suggest that we resolve ISSUE-11 in favor of explicit grouping and in favor of it being an error to project variables (or scalar functions on variables) not mentioned in GROUP BY.

Of course, I'll be happy to entertain suggestions to the contrary.

Lee

PS This is, again, how I'd like to see us proceed on issues - summary / proposal discussion / telecon discussion only if necessary / announced proposed resolution / telecon decision. During that process, proposals can and should also be worked out on the wiki, of course. Since we're dealing with UPDATE on this week's teleconference, the earliest we'd consider resolving this issue would be the following week - the agenda will list any proposed resolutions that the chairs intend to put to the group on a teleconference.

[1] http://www.w3.org/2009/sparql/meeting/2009-05-07#aggregates
[2] http://docs.openlinksw.com/virtuoso/rdfsparqlaggregate.html
[3] http://dev.mysql.com/doc/refman/6.0/en/group-by-hidden-columns.html
[4] http://www.w3.org/2009/sparql/wiki/Feature:SampleAggregate
Received on Tuesday, 26 May 2009 00:25:56 UTC