ungrouped variables used in projections

We couldn't really reach consensus in today's call on the issue of ungrouped variables used in projections in aggregate queries. I volunteered to summarise my current understanding of the different positions:

The issue is exemplified by the following query:

SELECT ?N COUNT(?P1) WHERE { ?P name ?N; knows ?P1 } GROUP BY ?P

1) The current spec seems to be clear about this case...

"In aggregate queries and sub-queries only expressions which have been used as GROUP BY 
 expressions, or aggregated expressions (i.e. expressions where all variables appear 
 inside an aggregate) can be projected."

... suggesting that it is an error.

Alternative ways of handling it would be to
2) treat the non-grouped variables as unbound (I think that's what Andy suggested)
3) or leave the behaviour to the implementation (I think that would be the least favourable option,
   since it makes the language more ambiguous and allows implementations to do anything)
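
To make the difference between 1) and 2) concrete, here is a minimal sketch in Python. It is a toy model, not how any SPARQL engine works: solutions are plain dicts, only a single GROUP BY variable, a single projected variable and a single COUNT are supported, and the names `aggregate`, `policy`, `UngroupedVariableError` and the sample data are all made up for illustration.

```python
from collections import defaultdict


class UngroupedVariableError(Exception):
    """Hypothetical error type for option 1)."""


# Toy solutions for { ?P name ?N; knows ?P1 } (invented data).
SOLUTIONS = [
    {"P": "alice", "N": "Alice", "P1": "bob"},
    {"P": "alice", "N": "Alice", "P1": "carol"},
    {"P": "bob", "N": "Bob", "P1": "alice"},
]


def aggregate(solutions, group_var, projected_var, counted_var, policy):
    """Group by group_var, project projected_var and COUNT(counted_var).

    policy == "error":   option 1), reject an ungrouped projected variable.
    policy == "unbound": option 2), project it as unbound (None).
    """
    ungrouped = projected_var != group_var
    if ungrouped and policy == "error":
        raise UngroupedVariableError(
            "?%s is projected but not grouped by" % projected_var
        )

    # Partition the solutions by the value of the grouping variable.
    groups = defaultdict(list)
    for solution in solutions:
        groups[solution[group_var]].append(solution)

    rows = []
    for key, members in groups.items():
        # Under option 2) an ungrouped variable has no value per group.
        value = None if ungrouped else key
        count = sum(1 for m in members if m.get(counted_var) is not None)
        rows.append({projected_var: value, "count": count})
    return rows
```

With the toy data, `aggregate(SOLUTIONS, "P", "N", "P1", "unbound")` returns one row per group with `N` set to `None`, while the same call with `"error"` raises before any grouping takes place.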


An argument raised against 1) and in favour of 2) was that we would raise an error on an otherwise syntactically correct query. That might be considered awkward, and it is hard to implement in a parser, which would essentially have to take context into account.

Note that we have a similar behaviour (needing a context-aware parser) already in forbidding bnodes being shared among patterns:
"When using blank nodes of the form _:abc,  labels for blank nodes are scoped to the basic graph pattern.  A label can be used in only a single basic graph pattern in any query."

If I understood correctly, Andy was arguing that checking reuse of bnodes was easier since the 
scope doesn't play a role, as opposed to GROUP BY. (More detailed explanation here appreciated.)
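
For comparison, the blank node restriction quoted above can be checked with a flat pass over the basic graph patterns. The following is only a sketch under assumptions: a query is modelled as a list of basic graph patterns, each a list of triples of strings, and `shared_bnode_labels` is a hypothetical helper name. It is not meant to represent Andy's argument, only to show what such a check involves.

```python
from collections import defaultdict


def shared_bnode_labels(bgps):
    """Return the blank node labels used in more than one basic graph pattern.

    bgps is a list of basic graph patterns; each pattern is a list of
    (subject, predicate, object) triples given as strings.
    """
    seen_in = defaultdict(set)
    for index, bgp in enumerate(bgps):
        for triple in bgp:
            for term in triple:
                if term.startswith("_:"):
                    # Record which pattern this label occurs in.
                    seen_in[term].add(index)
    return {label for label, where in seen_in.items() if len(where) > 1}
```

A label used twice within one pattern is fine, so `shared_bnode_labels([[("_:a", "name", "?n"), ("_:a", "knows", "?p")]])` is empty, whereas splitting the two triples over two patterns reports `_:a`.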

We had a strawpoll which ended as follows:

  Should ungrouped variables in project expressions generate an error?
  +1: 6 0: 6 -1: 0

There were no objections, but when I asked whether any of the supporters would object to NOT flagging an error, Souri said he'd probably object.

Summarising, that makes me lean towards forbidding the projection of ungrouped variables, unless we get new information. 

As a side remark, note that the current wording is not precise:

"In aggregate queries and sub-queries only expressions which have been used as GROUP BY 
 expressions, or aggregated expressions (i.e. expressions where all variables appear 
 inside an aggregate) can be projected."

Note that this does not cover the following case:
 
 SELECT (?N AS ?New) COUNT(?P1) WHERE { ?P name ?N; knows ?P1 } GROUP BY ?P

Thus, in case we stick with the general understanding of 1), I would suggest rewording it as:

"In aggregate queries and sub-queries variables that appear in the query pattern, but are not grouped by 
 cannot be projected nor used in project expressions."
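
As a rough illustration of how the reworded rule could be checked after parsing, here is a minimal sketch. The representation is an assumption made for the example: each projection item is an `(expression, alias)` pair, variables are strings starting with `?`, other expressions are nested tuples `(operator, arg, ...)`, and GROUP BY is restricted to plain variables. The names `free_vars`, `ungrouped_in_projection` and the `AGGREGATES` set are hypothetical.

```python
# Aggregate names assumed for this sketch.
AGGREGATES = {"COUNT", "SUM", "MIN", "MAX", "AVG"}


def free_vars(expr):
    """Variables in expr that do not occur inside an aggregate."""
    if isinstance(expr, str):
        return {expr} if expr.startswith("?") else set()
    operator, *args = expr
    if operator in AGGREGATES:
        # Anything below an aggregate is allowed, so stop descending.
        return set()
    found = set()
    for arg in args:
        found |= free_vars(arg)
    return found


def ungrouped_in_projection(projection, group_by):
    """Return the variables that violate the reworded rule.

    projection is a list of (expression, alias) pairs; alias is None for
    a bare variable or aggregate. group_by is a list of variables.
    """
    offending = set()
    for expr, _alias in projection:
        offending |= free_vars(expr)
    return offending - set(group_by)
```

For the second query above, `ungrouped_in_projection([("?N", "?New"), (("COUNT", "?P1"), None)], ["?P"])` reports `?N`, so the project expression case is caught in the same way as the bare variable case.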

In case we adopt 2) we should probably still say something about this case, maybe illustrate it with an example:

"In aggregate queries and sub-queries variables that appear in the query pattern, but are not grouped by 
 are unbound outside the query pattern. For instance, (add an example)"


Opinions welcome!

best,
Axel


 

Received on Tuesday, 24 August 2010 17:09:46 UTC