
Re: ungrouped variables used in projections - Further implications?

From: Axel Polleres <axel.polleres@deri.org>
Date: Wed, 25 Aug 2010 18:15:28 +0100
Cc: Lee Feigenbaum <lee@thefigtrees.net>, Steve Harris <steve.harris@garlik.com>
Message-Id: <006762F4-71BE-4020-AAA5-7526040687A5@deri.org>
To: Andy Seaborne <andy.seaborne@epimorphics.com>, SPARQL Working Group <public-rdf-dawg@w3.org>

Thanks for these very useful examples, Andy! (I think they also brought me to another 
imprecise formulation in the spec.)

Questions for clarification, to make sure everybody is on the same page here:

1) 
> SELECT *
> {
>     { SELECT ?x { ?x ?p ?o } GROUP BY ?x }
>     ?o <p> 123 .
> }

Yup, we want to allow this, right?

2)
>   SELECT (count(*) AS ?p) { ?s ?p ?o } GROUP BY ?s
...
>   SELECT (SAMPLE(?p) AS ?p) { ?s ?p ?o } GROUP BY ?s

This is seemingly (but strangely enough not quite?) in conflict with:
"The new variable is introduced using the keyword AS; it must not already be potentially 
bound." 

I'd honestly prefer to strengthen this restriction to:

"The new variable is introduced using the keyword AS; it must not already occur in the WHERE clause." 

Funnily enough, the original "potentially bound" formulation is already problematic/imprecise 
without aggregates:

 SELECT (?X as ?Y) WHERE { ?S ?P ?X OPTIONAL { ?S ?P ?Y FILTER(?Y != ?Y) } }

Obviously, the FILTER expression can never succeed, so there is no way that ?Y ever gets a binding. 
Hence ?Y is not "potentially bound", and the query would be syntactically OK according to the definition.
I guess many will agree that checking static unsatisfiability of FILTER expressions would be a nightmare for parsers :-)
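
To illustrate why the purely syntactic reading ("must not already occur in the WHERE clause") is so much 
cheaper for a parser, here is a rough Python sketch. It is only an approximation: the regex-based variable 
scan and the helper names where_vars and as_var_clashes are assumptions made up for this illustration, not 
taken from any SPARQL implementation, and the alias and the WHERE text are passed in pre-split rather than parsed.

```python
import re

# Rough token pattern for SPARQL variables (?x or $x); an assumption for this
# sketch, not the full grammar production.
VAR = re.compile(r"[?$]([A-Za-z_][A-Za-z0-9_]*)")

def where_vars(where_clause):
    """Collect every variable name that occurs anywhere in the WHERE text."""
    return set(VAR.findall(where_clause))

def as_var_clashes(alias, where_clause):
    """True if the AS variable already occurs in the WHERE clause."""
    return alias.lstrip("?$") in where_vars(where_clause)

where = "{ ?S ?P ?X OPTIONAL { ?S ?P ?Y FILTER(?Y != ?Y) } }"
print(as_var_clashes("?Y", where))  # ?Y occurs in the pattern
print(as_var_clashes("?Z", where))  # ?Z does not occur
```

No reasoning about whether the FILTER can ever be satisfied is needed here; the check only looks at which 
variable names occur.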

3)
> Personally, I'd be happy with forbidding the use variables of grouping
> expressions:
> 
>   SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } GROUP BY (1/(1-?o)) # Forbiddable
>   SELECT ?o WHERE { ?s ?p ?o } GROUP BY (1/(1-?o)) # Forbiddable

Without expressing any strong opinion here: this rules out the new test case agg08 or, rather, 
turns it into a negativeSyntaxTest. For the current version of agg08 I had assumed that the 
former would be allowed whereas the latter wouldn't. That's why I had "*or expressions*" in 
my rewording proposal.

I assume what Andy means here (and which I think holds) is that we could forbid expressions 
in grouping altogether, since they can always be emulated by subqueries, i.e. 
   
   SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } GROUP BY (1/(1-?o))

could be written without expression in the GROUP BY clause as:

   SELECT ?o1 { { SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } } } GROUP BY ?o1

So, why not do just that and forbid expressions in GROUP BY in the grammar already?
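
As a sanity check on the claimed equivalence, the following Python sketch groups a few made-up values 
of ?o once directly by the expression and once by a precomputed alias, mimicking the subquery. The data 
and the function names are hypothetical, and evaluation errors (for example ?o = 1) are deliberately 
left out of the picture.

```python
from collections import defaultdict

def expr(o):
    # Stand-in for the SPARQL expression 1/(1-?o); o = 1 is not handled here.
    return 1 / (1 - o)

def group_by_expression(values):
    """GROUP BY (1/(1-?o)): the key is computed while grouping."""
    counts = defaultdict(int)
    for o in values:
        counts[expr(o)] += 1
    return dict(counts)

def group_by_alias(values):
    """The subquery binds ?o1 first, then the outer query does GROUP BY ?o1."""
    inner = [expr(o) for o in values]  # SELECT (1/(1-?o) AS ?o1)
    counts = defaultdict(int)
    for o1 in inner:
        counts[o1] += 1
    return dict(counts)

objects = [0, 2, 3, 2, 0]  # made-up ?o bindings
print(group_by_expression(objects) == group_by_alias(objects))
```

Both functions produce the same groups for this input, which is the intuition behind moving the 
expression into a subquery.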


4) BTW, what about
     SELECT * { ?s ?p ?o } GROUP BY ?s 
 Just to make sure everybody is on the same page here: is this also forbidden?

Thanks,
Axel



On 25 Aug 2010, at 16:37, Andy Seaborne wrote:

> 
> 
> On 25/08/10 13:33, Axel Polleres wrote:
> > In total, addressing 1) and 2) my current understanding is that we should change:
> >
> > "In aggregate queries and sub-queries variables that appear in the query
> > pattern, but are not grouped by cannot be projected nor used in project
> > expressions. In order to project arbitrary expressions the SAMPLE
> > aggregate may be used."
> >
> > -->
> >
> > "In aggregate queries and sub-queries variables *or expressions* that appear in the query
> > pattern, but are not grouped by cannot be projected, nor be used in project
> > expressions *(except within aggregations)*, *nor be used in HAVING clauses*.
> > In order to project arbitrary expressions the SAMPLE aggregate may be used."
> >
> > The formulation gets a bit heavier, but at least it seems clearer.
> 
> Refining this:
> 
> We need to forbid the *use* of ungrouped variables in the *specific
> SELECT* expression where the GROUP occurs.  Otherwise:
> 
> 1/ Use elsewhere in the query should be unaffected otherwise
> 
> SELECT *
> {
>     { SELECT ?x { ?x ?p ?o } GROUP BY ?x }
>     ?o <p> 123 .
> }
> 
> is illegal (it's a completely different ?o in the second use) which
> makes building queries by composition a nuisance.

> 
> 2/ It's the undefined value of a non-key variable that's the issue
> because there isn't a clear value to give it.
> 
> Introduction of an alias name is OK: this is being clear about the "use
> in expressions"(1/(1-?o) AS ?o1)
> 
>   SELECT (count(*) AS ?p) { ?s ?p ?o } GROUP BY ?s
> 
> in extremis:
> 
>   SELECT (SAMPLE(?p) AS ?p) { ?s ?p ?o } GROUP BY ?s
> 
> Personally, I'd be happy with forbidding the use variables of grouping
> expressions:
> 
>   SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } GROUP BY (1/(1-?o)) # Forbiddable
>   SELECT ?o WHERE { ?s ?p ?o } GROUP BY (1/(1-?o)) # Forbiddable
> 
> [[
> In aggregate queries and sub-queries, variables that appear in the query
> pattern, but are not used to group the pattern, cannot be projected nor
> used in expressions in SELECT clause nor used in the expression of a
> HAVING clause of this query or sub-query unless they are part of an
> aggregate.
> 
> They may be used as alias names.
> 
> In order to project arbitrary expressions the SAMPLE aggregate may be used.
> ]]
> 
> By saying "expressions" the use as alias names comes for free but it's
> clearer to say so.
> 
>         Andy
> 
Received on Wednesday, 25 August 2010 17:16:04 GMT
