
Re: ungrouped variables used in projections - Further implications?

From: Axel Polleres <axel.polleres@deri.org>
Date: Wed, 25 Aug 2010 19:07:48 +0100
Cc: "SPARQL Working Group" <public-rdf-dawg@w3.org>, "Lee Feigenbaum" <lee@thefigtrees.net>, "Steve Harris" <steve.harris@garlik.com>
Message-Id: <358468A1-798A-400F-BB26-1D30BD728E8A@deri.org>
To: "Andy Seaborne" <andy.seaborne@epimorphics.com>

On 25 Aug 2010, at 18:33, Andy Seaborne wrote:

> On 25/08/10 18:15, Axel Polleres wrote:
> > Thanks for these very useful examples, Andy! (I think they point to another
> > imprecise formulation in the spec.)
> >
> > Questions for clarification, to make sure everybody is on the same page here:
> >
> > 1)
> >> SELECT *
> >> {
> >>      { SELECT ?x { ?x ?p ?o } GROUP BY ?x }
> >>      ?o <p> 123 .
> >> }
> >
> > Yup, we want to allow this, right?
> 
> Yes

ok.

> 
> >
> > 2)
> >>    SELECT (count(*) AS ?p) { ?s ?p ?o } GROUP BY ?s
> > ...
> >>    SELECT (SAMPLE(?p) AS ?p) { ?s ?p ?o } GROUP BY ?s
> >
> > This is seemingly (but strangely enough not quite?) in conflict with:
> > "The new variable is introduced using the keyword AS; it must not already be potentially
> > bound."
> >
> > I'd honestly prefer to strengthen this restriction to:
> >
> > "The new variable is introduced using the keyword AS; it must not already occur in the WHERE clause."
> 
> Disagree - the GROUP example puts the inner variable out of scope.

I don't quite understand. What exactly do you disagree with?
I think we both agree that the current wording
"The new variable is introduced using the keyword AS; it must not already be potentially bound."
does not apply to your example.

My proposal was to strengthen this restriction so that your examples would also be forbidden.
Is that what you disagree with, or do you disagree that my rewording catches your example?
 
> An inner SELECT/project would do much the same - it's not just GROUPing.

Well, the strengthened restriction would also forbid variables occurring in a nested
query in the WHERE clause.

> Building queries by combining tested fragments is made much harder if
> there are whole-query rules that mean a fragment worked on its own
> breaks a larger query.

We already have this effect with the current restriction, and I don't see why it gets
more difficult by strengthening the restriction.
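To make the difference between the two readings concrete, here is a minimal Python sketch over a toy, hypothetical query representation (not any real SPARQL engine's API). It contrasts the stricter rule proposed above (an AS alias must not occur anywhere in the WHERE clause, nested sub-queries included) with a scope-based reading in which, after GROUP BY, only the grouping key variables are in scope.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Query:
    """Hypothetical toy model of a query; not a real parser or engine API."""
    where_vars: Set[str]                    # every variable mentioned in WHERE, nested sub-queries included
    group_keys: Optional[List[str]] = None  # GROUP BY variables; None if the query is not grouped
    aliases: List[str] = field(default_factory=list)  # variables introduced with AS in SELECT


def in_scope(q: Query) -> Set[str]:
    """Variables visible to SELECT: the grouping keys if grouped, else the WHERE
    variables (a simplification that ignores projection inside nested sub-queries)."""
    if q.group_keys is not None:
        return set(q.group_keys)
    return set(q.where_vars)


def strict_ok(q: Query) -> bool:
    """Stricter proposal: an alias must not occur anywhere in WHERE."""
    return all(a not in q.where_vars for a in q.aliases)


def scoped_ok(q: Query) -> bool:
    """Scope-based reading: an alias must not clash with a variable in scope."""
    return all(a not in in_scope(q) for a in q.aliases)


# SELECT (count(*) AS ?p) { ?s ?p ?o } GROUP BY ?s
q = Query(where_vars={"?s", "?p", "?o"}, group_keys=["?s"], aliases=["?p"])
print(strict_ok(q), scoped_ok(q))  # False True
```

Under the stricter rule the query is rejected; under the scope-based reading it is accepted, because ?p is no longer in scope after grouping by ?s.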

> 
> > Funnily enough, the original "potentially bound" formulation is already problematic/imprecise
> > without aggregates:
> >
> >   SELECT (?X as ?Y) WHERE { ?S ?P ?X OPTIONAL { ?S ?P ?Y FILTER(?Y != ?Y) } }
> >
> > Obviously, because of the FILTER expression, ?Y can never be bound...
> > so it is not "potentially bound", and according to the definition that query would be syntactically OK.
> > I guess many will agree that checking static unsatisfiability of FILTER expressions would be a nightmare for parsers :-)

Any opinions on this? This actually worries me about the current "potentially bound" wording.
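One way to sidestep FILTER satisfiability is a purely syntactic notion of "potentially bound": every variable mentioned in a pattern (including an OPTIONAL part) counts, without asking whether a FILTER can ever succeed. The Python sketch below uses a hypothetical mini-AST (tuples tagged "bgp", "group", "optional", "filter"), not any real parser, to show that under such a rule ?Y in the example above counts as potentially bound, so (?X AS ?Y) would be rejected.

```python
# Hypothetical mini-AST for graph patterns (illustration only):
#   ("bgp", [(s, p, o), ...])  triple patterns; strings starting with "?" are variables
#   ("group", [pattern, ...])  a { ... } group of patterns
#   ("optional", pattern)      OPTIONAL { ... }
#   ("filter", expr_vars)      FILTER(...); contributes no bindings


def potentially_bound(pattern) -> set:
    """Syntactic over-approximation of the variables a solution could bind."""
    kind = pattern[0]
    if kind == "bgp":
        return {t for triple in pattern[1] for t in triple if t.startswith("?")}
    if kind == "group":
        out = set()
        for p in pattern[1]:
            out |= potentially_bound(p)
        return out
    if kind == "optional":
        # FILTERs inside the OPTIONAL are deliberately not evaluated.
        return potentially_bound(pattern[1])
    if kind == "filter":
        return set()  # a FILTER never binds a variable
    raise ValueError(f"unknown pattern kind: {kind}")


# WHERE { ?S ?P ?X OPTIONAL { ?S ?P ?Y FILTER(?Y != ?Y) } }
where = ("group", [
    ("bgp", [("?S", "?P", "?X")]),
    ("optional", ("group", [
        ("bgp", [("?S", "?P", "?Y")]),
        ("filter", {"?Y"}),
    ])),
])
print(sorted(potentially_bound(where)))  # ['?P', '?S', '?X', '?Y']
```

The design choice here is to over-approximate rather than reason about whether a FILTER is statically unsatisfiable.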

> > 3)
> >> Personally, I'd be happy with forbidding the use of variables of grouping
> >> expressions:
> >>
> >>    SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } GROUP BY (1/(1-?o)) # Forbiddable
> >>    SELECT ?o WHERE { ?s ?p ?o } GROUP BY (1/(1-?o)) # Forbiddable
> 
> > Without expressing any strong opinion here: this rules out the new test case agg08, or rather
> > turns it into a negativeSyntaxTest. I had assumed for the current version of agg08 that the
> > former would be allowed whereas the latter wouldn't. That's why I had "*or expressions*" in
> > my rewording proposal.
> 
> It does - it's a trade-off - testing whether an expression is the same
> as another is tricky.
> 
> > I assume what Andy means here (and which I think holds) is that we could forbid expressions
> > in grouping altogether, since they can always be emulated by subqueries, i.e.
> 
> Not what I mean.
> 
> I am suggesting simplifying by not requiring an impl to spot when two
> expressions are the same.

... but you would still allow the same expressions, i.e., agg08 would still be fine, yes?
Then, I don't really understand your rewording proposal:

============
[[
In aggregate queries and sub-queries, variables that appear in the query
pattern, but are not used to group the pattern, cannot be projected nor
used in expressions in SELECT clause nor used in the expression of a
HAVING clause of this query or sub-query unless they are part of an
aggregate.

They may be used as alias names.

In order to project arbitrary expressions the SAMPLE aggregate may be used.
]]

By saying "expressions" the use as alias names comes for free but it's
clearer to say so.
============

Can you explain what exactly you mean by "alias names"? Do you mean the same as what I said with *or expressions* in my rewording, or something more general? I think we'd need to explain that notion.
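As a minimal sketch of how the proposed wording could be checked, assuming (as Andy suggests later in this thread) that the group's scope is just the GROUP BY key variables: every variable used in a SELECT or HAVING expression outside an aggregate must be a key variable, while the AS alias names themselves are unrestricted. The expression representation and the `free_vars`/`check_projection` helpers are hypothetical, for illustration only.

```python
# Hypothetical expression trees (illustration only):
#   "?v"                    a variable
#   ("agg", name, expr)     an aggregate such as COUNT, SUM, SAMPLE
#   (op, arg1, arg2, ...)   any other operator or function call
#   other values            constants


def free_vars(expr) -> set:
    """Variables used in expr outside of any aggregate."""
    if isinstance(expr, str) and expr.startswith("?"):
        return {expr}
    if isinstance(expr, tuple):
        if expr[0] == "agg":
            return set()  # anything under an aggregate is allowed
        out = set()
        for arg in expr[1:]:
            out |= free_vars(arg)
        return out
    return set()  # constants


def check_projection(select_items, having, group_keys) -> list:
    """Return offending variables; select_items is a list of (expr, alias)."""
    keys = set(group_keys)
    bad = []
    for expr, _alias in select_items:  # the alias name itself is not checked
        bad += sorted(free_vars(expr) - keys)
    for expr in having:
        bad += sorted(free_vars(expr) - keys)
    return bad


# SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } GROUP BY (1/(1-?o))
# Grouping by an expression contributes no key variables in this sketch.
print(check_projection([(("/", 1, ("-", 1, "?o")), "?o1")], [], []))  # ['?o']

# SELECT ?s (SAMPLE(?o) AS ?x) { ?s ?p ?o } GROUP BY ?s
print(check_projection([("?s", None), (("agg", "SAMPLE", "?o"), "?x")], [], ["?s"]))  # []
```

A side effect of this design is that no expression-equality test is needed: the first query is rejected even though its SELECT expression is textually identical to the GROUP BY expression.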

> 
> SELECT (1/(-?o+1) AS ?o1) ... GROUP BY (1/(1-?o))
>         ^^^^^^^^^^

Ah, that's different; I misread it, sorry.

> 
> Use of ?o in any expression in projection (or HAVING - it's the same
> thing) is forbidden.
> 
> >     SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } GROUP BY (1/(1-?o))
> >
> > could be written without expression in the GROUP BY clause as:
> >
> >     SELECT ?o1 { SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } } GROUP BY ?o1
> >
> > So, why not just do that and forbid expressions in GROUP BY in the grammar already?
> 
> Unnecessarily severe.

Fair enough, if we can afford it. Still, expressions in GROUP BY are, strictly speaking, not necessary and seem easy to replace, so I wouldn't consider this restriction severe.
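The claimed equivalence can be illustrated with a small Python sketch over toy solution mappings (plain dicts standing in for SPARQL solutions; the data and helper names are hypothetical). Grouping directly by the expression 1/(1-?o) yields the same groups as first computing ?o1 in an inner projection and then grouping by ?o1. The sketch ignores expression evaluation errors (e.g. ?o = 1), which SPARQL handles separately.

```python
from collections import defaultdict
from fractions import Fraction


def key_expr(row):
    # The grouping expression 1/(1-?o), evaluated exactly to avoid float noise.
    return Fraction(1, 1 - row["o"])


def group_by(rows, key):
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return groups


def group_sizes(groups):
    return {k: len(v) for k, v in groups.items()}


rows = [{"s": "a", "o": 2}, {"s": "b", "o": 3}, {"s": "c", "o": 2}]

# Outer query grouping directly by the expression.
direct = group_by(rows, key_expr)

# Inner SELECT (1/(1-?o) AS ?o1) ..., then outer GROUP BY ?o1.
# The inner projection keeps only ?o1, so the outer query never sees ?o.
inner = [{"o1": key_expr(r)} for r in rows]
via_subquery = group_by(inner, lambda r: r["o1"])

print(group_sizes(direct) == group_sizes(via_subquery))  # True
```

Note that the subquery form also makes the scoping explicit: ?o is simply not available to the outer SELECT.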

> Doing that because the minor issue of expressions in SELECT is tricky
> seems to have the balance all wrong.

You mean expressions in GROUP BY, yes?

> 
> > 4) BTW, what about
> >       SELECT * { ?s ?p ?o } GROUP BY ?s
> >   Just to make sure everybody is on the same page here: is this also forbidden?
> 
> No - it's natural.

What I meant is that currently it would be forbidden, if * is read as a shortcut for all variables occurring in the WHERE clause. BTW, the current formulation
"The syntax SELECT * is an abbreviation that selects all of the variables that could be bound in a query."
has the same problem as the "potentially bound" formulation mentioned earlier,
so we need to reformulate that anyway.


> Define the scoping of a group as the key variables (not expressions used
> in GROUP BY) and it works out easily.

We need to find the right wording, since the notion of "key" is only explained later, in the algebra section;
introducing it earlier in the spec to define the restrictions on variables might be difficult.
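Under the reading Andy proposes (a group's scope is its key variables, not the expressions used in GROUP BY), SELECT * in a grouped query could simply expand to the key variables. A minimal, hypothetical Python sketch of that expansion; `expand_star` and its arguments are illustrative names only:

```python
from typing import List, Optional


def expand_star(where_vars: List[str], group_by: Optional[list]) -> List[str]:
    """Expand SELECT * under the key-variable scoping reading (illustration only).

    where_vars: in-scope variables of the WHERE clause, in order of appearance.
    group_by:   GROUP BY terms; plain variables are strings starting with "?",
                anything else stands for an expression and contributes no key
                variable. None means the query is not grouped.
    """
    if group_by is None:
        return list(where_vars)
    return [t for t in group_by if isinstance(t, str) and t.startswith("?")]


# SELECT * { ?s ?p ?o } GROUP BY ?s
print(expand_star(["?s", "?p", "?o"], ["?s"]))                      # ['?s']
# SELECT * { ?s ?p ?o } GROUP BY (1/(1-?o))
print(expand_star(["?s", "?p", "?o"], [("/", 1, ("-", 1, "?o"))]))  # []
# SELECT * { ?s ?p ?o }
print(expand_star(["?s", "?p", "?o"], None))                        # ['?s', '?p', '?o']
```

Under this sketch, grouping only by an expression leaves nothing for * to project, which is one consequence the wording would need to address.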

Axel



> 
>         Andy
> 
> >
> > Thanks,
> > Axel
> 
Received on Wednesday, 25 August 2010 18:08:24 GMT
