Re: ungrouped variables used in projections - Further implications? from Axel Polleres on 2010-08-25 (public-rdf-dawg@w3.org from July to September 2010)

From: Axel Polleres <axel.polleres@deri.org>
Date: Wed, 25 Aug 2010 22:24:33 +0100
To: "Andy Seaborne" <andy.seaborne@epimorphics.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>, Lee Feigenbaum <lee@thefigtrees.net>, Steve Harris <steve.harris@garlik.com>
Message-Id: <2C82249A-51DC-4578-9B44-2B2435A064BF@deri.org>
> > Any opinions on this? This actually worries me about the current  "potentially bound" wording.
> 
> If we want a static analysis of the query, then regard ?Y as potentially
> bound.

We'd need to explain/define the exact reading of "regard as potentially bound". As it stands, it is unclear. My example *could* be detected by static analysis if the static analyser were able to detect *statically* unsatisfiable FILTER expressions, such as (?Y != ?Y), so it is not clear why ?Y should be regarded as potentially bound in my example. I am fine with any formulation that has a clear definition; the current wording unfortunately does not.

My proposal for rewording was maybe too restrictive, but it was clearly checkable statically.
BTW, the current wording for "SELECT *" is equally ambiguous.
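
To make one possible reading of "potentially bound" concrete, here is a minimal sketch of a purely syntactic check. It is only an illustration under my own assumptions, not the spec's definition and not how ARQ does it: the pattern classes and the helper names (potentially_bound, alias_allowed) are made up for this example, and FILTER expressions are deliberately never evaluated.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass
class BGP:
    """Basic graph pattern: triples given as tuples of term strings."""
    triples: Tuple[Tuple[str, str, str], ...]


@dataclass
class Filter:
    """FILTER expression, kept as an opaque string and never evaluated."""
    expr: str


@dataclass
class OptionalPattern:
    """OPTIONAL { ... } wrapping an inner pattern."""
    pattern: "Pattern"


@dataclass
class Group:
    """Group graph pattern { ... } with its elements in order."""
    elements: Tuple["Pattern", ...]


Pattern = Union[BGP, Filter, OptionalPattern, Group]


def potentially_bound(p: Pattern) -> Set[str]:
    """Collect variables that occur in a binding position (hypothetical reading).

    Purely syntactic: a variable in a triple pattern counts, a variable
    that appears only inside a FILTER does not, and no attempt is made
    to decide whether a FILTER is satisfiable.
    """
    if isinstance(p, BGP):
        return {t for triple in p.triples for t in triple if t.startswith("?")}
    if isinstance(p, Filter):
        return set()
    if isinstance(p, OptionalPattern):
        return potentially_bound(p.pattern)
    if isinstance(p, Group):
        found: Set[str] = set()
        for element in p.elements:
            found |= potentially_bound(element)
        return found
    raise TypeError(f"unknown pattern: {p!r}")


def alias_allowed(alias: str, where: Pattern) -> bool:
    """(expr AS alias) is accepted only if alias is not potentially bound."""
    return alias not in potentially_bound(where)


# { ?S ?P ?X OPTIONAL { ?S ?P ?Y FILTER(?Y != ?Y) } }
where = Group((
    BGP((("?S", "?P", "?X"),)),
    OptionalPattern(Group((
        BGP((("?S", "?P", "?Y"),)),
        Filter("?Y != ?Y"),
    ))),
))

print(sorted(potentially_bound(where)))
print(alias_allowed("?Y", where))
```

Under this reading ?Y counts as potentially bound even though the FILTER can never succeed, so (?X AS ?Y) would be rejected without any satisfiability reasoning. Whether that is the reading we want is exactly what the wording would have to pin down.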

[...]

> >> Unnecessarily severe.
> >
> > Fair enough, if we can afford it. Though it seems that expressions in GROUP BY are strictly speaking not necessary, and seem to be replaceable quite easily, so I wouldn't consider this restriction severe.
> 
> It's severe because it's the corner case driving the main design.  And
> you were arguing for shorter syntax.

Yes, but your version leaves us with something very restricted, it seems.
You say you'd disallow agg08...

> agg08 uses an expression for GROUP BY. I am suggesting, as a
> simplification, that it does not put ?O1 and ?O2, not (?O1+?O2), as
> legal uses in an expression in the SELECT clause.

That would be a quite different query, wouldn't it? Can you show me what exactly your simplification means for the agg08 query?


Let me try to understand again what you propose:
- you want to allow only grouped variables being projected or used in project expressions
- you additionally want to allow grouping by expressions, but the grouped expressions are not reusable in the SELECT clause.
yes? If so, it seems our arguments run a bit past each other...
You seem to advocate a stronger restriction than mine for GROUPing, but a weaker restriction than mine for variables allowed as names in project expressions?
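
To check my understanding, here is a rough sketch of that reading as a checker. It is only an assumption about what is being proposed, not an agreed rule: the tiny expression classes and the function names are invented for this illustration, and only plain variables in GROUP BY are treated as keys, never grouping expressions.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Agg:
    func: str
    arg: "Expr"


Expr = Union[Var, BinOp, Agg]


def vars_outside_aggregates(e: Expr) -> Set[str]:
    """Variables used in e that are not wrapped in an aggregate."""
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Agg):
        return set()  # anything inside an aggregate is fine
    if isinstance(e, BinOp):
        return vars_outside_aggregates(e.left) | vars_outside_aggregates(e.right)
    raise TypeError(f"unknown expression: {e!r}")


def ungrouped_uses(select_exprs: List[Expr], group_by: List[Expr]) -> Set[str]:
    """Variables used in SELECT expressions that are not grouping keys.

    Only plain variables in GROUP BY count as keys; a grouping
    expression such as (?O1 + ?O2) contributes no key at all.
    """
    keys = {g.name for g in group_by if isinstance(g, Var)}
    used: Set[str] = set()
    for e in select_exprs:
        used |= vars_outside_aggregates(e)
    return used - keys


# agg08: SELECT ((?O1 + ?O2) AS ?O12) (COUNT(?O1) AS ?C) ... GROUP BY (?O1 + ?O2)
o1_plus_o2 = BinOp("+", Var("?O1"), Var("?O2"))
agg08_select = [o1_plus_o2, Agg("COUNT", Var("?O1"))]
agg08_group_by = [o1_plus_o2]

# Same query with the expression moved into a subquery and GROUP BY ?O12
rewritten_select = [Var("?O12"), Agg("COUNT", Var("?O1"))]
rewritten_group_by = [Var("?O12")]

print(sorted(ungrouped_uses(agg08_select, agg08_group_by)))
print(sorted(ungrouped_uses(rewritten_select, rewritten_group_by)))
```

If this captures the proposal, agg08 as written is flagged because ?O1 and ?O2 are used outside an aggregate without being keys, while the variant that groups by a plain variable passes.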

Axel

On 25 Aug 2010, at 20:18, Andy Seaborne wrote:

> 
> 
> On 25/08/10 19:07, Axel Polleres wrote:
> >
> > On 25 Aug 2010, at 18:33, Andy Seaborne wrote:
> >
> >> On 25/08/10 18:15, Axel Polleres wrote:
> >>> Thanks for these very useful examples, Andy! (which, I think, brought me to another
> >>> imprecise formulation in the spec)
> >>>
> >>> Questions for clarification, to make sure everybody is on the same page here:
> >>>
> >>> 1)
> >>>> SELECT *
> >>>> {
> >>>>       { SELECT ?x { ?x ?p ?o } GROUP BY ?x }
> >>>>       ?o <p> 123 .
> >>>> }
> >>>
> >>> Yup, we want to allow this, right?
> >>
> >> Yes
> >
> > ok.
> >
> >>
> >>>
> >>> 2)
> >>>>     SELECT (count(*) AS ?p) { ?s ?p ?o } GROUP BY ?s
> >>> ...
> >>>>     SELECT (SAMPLE(?p) AS ?p) { ?s ?p ?o } GROUP BY ?s
> >>>
> >>> This is seemingly (but strangely enough not quite?) in conflict with:
> >>> "The new variable is introduced using the keyword AS; it must not already be potentially
> >>> bound."
> >>>
> >>> I'd honestly prefer to somehow strengthen this restriction to:
> >>>
> >>> "The new variable is introduced using the keyword AS; it must not already occur in the WHERE clause."
> >>
> >> Disagree - the GROUP example puts the inner variable out of scope.
> >
> > I don't really understand? With what exactly do you disagree?
>  >
> > I think we both agree that the current wording doesn't
> > "The new variable is introduced using the keyword AS; it must not already be potentially bound."
> > apply to your example.
>  >
> > My proposal was to strengthen this restriction such that your examples would also be forbidden,
> > is it this you are disagreeing with or do you disagree that my rewording catches your example?
> 
> I disagree with your rewording and the additional restriction.
> 
> I see no reason that a name should not be introduced (by AS) if it does not
> conflict with anything.  If the pattern does not expose a name, it
> should be possible to use the name.  Aids composition - people do seem to
> create large queries by working on fragments.
> 
> SELECT (?s AS ?subject) (?t AS ?p)
> {
>      {SELECT DISTINCT ?s {?s ?p ?o}} # Hides  ?p ?o
>      ?s rdf:type ?t
> }
> 
> Just because something looks bad style is not a reason to ban it.
> 
> >> An inner SELECT/project would do much the same - it's not just GROUPing.
> >
> > Well, the strengthened restriction would also forbid variables occurring in a nested
> > query in the WHERE clause.
> 
> Quite - and unnecessarily so.
> 
> >> Building queries by combining tested fragments is made much harder if
> >> there are whole-query rules that mean a fragment worked on its own
> >> breaks a larger query.
> >
> > We have this effect already with the current restriction and I don't see why it gets
> > more difficult by strengthening the restriction.
> 
> Do we?  Where?  Variables can be hidden by subqueries.
> 
> >>> Funny enough, note that the original "potentially bound" formulation is problematic/imprecise already
> >>> without aggregates:
> >>>
> >>>    SELECT (?X as ?Y) WHERE { ?S ?P ?X OPTIONAL { ?S ?P ?Y FILTER(?Y != ?Y) } }
> >>>
> >>> Obviously, because of the FILTER expression, there is no way that ?Y ever gets a binding...
> >>> so it is not "potentially bound" and that query would be syntactically ok, according to the definition.
> >>> I guess many will agree that checking static unsatisfiability of FILTER expressions would be a nightmare for parsers :-)
> >
> > Any opinions on this? This actually worries me about the current  "potentially bound" wording.
> 
> If we want a static analysis of the query, then regard ?Y as potentially
> bound.
> 
> (we have avoided dynamic analysis errors to date - it's very hard to
> send the error mid way through a query - HTTP does not like that so it
> would require running to completion before sending the HTTP return code
> and hence any results)
> 
> >>> 3)
> >>>> Personally, I'd be happy with forbidding the use of variables of grouping
> >>>> expressions:
> >>>>
> >>>>     SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } GROUP BY (1/(1-?o)) # Forbiddable
> >>>>     SELECT ?o WHERE { ?s ?p ?o } GROUP BY (1/(1-?o)) # Forbiddable
> >>
> >>> Without expressing any strong opinion here: This rules out the new test case agg08, or, resp.,
> >>> turns it into a negativeSyntaxTest. I had assumed for the current version of agg08 that the
> >>> former would be allowed whereas the latter wouldn't. That's why I had "*or expressions*" in
> >>> my rewording proposal.
> >>
> >> It does - it's a trade off - testing whether an expression is the same
> >> as another is tricky.
> >>
> >>> I assume what Andy means here (and which I think holds) is that we could forbid expressions
> >>> in grouping altogether, since they can always be emulated by subqueries, i.e.
> >>
> >> Not what I mean.
> >>
> >> I am suggesting simplifying by not requiring an impl to spot when two
> >> expressions are the same.
> >
> > ... but you would still allow the same expressions, i.e. agg08 would still be fine, yes?
> 
> No.
> 
> ----
> SELECT ((?O1 + ?O2) AS ?O12) (COUNT(?O1) AS ?C)
> WHERE { ?S :p ?O1; :q ?O2 } GROUP BY (?O1 + ?O2)
> ORDER BY ?O12
> -----
> 
> agg08 uses an expression for GROUP BY.  I am suggesting, as a
> simplification, that it does not put ?O1 and ?O2, not (?O1+?O2), as
> legal uses in an expression in the SELECT clause.
> 
> > Then, I don't really understand your rewording proposal:
> >
> > ============
> > [[
> > In aggregate queries and sub-queries, variables that appear in the query
> > pattern, but are not used to group the pattern, cannot be projected nor
> > used in expressions in SELECT clause nor used in the expression of a
> > HAVING clause of this query or sub-query unless they are part of an
> > aggregate.
> >
> > They may be used as alias names.
> >
> > In order to project arbitrary expressions the SAMPLE aggregate may be used.
> > ]]
> >
> > By saying "expressions" the use as alias names comes for free but it's
> > clearer to say so.
> > ============
> >
> > Can you explain what you mean by "alias names" exactly?
> 
> New variable names introduced with AS.
> 
> > You mean to capture the same as I said with *or expressions* in my rewording, or something more general? I think we'd need to explain that notion.
> >
> >>
> >> SELECT (1/(-?o+1) AS ?o1) ... GROUP BY (1/(1-?o))
> >>          ^^^^^^^^^^
> >
> > Aah, different, overread that, sorry.
> >
> >>
> >> Use of ?o in any expression in projection (or HAVING - it's the same
> >> thing) is forbidden.
> >>
> >>>      SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } GROUP BY (1/(1-?o))
> >>>
> >>> could be written without expression in the GROUP BY clause as:
> >>>
> >>>      SELECT ?o1 { { SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } } } GROUP BY ?o1
> >>>
> >>> So, why not just do that and forbid expressions in GROUP BY in the grammar already?
> >>
> >> Unnecessarily severe.
> >
> > Fair enough, if we can afford it. Though it seems that expressions in GROUP BY are strictly speaking not necessary, and seem to be replaceable quite easily, so I wouldn't consider this restriction severe.
> 
> It's severe because it's the corner case driving the main design.  And
> you were arguing for shorter syntax.
> 
> ARQ actually works by introducing a hidden variable for each aggregate, so
> its use in HAVING or SELECT clauses is just use of that variable and a
> single evaluation of the aggregate's value for each group.
> 
> >> Doing that because of the minor issue that expressions in SELECT are tricky
> >> seems to have the balance all wrong.
> >
> > You mean expressions in GROUP BY, yes?
> >
> >>
> >>> 4) BTW, what about
> >>>        SELECT * { ?s ?p ?o } GROUP BY ?s
> >>>    Just to make sure everybody is on the same page here: is this also forbidden?
> >>
> >> No - it's natural.
> >
> > What I meant to say is currently it would be... reading * as a shortcut of all variables occurring in the WHERE clause.... BTW, the current formulation
> > "The syntax SELECT * is an abbreviation that selects all of the variables that could be bound in a query."
> > has the same problem as the "potentially bound" formulation mentioned earlier
> > ... so we need to reformulate that anyways.
> 
> The section is from SPARQL 1.0.
> 
> "Potentially bound" is a static analysis (well, it is for ARQ) of the
> query based on use in BGPs, GRAPH (so not if used in a FILTER alone) and
> now the name introduction forms.
> 
> >> Define the scoping of a group as the key variables (not expressions used
> >> in GROUP BY) and it works out easily.
> >
> > We need to find the right wording, since the notation "key" is only explained back in the algebra,
> > introducing it for defining the restrictions on variables further up in the spec already might be difficult?
> 
> My suggestion is just to cover variables, not expressions, used in GROUP
> BY.  Simple enough?
> 
> >
> > Axel
> 
>         Andy
>
Received on Wednesday, 25 August 2010 21:42:29 UTC