- From: Axel Polleres <axel.polleres@deri.org>
- Date: Wed, 25 Aug 2010 22:24:33 +0100
- To: "Andy Seaborne" <andy.seaborne@epimorphics.com>
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>, Lee Feigenbaum <lee@thefigtrees.net>, Steve Harris <steve.harris@garlik.com>
> > Any opinions on this? This actually worries me about the current "potentially bound" wording.
>
> If we want a static analysis of the query, then regard ?Y as potentially
> bound.

We'd need to explain/define the exact reading of "regard as potentially bound". As it stands it is unclear.
My example *could* be detected by static analysis, if the static analyser was able to detect *statically* unsatisfiable FILTER expressions, such as (?Y != ?Y), so it is not clear why ?Y should be regarded as potentially bound in my example.

I am fine with any formulation which has a clear definition; the current wording unfortunately does not.
My proposal for rewording was maybe too restrictive, but it was clearly checkable statically.

BTW, the current wording for "SELECT *" is equally ambiguous.

[...]

> >> Unnecessarily severe.
> >
> > Fair enough, if we can afford it. Though it seems that expressions in GROUP BY are strictly speaking not necessary, and seem to be replaceable quite easily, so I wouldn't consider this restriction severe.
>
> It's severe because it's the corner case driving the main design. And
> you were arguing for shorter syntax.

Yes, but your version leaves us with something very restricted, it seems. You say you'd disallow agg08...

> agg08 uses an expression for GROUP BY. I am suggesting, as a
> simplification, that it does not put ?O1 and ?O2, not (?O1+?O2), as
> legal uses in an expression in the SELECT clause.

That would be a quite different query, wouldn't it? Can you show me what exactly your simplification means for the agg08 query?

Let me try to understand again what you propose:

- you want to allow only grouped variables being projected or used in project expressions
- you additionally want to allow grouping by expressions, but the grouped expressions are not reusable in the SELECT clause.

Yes?

If so, it seems our arguments run a bit past each other...
You seem to propagate a stronger restriction than me for GROUPing, but a weaker restriction than mine for variables allowed as names in project expressions?

Axel

On 25 Aug 2010, at 20:18, Andy Seaborne wrote:

>
>
> On 25/08/10 19:07, Axel Polleres wrote:
> >
> > On 25 Aug 2010, at 18:33, Andy Seaborne wrote:
> >
> >> On 25/08/10 18:15, Axel Polleres wrote:
> >>> Thanks for these very useful examples, Andy! (which I think brought me to another
> >>> imprecise formulation in the spec, I think)
> >>>
> >>> Questions for clarification, to make sure everybody is on the same page here:
> >>>
> >>> 1)
> >>>> SELECT *
> >>>> {
> >>>>   { SELECT ?x { ?x ?p ?o } GROUP BY ?x }
> >>>>   ?o <p> 123 .
> >>>> }
> >>>
> >>> Yup, we want to allow this, right?
> >>
> >> Yes
> >
> > ok.
> >
> >>
> >>>
> >>> 2)
> >>>> SELECT (count(*) AS ?p) { ?s ?p ?o } GROUP BY ?s
> >>> ...
> >>>> SELECT (SAMPLE(?p) AS ?p) { ?s ?p ?o } GROUP BY ?s
> >>>
> >>> This is seemingly (but strangely enough not quite?) in conflict with:
> >>> "The new variable is introduced using the keyword AS; it must not already be potentially
> >>> bound."
> >>>
> >>> I'd honestly prefer somehow to strengthen this restriction to:
> >>>
> >>> "The new variable is introduced using the keyword AS; it must not already occur in the WHERE clause."
> >>
> >> Disagree - the GROUP example puts the inner variable out of scope.
> >
> > I don't really understand? With what exactly do you disagree?
> >
> > I think we both agree that the current wording doesn't
> > "The new variable is introduced using the keyword AS; it must not already be potentially bound."
> > apply to your example.
> >
> > My proposal was to strengthen this restriction such that your examples would also be forbidden,
> > is it this you are disagreeing with or do you disagree that my rewording catches your example?
>
> I disagree with your rewording and the additional restriction.
>
> I see no reason that a name should not be introduced (by AS) if it does not
> conflict with anything. If the pattern does not expose a name, it
> should be possible to use the name. Aids composition - people do seem to
> create large queries by working on fragments.
>
> SELECT (?s AS ?subject) (?t AS ?p)
> {
>   {SELECT DISTINCT ?s {?s ?p ?o}}   # Hides ?p ?o
>   ?s rdf:type ?t
> }
>
> Just because something looks bad style is not a reason to ban it.
>
> >> An inner SELECT/project would do much the same - it's not just GROUPing.
> >
> > Well, the strengthened restriction would also forbid variables occurring in a nested
> > query in the WHERE clause.
>
> Quite - and unnecessarily so.
>
> >> Building queries by combining tested fragments is made much harder if
> >> there are whole-query rules that mean a fragment worked on its own
> >> breaks a larger query.
> >
> > We have this effect already with the current restriction and I don't see why it gets
> > more difficult by strengthening the restriction.
>
> Do we? Where? Variables can be hidden by subqueries.
>
> >>> Funny enough, note that the original "potentially bound" formulation is problematic/imprecise already
> >>> without aggregates:
> >>>
> >>> SELECT (?X as ?Y) WHERE { ?S ?P ?X OPTIONAL { ?S ?P ?Y FILTER(?Y != ?Y) } }
> >>>
> >>> Obviously, there is no way that ?Y ever returns a binding by the FILTER expression...
> >>> so it is not "potentially bound" and that query would be syntactically ok, according to the definition.
> >>> I guess many will agree that checking static unsatisfiability of FILTER expressions would be a nightmare for parsers :-)
> >
> > Any opinions on this? This actually worries me about the current "potentially bound" wording.
>
> If we want a static analysis of the query, then regard ?Y as potentially
> bound.
>
> (we have avoided dynamic analysis errors to date - it's very hard to
> send the error mid way through a query - HTTP does not like that so it
> would require running to completion before sending the HTTP return code
> and hence any results)
>
> >>> 3)
> >>>> Personally, I'd be happy with forbidding the use of variables of grouping
> >>>> expressions:
> >>>>
> >>>> SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } GROUP BY (1/(1-?o))   # Forbiddable
> >>>> SELECT ?o WHERE { ?s ?p ?o } GROUP BY (1/(1-?o))   # Forbiddable
> >>
> >>> Without expressing any strong opinion here: This rules out the new test case agg08, or, resp.,
> >>> turns it into a negativeSyntaxTest. I had assumed for the current version of agg08 that the
> >>> former would be allowed whereas the latter wouldn't. That's why I had "*or expressions*" in
> >>> my rewording proposal.
> >>
> >> It does - it's a trade off - testing whether an expression is the same
> >> as another is tricky.
> >>
> >>> I assume what Andy means here (and which I think holds) is that we could forbid expressions
> >>> in Grouping altogether, since they can be always emulated by subqueries, i.e.
> >>
> >> Not what I mean.
> >>
> >> I am suggesting simplifying by not requiring an impl to spot when two
> >> expressions are the same.
> >
> > ... but you would still allow the same expressions, i.e. agg08 would still be fine, yes?
>
> No.
>
> ----
> SELECT ((?O1 + ?O2) AS ?O12) (COUNT(?O1) AS ?C)
> WHERE { ?S :p ?O1; :q ?O2 } GROUP BY (?O1 + ?O2)
> ORDER BY ?O12
> -----
>
> agg08 uses an expression for GROUP BY. I am suggesting, as a
> simplification, that it does not put ?O1 and ?O2, not (?O1+?O2), as
> legal uses in an expression in the SELECT clause.
>
> > Then, I don't really understand your rewording proposal:
> >
> > ============
> > [[
> > In aggregate queries and sub-queries, variables that appear in the query
> > pattern, but are not used to group the pattern, cannot be projected nor
> > used in expressions in SELECT clause nor used in the expression of a
> > HAVING clause of this query or sub-query unless they are part of an
> > aggregate.
> >
> > They may be used as alias names.
> >
> > In order to project arbitrary expressions the SAMPLE aggregate may be used.
> > ]]
> >
> > By saying "expressions" the use as alias names comes for free but it's
> > clearer to say so.
> > ============
> >
> > Can you explain what you mean by "alias names" exactly?
>
> New variable names introduced with AS.
>
> > You mean to capture the same as I said with *or expressions* in my rewording, or something more general? I think we'd need to explain that notion.
> >
> >>
> >> SELECT (1/(-?o+1) AS ?o1) ... GROUP BY (1/(1-?o))
> >>        ^^^^^^^^^^
> >
> > Aah, different, overread that, sorry.
> >
> >>
> >> Use of ?o in any expression in projection (or HAVING - it's the same
> >> thing) is forbidden.
> >>
> >>> SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } GROUP BY (1/(1-?o))
> >>>
> >>> could be written without expression in the GROUP BY clause as:
> >>>
> >>> SELECT ?o1 { { SELECT (1/(1-?o) AS ?o1) { ?s ?p ?o } } } GROUP BY ?o1
> >>>
> >>> So, why not just do that and forbid expressions in GROUP BY in the grammar already?
> >>
> >> Unnecessarily severe.
> >
> > Fair enough, if we can afford it. Though it seems that expressions in GROUP BY are strictly speaking not necessary, and seem to be replaceable quite easily, so I wouldn't consider this restriction severe.
>
> It's severe because it's the corner case driving the main design. And
> you were arguing for shorter syntax.
>
> ARQ actually works by introducing a hidden variable for the aggregate so
> its use in HAVING or SELECT clauses is just use of that variable and a
> single evaluation of the aggregate's value for each group.
>
> >> Doing that because minor issue of the expressions in SELECT are tricky
> >> seems to have the balance all wrong.
> >
> > You mean expressions in GROUP BY, yes?
> >
> >>
> >>> 4) BTW, what about
> >>> SELECT * { ?s ?p ?o } GROUP BY ?s
> >>> Just to make sure everybody is on the same page here: is this also forbidden?
> >>
> >> No - it's natural.
> >
> > What I meant to say is currently it would be... reading * as a shortcut of all variables occurring in the WHERE clause.... BTW, the current formulation
> > "The syntax SELECT * is an abbreviation that selects all of the variables that could be bound in a query."
> > has the same problem as the "potentially bound" formulation mentioned earlier
> > ... so we need to reformulate that anyways.
>
> The section is from SPARQL 1.0.
>
> "Potentially bound" is a static analysis (well, it is for ARQ) of the
> query based on use in BGPs, GRAPH (so not if used in a FILTER alone) and
> now the name introduction forms.
>
> >> Define the scoping of a group as the key variables (not expressions used
> >> in GROUP BY) and it works out easily.
> >
> > We need to find the right wording, since the notation "key" is only explained back in the algebra,
> > introducing it for defining the restrictions on variables further up in the spec already might be difficult?
>
> My suggestion is just to cover variables, not expressions, used in GROUP
> BY. Simple enough?
>
> >
> > Axel
>
> Andy
>
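
The static reading of "potentially bound" that Andy describes in the quoted thread (a variable counts if it is used in a BGP, a FILTER alone does not bind anything, and a sub-SELECT exposes only what it projects) might be sketched roughly as follows. This is only an illustration under stated assumptions: the toy pattern tree and all names in it (BGP, Filter, OptionalPattern, SubSelect, Group, potentially_bound, alias_allowed) are hypothetical, and it is neither ARQ's implementation nor the algebra of the spec.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


# Hypothetical toy pattern tree; these classes are assumptions for illustration.
@dataclass(frozen=True)
class BGP:
    vars: FrozenSet[str]          # variables used in triple patterns


@dataclass(frozen=True)
class Filter:
    vars: FrozenSet[str]          # variables mentioned only inside a FILTER


@dataclass(frozen=True)
class Group:
    elements: Tuple[object, ...]  # child patterns of a { ... } group


@dataclass(frozen=True)
class OptionalPattern:
    pattern: Group                # OPTIONAL { ... }


@dataclass(frozen=True)
class SubSelect:
    projected: FrozenSet[str]     # names visible outside the sub-SELECT
    pattern: Group                # inner pattern, hidden from the outside


def potentially_bound(node) -> FrozenSet[str]:
    """Conservative, purely syntactic set of variables a pattern may bind."""
    if isinstance(node, BGP):
        return node.vars
    if isinstance(node, Filter):
        # A FILTER never binds; its satisfiability is deliberately not analysed.
        return frozenset()
    if isinstance(node, OptionalPattern):
        return potentially_bound(node.pattern)
    if isinstance(node, SubSelect):
        # Only projected names escape; inner variables are out of scope.
        return node.projected
    if isinstance(node, Group):
        return frozenset().union(*(potentially_bound(e) for e in node.elements))
    raise TypeError(f"unknown pattern node: {node!r}")


def alias_allowed(alias: str, where: Group) -> bool:
    """(expr AS ?alias) is allowed if ?alias is not potentially bound."""
    return alias not in potentially_bound(where)


# SELECT (?X as ?Y) WHERE { ?S ?P ?X OPTIONAL { ?S ?P ?Y FILTER(?Y != ?Y) } }
axel_where = Group((
    BGP(frozenset({"S", "P", "X"})),
    OptionalPattern(Group((
        BGP(frozenset({"S", "P", "Y"})),
        Filter(frozenset({"Y"})),
    ))),
))

# SELECT (?s AS ?subject) (?t AS ?p)
# { {SELECT DISTINCT ?s {?s ?p ?o}} ?s rdf:type ?t }
andy_where = Group((
    SubSelect(frozenset({"s"}), Group((BGP(frozenset({"s", "p", "o"})),))),
    BGP(frozenset({"s", "t"})),
))
```

Under this reading ?Y is treated as potentially bound in the OPTIONAL/FILTER example, because the unsatisfiable FILTER is never inspected, while ?p stays free for use as an alias in the sub-SELECT example, because the inner query hides it.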
Received on Wednesday, 25 August 2010 21:42:29 UTC