- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Thu, 26 Aug 2010 14:48:36 +0100
- To: Axel Polleres <axel.polleres@deri.org>
- CC: SPARQL Working Group <public-rdf-dawg@w3.org>, Lee Feigenbaum <lee@thefigtrees.net>, Steve Harris <steve.harris@garlik.com>
On 25/08/10 22:24, Axel Polleres wrote:
>>> Any opinions on this? This actually worries me about the current "potentially bound" wording.
>>
>> If we want a static analysis of the query, then regard ?Y as potentially
>> bound.
>
> We'd need to explain/define the exact reading of "regard as potentially bound". As it stands it is unclear. My example *could* be detected by static analysis, if the static analyser was able to detect *statically* unsatisfiable FILTER expressions, such as (?Y != ?Y), so it is not clear why ?Y should be regarded as potentially bound in my example. I am fine with any formulation which has a clear definition; the current wording unfortunately does not.

In general, analysing expressions for non-satisfiability is not practical. Only some simple cases like your example are possible, as are other forms that optimizing compilers might notice. Once reordering and equivalence are added, the complexity cost grows. And what about structural invariances like FILTER(?Y>45 && isIRI(?Y)) or data-introduced FILTER(?ageInYear < -10)?

A practical scheme is based on sites where variables are bound in BGP within patterns.

This relates to the GROUP BY and expression handling. Detecting when one expression (in the SELECT line) uses another (from the GROUP BY) is complex because complex expressions can be written in different ways yet be equivalent expressions. Therefore, I suggest we do not require implementations to be able to perform such comparisons.

My outline definition of potentially bound is a practical algorithm based on just the points where a variable can be bound (not filtered out). There are only a few places where terms can be bound to variables.

> My proposal for rewording was maybe too restrictive, but it was clearly checkable statically.
> BTW, the current wording for "SELECT *" is equally ambiguous.
>
> [...]
>
>>>> Unnecessarily severe.
>>>
>>> Fair enough, if we can afford it.
>>> Though it seems that expressions in GROUP BY are strictly speaking not necessary, and seem to be replaceable quite easily, so I wouldn't consider this restriction severe.
>>
>> It's severe because it's the corner case driving the main design. And
>> you were arguing for shorter syntax.
>
> Yes, but your version leaves us with something very restricted, it seems.
> you say you'd disallow agg08...

Why is it "very restricted"? It's a restriction but I don't see it as /very/ restricting, especially as you have already shown that if the app needs the value of the grouping returned it can do so using a nested SELECT.

The balance is the difficulty of determining whether one expression is a sub-expression of another, including reordering and rewriting.

Consider GROUP BY (1/?o); then SELECT (fn:floor(1/(-2*?o))+count(*)) is theoretically safe. When two or more variables are involved, it gets complicated.

>> agg08 uses an expression for GROUP BY. I am suggesting, as a
>> simplification, that it does not put ?O1 and ?O2, not (?O1+?O2), as
>> legal uses in an expression in the SELECT clause.
>
> That would be a quite different query, wouldn't it? Can you show me what exactly your simplification means for the agg08 query?

agg08 would be an error because it uses variables in an expression which are not key variables of the group.

> Let me try to understand again what you propose:
> - you want to allow only grouped variables being projected or used in project expressions

Yes, understanding "grouped variables" as variables used in GROUP BY, but not in an expression.

> - you additionally want to allow grouping by expressions, but the grouped expressions are not reusable in the SELECT clause.
> yes?

Yes. I'm following the current doc, which allows grouping by expression (syntax and definition reading ExprList as a list of expressions).
    Group(ExprList, Ω) = { ListEval(ExprList, μ) -> { μ' | μ' in Ω, ListEval(ExprList, μ) = ListEval(ExprList, μ') } | μ in Ω }

> If so, it seems our arguments run a bit past each other...
> You seem to propagate a stronger restriction than me for GROUPing, but a weaker restriction than mine for variables allowed as names in project expressions?

I suggest a stronger restriction on the variables allowed in project expressions (and projections and HAVING) in that it only considers variables. This is because of the complexity of determining whether one expression is "safe" given an expression used for GROUP BY.

Otherwise we are trying to allow:

    SELECT (?o1+?o2 AS ?o3) ... GROUP BY (?o2+?o1)
    SELECT (1/(?o1+?o2) AS ?o3) ... GROUP BY (?o2+?o1)

unclear about:

    SELECT (fn:floor(2*?o1+2*?o2) AS ?o3) ... GROUP BY (?o2+?o1)

but not

    SELECT ?o1 ... GROUP BY (?o2+?o1)

I don't think that removing the possibility of GROUP BY with an expression would be particularly serious; however, there is no reason to forbid it (the issue is expressions in SELECT with constant value within a group, not the GROUP BY clause) and it is in the current draft.

I'm not sure what you propose. You have mentioned no expressions in GROUP BY and also allowing reuse of the same expression used in the GROUP BY in the select expressions. For the latter, I haven't seen what equivalence of expressions, or inclusion of expressions, is involved.

	Andy
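
For illustration, the Group(ExprList, Ω) definition and the variables-only rule discussed above might be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not anything taken from the draft or from an implementation: solution mappings are modelled as plain dicts, and the names GroupCondition, by_var, by_expr, list_eval, group and illegal_variables are all hypothetical. Error handling for unbound variables is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GroupCondition:
    """One GROUP BY entry (hypothetical model, not from the draft)."""
    evaluate: Callable[[dict], object]
    # Set only when the entry is a bare variable, e.g. GROUP BY ?o1.
    key_variable: Optional[str] = None


def by_var(name):
    """GROUP BY ?name: a bare variable, so it is a key variable of the group."""
    return GroupCondition(lambda mu: mu.get(name), key_variable=name)


def by_expr(fn):
    """GROUP BY (expr): allowed for grouping, but contributes no key variable."""
    return GroupCondition(fn)


def list_eval(conditions, mu):
    """ListEval(ExprList, mu): evaluate every grouping entry against mu."""
    return tuple(c.evaluate(mu) for c in conditions)


def group(conditions, omega):
    """Group(ExprList, Omega): map each key to the solutions sharing that key."""
    groups = {}
    for mu in omega:
        groups.setdefault(list_eval(conditions, mu), []).append(mu)
    return groups


def illegal_variables(used_vars, conditions):
    """Variables used in SELECT/HAVING that are not key variables of the group.

    Purely syntactic: no attempt is made to compare or rewrite expressions.
    """
    key_vars = {c.key_variable for c in conditions if c.key_variable is not None}
    return sorted(set(used_vars) - key_vars)


# Made-up solution mappings for the example.
omega = [
    {"?o1": 1, "?o2": 2},
    {"?o1": 2, "?o2": 1},
    {"?o1": 5, "?o2": 5},
]

# GROUP BY (?o1+?o2): grouping by an expression is fine.
sum_conditions = [by_expr(lambda mu: mu["?o1"] + mu["?o2"])]
groups = group(sum_conditions, omega)

# SELECT (?o1+?o2 AS ?o3) uses ?o1 and ?o2, which are not key variables here.
rejected = illegal_variables({"?o1", "?o2"}, sum_conditions)

# GROUP BY ?o1 with SELECT ?o1 passes the check.
accepted = illegal_variables({"?o1"}, [by_var("?o1")])
```

With this reading, a query in the style of agg08 is reported as an error (rejected is non-empty) without any attempt to decide whether the SELECT expression is equivalent to, or contains, the GROUP BY expression, which is the comparison the message argues implementations should not be required to perform.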
Received on Thursday, 26 August 2010 13:49:23 UTC