- From: Axel Polleres <axel.polleres@deri.org>
- Date: Thu, 26 Aug 2010 15:46:03 +0100
- To: "Andy Seaborne" <andy.seaborne@epimorphics.com>
- Cc: "SPARQL Working Group" <public-rdf-dawg@w3.org>, "Lee Feigenbaum" <lee@thefigtrees.net>, "Steve Harris" <steve.harris@garlik.com>
On 26 Aug 2010, at 14:48, Andy Seaborne wrote:
>
>
> On 25/08/10 22:24, Axel Polleres wrote:
> >>> Any opinions on this? This actually worries me about the current "potentially bound" wording.
> >>
> >> If we want a static analysis of the query, then regard ?Y as potentially
> >> bound.
> >
> > We'd need to explain/define the exact reading of "regard as potentially bound". As it stands it is unclear. My example *could* be detected by static analysis, if the static analyser were able to detect *statically* unsatisfiable FILTER expressions, such as (?Y != ?Y), so it is not clear why ?Y should be regarded as potentially bound in my example. I am fine with any formulation which has a clear definition; the current wording unfortunately does not.
>
> In general, analysing expressions for non-satisfiability is not
> practical.
we are in wild agreement on this! ...
> Only some simple cases like your example are possible as are
> other forms optimizing compilers might notice. Once reordering and
> equivalence are added, the complexity cost grows. And what about
> structural invariances like FILTER(?Y>45 && isIRI(?Y)) or
> data-introduced FILTER(?ageInYear < -10)?
>
> A practical scheme is based on sites where variables are bound in BGP
> within patterns.
>
> This relates to the GROUP BY and expression handling. Detecting when
> one expression (in the SELECT line) uses another (from the GROUP BY) is
> complex because complex expressions can be written in different ways yet
> be equivalent expressions.
>
> Therefore, I suggest we do not require implementations to be able to
> perform such comparisons.
... and this!
>
> My outline definition of potentially bound is a practical algorithm
> based on just the points where a variable can be bound (not filtered
> out). There are only a few places where terms can be bound to variables.
Ok, my main point was that we'd need to have this defined in the spec clearly, which currently isn't the case.
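A minimal sketch of the binding-site reading of "potentially bound" outlined above, in Python. The tuple encoding of patterns and the name potentially_bound are made up for illustration and come from no draft; only basic graph patterns are treated as binding sites here, and FILTER expressions are deliberately never inspected.

```python
def potentially_bound(pattern):
    """Return the variables that some BGP inside `pattern` could bind.

    Hypothetical encoding (an assumption of this sketch):
      ("bgp", [(s, p, o), ...])
      ("filter", expression_text, subpattern)
      ("join" | "optional" | "union", left, right)
    """
    kind = pattern[0]
    if kind == "bgp":
        # Every variable in a triple pattern is a site where a term can be bound.
        return {term for triple in pattern[1] for term in triple
                if term.startswith("?")}
    if kind == "filter":
        # A filter can only remove solutions; its expression is not analysed.
        return potentially_bound(pattern[2])
    if kind in ("join", "optional", "union"):
        return potentially_bound(pattern[1]) | potentially_bound(pattern[2])
    raise ValueError("unknown pattern kind: %r" % (kind,))


# An unsatisfiable filter does not change the result of this analysis.
example = ("filter", "?Y != ?Y", ("bgp", [("?X", ":p", "?Y")]))
bound = potentially_bound(example)
```

Under this reading ?Y counts as potentially bound in the example even though the filter can never succeed, because the analysis looks at binding sites only.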
> > My proposal for rewording was maybe too restrictive, but it was clearly checkable statically.
> > BTW, the current wording for "SELECT *" is equally ambiguous.
> >
> > [...]
> >
> >>>> Unnecessarily severe.
> >>>
> >>> Fair enough, if we can afford it. Though it seems that expressions in GROUP BY are strictly speaking not necessary, and seem to be replaceable quite easily, so I wouldn't consider this restriction severe.
> >>
> >> It's severe because it's the corner case driving the main design. And
> >> you were arguing for shorter syntax.
> >
> > Yes, but your version leaves us with something very restricted, it seems.
> > you say you'd disallow agg08...
>
> Why is it "very restricted"?
You seem to restrict more than I would.
>
> It's a restriction but I don't see it as /very/ restricting, especially
> as you have already shown that if the app needs the value of the
> grouping returned it can do so using a nested SELECT.
>
> The balance is the difficulty of determining whether one expression is a
> sub-expression of another, including reordering and rewriting.
>
> Consider
>
> GROUP BY (1/?o)
>
> then
>
> SELECT (fn:floor(1/(-2*?o))+count(*))
Sure, but I had meant to allow only the *exact same* expression as the
grouped expression as a subexpression.
> is theoretically safe. When two or more variables are involved, it gets
> complicated.
>
> >> agg08 uses an expression for GROUP BY. I am suggesting, as a
> >> simplification, that it does not put ?O1 and ?O2, not (?O1+?O2), as
> >> legal uses in an expression in the SELECT clause.
> >
> > That would be a quite different query, wouldn't it? Can you show me what exactly your simplification means for the agg08 query?
>
> agg08 would be an error because it uses variables in an expression which
> are not key variables of the group.
> > Let me try to understand again what you propose:
> > - you want to allow only grouped variables being projected or used in project expressions
>
> Yes, understanding "grouped variables" as variables used in GROUP BY,
> but not in an expression.
>
> > - you additionally want to allow grouping by expressions, but the grouped expressions are not reusable in the SELECT clause.
> > yes?
>
> Yes. I'm following the current doc which allows grouping by expression
> (syntax and definition reading ExprList as a list of expressions).
>
> Group(ExprList, Ω) =
> { ListEval(ExprList, μ) ->
> { μ' | μ' in Ω, ListEval(ExprList, μ) = ListEval(ExprList, μ') }
> | μ in Ω }
>
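A rough Python sketch of the Group definition quoted above, assuming solutions are plain dicts and each entry of ExprList is a Python callable. The names list_eval and group and the sample data are invented for illustration; a list stands in for the collection of solutions in each group.

```python
from collections import defaultdict


def list_eval(expr_list, mu):
    """Evaluate each grouping expression against the solution mu."""
    return tuple(expr(mu) for expr in expr_list)


def group(expr_list, omega):
    """Map each key ListEval(ExprList, mu) to the solutions sharing that key."""
    groups = defaultdict(list)
    for mu in omega:
        groups[list_eval(expr_list, mu)].append(mu)
    return dict(groups)


# Made-up solutions, grouped by the single expression (?o1 + ?o2).
omega = [
    {"o1": 1, "o2": 3},
    {"o1": 2, "o2": 2},
    {"o1": 5, "o2": 5},
]
by_sum = group([lambda mu: mu["o1"] + mu["o2"]], omega)
```

With this data the key (4,) collects two solutions whose ?o1 values differ, so ?o1 on its own has no single value within that group.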
> > If so, it seems our arguments run a bit past each other...
> > You seem to propagate a stronger restriction than me for GROUPing, but a weaker restriction than mine for variables allowed as names in project expressions?
>
> I suggest a stronger restriction on the variables allowed in project
> expressions (and projections and HAVING) in that it only considers
> variables. This is because of the complexity of determining whether one
> expression is "safe" given an expression used for GROUP BY.
>
> Otherwise we are trying to allow:
>
> SELECT (?o1+?o2 AS ?o3) ... GROUP BY (?o2+?o1)
> SELECT (1/(?o1+?o2) AS ?o3) ... GROUP BY (?o2+?o1)
>
Would've been allowed in my current understanding, but am not religious about it.
> unclear about:
> SELECT (fn:floor(2*?o1+2*?o2) AS ?o3) ... GROUP BY (?o2+?o1)
>
not allowed in my understanding
> but not
> SELECT ?o1 ... GROUP BY (?o2+?o1)
clearly not allowed in my understanding (for obvious reasons... different ?o1 values can contribute to the same (?o2+?o1) values; actually that is what agg08 should demonstrate).
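One way to make a purely syntactic check concrete, as a Python sketch only. Expressions are encoded as nested tuples (operator, arg, ...), which is an assumption of this example, as is the name safe; aggregates such as count(*) are not modelled. A project expression passes if every variable in it sits inside a subtree that is identical to one of the GROUP BY keys.

```python
def is_var(expr):
    """Variables are strings starting with '?' in this hypothetical encoding."""
    return isinstance(expr, str) and expr.startswith("?")


def safe(expr, group_keys):
    """True if every variable in expr occurs only inside an exact copy of a key."""
    if expr in group_keys:
        return True          # identical to a grouped variable or expression
    if is_var(expr):
        return False         # a variable outside any grouped expression
    if isinstance(expr, tuple):
        # expr[0] is the operator name; check the arguments only.
        return all(safe(arg, group_keys) for arg in expr[1:])
    return True              # constants are always fine


keys = [("+", "?o2", "?o1")]                                   # GROUP BY (?o2+?o1)
same_order = safe(("/", 1, ("+", "?o2", "?o1")), keys)         # 1/(?o2+?o1)
reordered = safe(("+", "?o1", "?o2"), keys)                    # ?o1+?o2
scaled = safe(("floor", ("+", ("*", 2, "?o1"), ("*", 2, "?o2"))), keys)
bare_var = safe("?o1", keys)                                   # SELECT ?o1
```

Note that strict identity rejects the reordered (?o1+?o2) as well as the bare ?o1; accepting the reordered form would need some normalisation of commutative operators, which this sketch does not attempt.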
>
> I don't think that removing the possibility of GROUP BY with an
> expression would be particularly serious; however, there is no reason to
> forbid it (the issue is expressions in SELECT with constant value within
> a group, not the GROUP BY clause) and it is in the current draft.
>
> I'm not sure what you propose. You have mentioned no expressions in
> GROUP BY and also allowing reuse of the same expression used in the
> GROUP BY in the select expressions.
yes.
> For the latter, I haven't seen what
> equivalence of expressions,
syntactical equivalence, but...
> We could be consistent with SELECT expressions and go so far as to require the AS if an expression is used.
... that sounds reasonable to me as well.
Axel
>
Received on Thursday, 26 August 2010 14:46:37 UTC