Re: ungrouped variables used in projections - Further implications?

From: Axel Polleres <axel.polleres@deri.org>
Date: Thu, 26 Aug 2010 15:46:03 +0100
Cc: "SPARQL Working Group" <public-rdf-dawg@w3.org>, "Lee Feigenbaum" <lee@thefigtrees.net>, "Steve Harris" <steve.harris@garlik.com>
Message-Id: <C8AAD659-79DA-4246-8376-01EBB12A3BAC@deri.org>
To: "Andy Seaborne" <andy.seaborne@epimorphics.com>

On 26 Aug 2010, at 14:48, Andy Seaborne wrote:

> On 25/08/10 22:24, Axel Polleres wrote:
> >>> Any opinions on this? This actually worries me about the current "potentially bound" wording.
> >>
> >> If we want a static analysis of the query, then regard ?Y as potentially
> >> bound.
> >
> > We'd need to explain/define the exact reading of "regard as potentially bound". As it stands it is unclear. My example *could* be detected by static analysis, if the static analyser were able to detect *statically* unsatisfiable FILTER expressions, such as (?Y != ?Y), so it is not clear why ?Y should be regarded as potentially bound in my example. I am fine with any formulation which has a clear definition; the current wording unfortunately does not.
> In general, analysing expressions for non-satisfiability is not
> practical.

we are in wild agreement on this! ...

>  Only some simple cases like your example are possible as are
> other forms optimizing compilers might notice. Once reordering and
> equivalence are added, the complexity cost grows.  And what about
> structural invariances like FILTER(?Y>45 && isIRI(?Y)) or
> data-introduced FILTER(?ageInYear < -10)?
> A practical scheme is based on sites where variables are bound in BGP
> within patterns.
> This relates to the GROUP BY and expression handling.  Detecting when
> one expression (in the SELECT line) uses another (from the GROUP BY) is
> complex because complex expressions can be written in different ways yet
> be equivalent expressions.
> Therefore, I suggest we do not require implementations to be able to
> perform such comparisons.

... and this!

> My outline definition of potentially bound is a practical algorithm
> based on just the points where a variable can be bound (not filtered
> out).  There are only a few places where terms can be bound to variables.

Ok, my main point was that we'd need to have this defined in the spec clearly, which currently isn't the case.
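One possible reading of the "binding sites" outline quoted above, as a minimal Python sketch. The tuple-based algebra, the operator names ("bgp", "join", "leftjoin", "union", "filter", "extend") and the function name potentially_bound are all assumptions made for illustration; none of them comes from the spec draft or from this thread. The point is only that FILTER expressions are never inspected, so a variable stays "potentially bound" even under an unsatisfiable filter such as (?Y != ?Y).

```python
# Toy algebra as nested tuples; every node shape below is a hypothetical
# encoding chosen for this sketch, not the SPARQL algebra of the draft.
def potentially_bound(node):
    """Return the variables that have at least one binding site in `node`."""
    op = node[0]
    if op == "bgp":
        # ("bgp", [(s, p, o), ...]): any variable in a triple pattern can be bound.
        return {term for triple in node[1] for term in triple if term.startswith("?")}
    if op in ("join", "leftjoin", "union"):
        # ("join", left, right): collect binding sites from both sides.
        return potentially_bound(node[1]) | potentially_bound(node[2])
    if op == "filter":
        # ("filter", expr, pattern): the expression is deliberately ignored;
        # no satisfiability analysis is attempted.
        return potentially_bound(node[2])
    if op == "extend":
        # ("extend", var, expr, pattern): (expr AS var) is a binding site for var.
        return {node[1]} | potentially_bound(node[3])
    raise ValueError(f"unknown operator: {op!r}")


# A pattern with an unsatisfiable filter, in the spirit of the (?Y != ?Y) case.
example = ("filter", "?Y != ?Y", ("bgp", [("?X", ":p", "?Y")]))
bound = potentially_bound(example)
# sorted(bound) == ['?X', '?Y']
```

Under this reading ?Y counts as potentially bound in the example even though no solution can survive the filter, which is the behaviour a purely site-based definition would give.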

> > My proposal for rewording was maybe too restrictive, but it was clearly checkable statically.
> > BTW, the current wording for "SELECT *" is equally ambiguous.
> >
> > [...]
> >
> >>>> Unnecessarily severe.
> >>>
> >>> Fair enough, if we can afford it. Though it seems that expressions in GROUP BY are strictly speaking not necessary, and seem to be replaceable quite easily, so I wouldn't consider this restriction severe.
> >>
> >> It's severe because it's the corner case driving the main design.  And
> >> you were arguing for shorter syntax.
> >
> > Yes, but your version leaves us with something very restricted, it seems.
> > you say you'd disallow agg08...
> Why is it "very restricted"?

You seem to restrict more than I would.
> It's a restriction but I don't see it as /very/ restricting, especially
> as you have already shown that if the app needs the value of the
> grouping returned it can do so using a nested SELECT.
> The balance is the difficulty of determining whether one expression is a
> sub-expression of another, including reordering and rewriting.
> Consider
> GROUP BY (1/?o)
> then
> SELECT (fn:floor(1/(-2*?o))+count(*))

Sure, but I had meant to allow only the *exact same* expression as the 
grouped expression as subexpression.

> is theoretically safe.  When two or more variables are involved, it gets
> complicated.
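The "exact same expression" rule can be sketched as a purely syntactic check. The following Python is a minimal illustration under stated assumptions: expressions are encoded as nested tuples, the helper names is_var and is_safe and the AGGREGATES set are invented for this sketch, and "exact same" is taken literally, so operand order matters and no rewriting or reordering is attempted.

```python
# Hypothetical encoding: ("op", arg1, arg2, ...) for operators and functions,
# strings starting with "?" for variables, anything else is a constant.
AGGREGATES = {"count", "sum", "min", "max", "avg"}  # assumed set, for illustration


def is_var(expr):
    return isinstance(expr, str) and expr.startswith("?")


def is_safe(expr, group_keys):
    """True if every variable in `expr` sits inside an aggregate or inside an
    exact syntactic copy of one of the GROUP BY keys."""
    if expr in group_keys:
        return True           # exact copy of a grouping key
    if is_var(expr):
        return False          # bare variable that is not itself a key
    if isinstance(expr, tuple):
        if expr[0] in AGGREGATES:
            return True       # aggregates are evaluated per group
        return all(is_safe(arg, group_keys) for arg in expr[1:])
    return True               # constant


key = ("+", "?o2", "?o1")                                   # GROUP BY (?o2+?o1)
same_subexpr = is_safe(("/", 1, ("+", "?o2", "?o1")), [key])  # 1/(?o2+?o1)
swapped = is_safe(("+", "?o1", "?o2"), [key])                 # ?o1+?o2
bare_var = is_safe("?o1", [key])                              # ?o1

# GROUP BY (1/?o) with SELECT (fn:floor(1/(-2*?o))+count(*))
floor_case = is_safe(
    ("+", ("floor", ("/", 1, ("*", -2, "?o"))), ("count", "*")),
    [("/", 1, "?o")],
)
```

Here same_subexpr is True, while swapped, bare_var and floor_case are all False. The floor case may well be constant within a group, but a literal syntactic check cannot see that, which is the trade-off under discussion.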
> >> agg08 uses an expression for GROUP BY. I am suggesting, as a
> >> simplification, that it does not put ?O1 and ?O2, not (?O1+?O2), as
> >> legal uses in an expression in the SELECT clause.
> >
> > That would be a quite different query, wouldn't it? Can you show me what exactly your simplification means for the agg08 query?
> agg08 would be an error because it uses variables in an expression which
> are not key variables of the group.

> > Let me try to understand again what you propose:
> > - you want to allow only grouped variables being projected or used in project expressions
> Yes, understanding "grouped variables" as variables used in GROUP BY,
> but not in an expression.
> > - you additionally want to allow grouping by expressions, but the grouped expressions are not reusable in the SELECT clause.
> > yes?
> Yes.  I'm following the current doc which allows grouping by expression
> (syntax and definition reading ExprList as a list of expressions).
> Group(ExprList, Ω) =
>    { ListEval(ExprList, μ) ->
>      { μ' | μ' in Ω, ListEval(ExprList, μ) = ListEval(ExprList, μ') }
>      | μ in Ω }
> > If so, it seems our arguments run a bit past each other...
> > You seem to propagate a stronger restriction than me for GROUPing, but a weaker restriction than mine for variables allowed as names in project expressions?
> I suggest a stronger restriction on the variables allowed in project
> expressions (and projections and HAVING) in that it only considers
> variables.  This is because of the complexity of determining whether one
> expression is "safe" given an expression used for GROUP BY.
> Otherwise we are trying to allow:
> SELECT (?o1+?o2 AS ?o3) ... GROUP BY (?o2+?o1)

> SELECT (1/(?o1+?o2) AS ?o3) ... GROUP BY (?o2+?o1)

Would've been allowed in my current understanding, but am not religious about it.

> unclear about:
> SELECT (fn:floor(2*?o1+2*?o2) AS ?o3) ... GROUP BY (?o2+?o1)
not allowed in my understanding

> but not
> SELECT ?o1 ... GROUP BY (?o2+?o1)

clearly not allowed in my understanding (for obvious reasons: different ?o1 values can contribute to the same (?o2+?o1) value; actually, that is what agg08 should demonstrate).
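To make this concrete, here is a small Python sketch of the Group(ExprList, Ω) definition quoted earlier, applied to invented data. The function name group, the representation of solutions as dictionaries, and the three sample solutions are assumptions for illustration only; they are not the agg08 test data.

```python
from collections import defaultdict


def group(expr_list, omega):
    """Partition the solutions in `omega` by the tuple of values that the
    grouping expressions take (the role of ListEval in the quoted definition)."""
    groups = defaultdict(list)
    for mu in omega:
        key = tuple(expr(mu) for expr in expr_list)
        groups[key].append(mu)
    return dict(groups)


# Made-up solutions; each dict maps a variable name to its value.
omega = [
    {"o1": 1, "o2": 4},
    {"o1": 2, "o2": 3},
    {"o1": 5, "o2": 5},
]

# GROUP BY (?o2+?o1)
groups = group([lambda mu: mu["o2"] + mu["o1"]], omega)

# Distinct ?o1 values found inside each group.
o1_values = {key: sorted({mu["o1"] for mu in members})
             for key, members in groups.items()}
# o1_values == {(5,): [1, 2], (10,): [5]}
```

The group with key 5 contains two different ?o1 values, so projecting ?o1 would have no single well-defined value for that group.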

> I don't think that removing the possibility of GROUP BY with an
> expression would be particularly serious; however, there is no reason to
> forbid it (the issue is expressions in SELECT with constant value within
> a group, not the GROUP BY clause) and it is in the current draft.
> I'm not sure what you propose. You have mentioned no expressions in
> GROUP BY and also allowing reuse of the same expression used in the
> GROUP BY in the select expressions.


>  For the latter, I haven't seen what
> equivalence of expressions,

syntactical equivalence, but...

> We could be consistent with SELECT expressions and go so far as to require the AS if an expression is used.

... that sounds reasonable to me as well.


Received on Thursday, 26 August 2010 14:46:37 UTC