Re: More minimalistic proposal than potentially bound (http://www.w3.org/2009/sparql/track/actions/329)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 01 Nov 2010 14:40:44 +0000
Message-ID: <4CCED16C.10000@epimorphics.com>
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
CC: Gregory Williams <greg@evilfunhouse.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
```

On 27/10/10 14:10, Birte Glimm wrote:
> On 26 October 2010 18:51, Gregory Williams<greg@evilfunhouse.com>  wrote:
>> On Oct 26, 2010, at 12:55 PM, Birte Glimm wrote:
>>
>>> Thus, here's my attempt of defining scope, so that I can use the
>>> definition for SELCT * as above:
>>>
>>> Let Q be a query with group graph pattern G. The variables in scope
>>> for Q, written scope(Q) are the variables in scope of G, written
>>> scope(G). The variables in scope of a group graph pattern are
>>> recursively defined as follows:
>>> A group graph pattern G can either be a sub-select query (SubSelect
>>> from the grammar) or a sub-group graph pattern (GroupGraphPatternSub
>>> from the grammar). If G is a sub-group graph pattern, then G can
>>> contain blocks of triples in addition to a non-triple graph pattern
>>> (GraphPatternNotTriples in the grammar). Variables occurring in the
>>> blocks of triples are in scope of G. If G contains a non-triple graph
>>> pattern composed of one or more group graph patterns G1 to Gn, e.g.,
>>> OPTIONAL G1 or G1 UNION G2, then variables in scope(Gi) for each 1<=
>>> i<= n are in scope for G. If G is a sub-select query with group graph
>>> pattern G', then scope(G) = scope(G').

The variables in-scope for a subquery are those of the projection part
of the SELECT clause, if the sub-query isn't SELECT *.

>>
>> This leaves out several ways that variables can be bound (and in scope) besides triple patterns including GRAPH ?v {}, and BIND(expr AS ?v). Also, I'm a bit unclear on the wording here, as group graph patterns can't "be" sub-select queries or "sub-group" graph patterns. GGPs can contain subqueries, or other GGPs, but they can also contain BGPs and all the other graph patterns.
>
> Well, he wording mainly follows the SPARQL grammar, which
> distinguished between GGP as a single sub-select query and the other
> forms of GGP. Other forms can be composed of other GGPs, e.g., in the
> case of UNION.
>
> I agree that variables for some cases are still missing, as ?v in your
> GRAPH ?v {} example.

MINUS is special as the RHS does not count, presumably.  Likewise,
FILTER EXISTS.

>> I find the "potentially bound" definition much easier to understand and that definition maps nicely onto my mental model of the different graph patterns and operators in an AST. I personally think there's value in allowing SELECT * to be used in (non-trivial) grouped aggregate queries, and find it clear how the concept of "potentially bound" would encompass both the subselect and aggregate grouping cases we've discussed that affect the list of projectable, in-scope variables.
>
> My main objection to potentially bound is that I think that it is the
> same as variables that are in scope. The term in scope is used in

Yes - "in-scope variables" is "potentially bound".

I prefer talking about in-scope and didn't
change the language when "potentially bound" emerged.  My fault for
causing confusion.

I think it's used once, in sec 8.1.1 - there is also text talking about
filter scope and bNode label scope which are different.

> just the definition of what that means is
> still missing and I wouldn't want to introduce another term in
> parallel to in scope, which essentially denotes exactly the same.
> Either there is a difference between in scope and potentially bound
> (which I don't see yet) or we should decide for one term to use
> throughout. I prefer in scope and that's used already in several other
> places.
> Once we have a definition for that, which could then also be closer to
> the one proposed for potentially bound, we can just say that the * in
> SELECT * is an abbreviation for a list of all variables that are in
> scope.

The places where a variable can be set in a solution are, for SPARQL 1.1:
+ BGPs
+ GRAPH ?g
+ AS ?var : SELECT expressions, and GROUP BY (expr AS ?var)
+ BIND
+ BINDINGS

but restricted by:
- Not MINUS RHS
- Not FILTER EXISTS pattern
- Used in a group but not used as a variable (only) in a group key
- Projection in sub-queries.
That is SELECT ?x { ?a ?b ?c } does not make ?x in variable-scope.

The wiki definition wiki/Potentially_bound mostly works and we can
define in either syntax space or algebra space during conversion.

There is confusions syntax and algebra in FILTER
P = SERVICE t { P1 } does not bind t if t is a variable.

"Being bound" is an execution concept, more suited to the algebra, "in
scope" is more static scoping concept more suited to the syntax.

We have some syntax restrictions (fresh AS variables, no projecting
ungrouped variables) so I was planning to defining the concept in syntax
space.  To me, scoping is the better concept, except it's used in
relation to filters and bNodes (the latter concept being outside our
control).  I propose "variable scope" as the terminology.

Andy
Andy

PS

From writing an execution engine, even for SPARQL 1.0, I found it
useful to classify variables into 3 categories:

1/ Ones that are always set in a binding - whatever route through query
execution happens, the variable will be set. Write "Fixed(P)".

2/ Ones that may be set in a binding = there is some route through the
query execution that can lead that variables. Write "Maybe(P)"

3/ Ones that are read, but not necessarily set (used in filters = they
may not be used in a place where they get set, and now HAVING; also
projection of unmentioned variables). When calculating, not necessarily
disjoint with 1 and 2. Ignore (3) as it does not affect SELECT *, just
transformations that only work under certain variables-scope assumptions.

A variable can be fixed (case 1) in one part of a query and only maybe
(case 2) in another part of the query,

So, e.g.
X = P1 OPTIONAL P2
Fixed(X) = Fixed(P1)
Maybe(X) = Fixed(P2) union Maybe(P2) union Maybe(P1)

X = P1 UNION P2
Fixed(X) = Fixed(P1) ∩ Fixed(P2)
Maybe(X) is
(Fixed(P1) ∪ Fixed(P2) ∪ Maybe(P2) ∪ Maybe(P1)) \ Fixed(X)

PotentiallyBound(X) is Fixed(X) ∪ Maybe(X) and can be used to simplify
the combination expressions.
```
Received on Monday, 1 November 2010 14:41:31 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:01:02 UTC