Re: More minimalistic proposal than potentially bound (http://www.w3.org/2009/sparql/track/actions/329) from Andy Seaborne on 2010-11-01 (public-rdf-dawg@w3.org from October to December 2010)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 01 Nov 2010 14:40:44 +0000
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
CC: Gregory Williams <greg@evilfunhouse.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4CCED16C.10000@epimorphics.com>
On 27/10/10 14:10, Birte Glimm wrote:
> On 26 October 2010 18:51, Gregory Williams<greg@evilfunhouse.com>  wrote:
>> On Oct 26, 2010, at 12:55 PM, Birte Glimm wrote:
>>
>>> Thus, here's my attempt of defining scope, so that I can use the
>>> definition for SELCT * as above:
>>>
>>> Let Q be a query with group graph pattern G. The variables in scope
>>> for Q, written scope(Q) are the variables in scope of G, written
>>> scope(G). The variables in scope of a group graph pattern are
>>> recursively defined as follows:
>>> A group graph pattern G can either be a sub-select query (SubSelect
>>> from the grammar) or a sub-group graph pattern (GroupGraphPatternSub
>>> from the grammar). If G is a sub-group graph pattern, then G can
>>> contain blocks of triples in addition to a non-triple graph pattern
>>> (GraphPatternNotTriples in the grammar). Variables occurring in the
>>> blocks of triples are in scope of G. If G contains a non-triple graph
>>> pattern composed of one or more group graph patterns G1 to Gn, e.g.,
>>> OPTIONAL G1 or G1 UNION G2, then variables in scope(Gi) for each 1<=
>>> i<= n are in scope for G. If G is a sub-select query with group graph
>>> pattern G', then scope(G) = scope(G').

The variables in-scope for a subquery are those of the projection part 
of the SELECT clause, if the sub-query isn't SELECT *.

>>
>> This leaves out several ways that variables can be bound (and in scope) besides triple patterns including GRAPH ?v {}, and BIND(expr AS ?v). Also, I'm a bit unclear on the wording here, as group graph patterns can't "be" sub-select queries or "sub-group" graph patterns. GGPs can contain subqueries, or other GGPs, but they can also contain BGPs and all the other graph patterns.
>
> Well, he wording mainly follows the SPARQL grammar, which
> distinguished between GGP as a single sub-select query and the other
> forms of GGP. Other forms can be composed of other GGPs, e.g., in the
> case of UNION.
>
> I agree that variables for some cases are still missing, as ?v in your
> GRAPH ?v {} example.

MINUS is special as the RHS does not count, presumably.  Likewise, 
FILTER EXISTS.

>> I find the "potentially bound" definition much easier to understand and that definition maps nicely onto my mental model of the different graph patterns and operators in an AST. I personally think there's value in allowing SELECT * to be used in (non-trivial) grouped aggregate queries, and find it clear how the concept of "potentially bound" would encompass both the subselect and aggregate grouping cases we've discussed that affect the list of projectable, in-scope variables.
>
> My main objection to potentially bound is that I think that it is the
> same as variables that are in scope. The term in scope is used in
> several places already,

Yes - "in-scope variables" is "potentially bound".

I prefer talking about in-scope and didn't
change the language when "potentially bound" emerged.  My fault for 
causing confusion.

I think it's used once, in sec 8.1.1 - there is also text talking about 
filter scope and bNode label scope which are different.

> just the definition of what that means is
> still missing and I wouldn't want to introduce another term in
> parallel to in scope, which essentially denotes exactly the same.
> Either there is a difference between in scope and potentially bound
> (which I don't see yet) or we should decide for one term to use
> throughout. I prefer in scope and that's used already in several other
> places.
> Once we have a definition for that, which could then also be closer to
> the one proposed for potentially bound, we can just say that the * in
> SELECT * is an abbreviation for a list of all variables that are in
> scope.

The places where a variable can be set in a solution are, for SPARQL 1.1:
+ BGPs
+ GRAPH ?g
+ AS ?var : SELECT expressions, and GROUP BY (expr AS ?var)
+ BIND
+ BINDINGS

but restricted by:
- Not MINUS RHS
- Not FILTER EXISTS pattern
- Used in a group but not used as a variable (only) in a group key
- Projection in sub-queries.
   That is SELECT ?x { ?a ?b ?c } does not make ?x in variable-scope.

The wiki definition wiki/Potentially_bound mostly works and we can 
define in either syntax space or algebra space during conversion.

There is confusions syntax and algebra in FILTER
P = SERVICE t { P1 } does not bind t if t is a variable.

"Being bound" is an execution concept, more suited to the algebra, "in 
scope" is more static scoping concept more suited to the syntax.

We have some syntax restrictions (fresh AS variables, no projecting 
ungrouped variables) so I was planning to defining the concept in syntax 
space.  To me, scoping is the better concept, except it's used in 
relation to filters and bNodes (the latter concept being outside our 
control).  I propose "variable scope" as the terminology.

 Andy
 Andy


PS

 From writing an execution engine, even for SPARQL 1.0, I found it 
useful to classify variables into 3 categories:

1/ Ones that are always set in a binding - whatever route through query 
execution happens, the variable will be set. Write "Fixed(P)".

2/ Ones that may be set in a binding = there is some route through the 
query execution that can lead that variables. Write "Maybe(P)"

3/ Ones that are read, but not necessarily set (used in filters = they 
may not be used in a place where they get set, and now HAVING; also 
projection of unmentioned variables). When calculating, not necessarily 
disjoint with 1 and 2. Ignore (3) as it does not affect SELECT *, just 
transformations that only work under certain variables-scope assumptions.

A variable can be fixed (case 1) in one part of a query and only maybe 
(case 2) in another part of the query,

So, e.g.
X = P1 OPTIONAL P2
   Fixed(X) = Fixed(P1)
   Maybe(X) = Fixed(P2) union Maybe(P2) union Maybe(P1)

X = P1 UNION P2
   Fixed(X) = Fixed(P1) ∩ Fixed(P2)
   Maybe(X) is
     (Fixed(P1) ∪ Fixed(P2) ∪ Maybe(P2) ∪ Maybe(P1)) \ Fixed(X)

PotentiallyBound(X) is Fixed(X) ∪ Maybe(X) and can be used to simplify 
the combination expressions.
Received on Monday, 1 November 2010 14:41:31 UTC