Re: Binding injection proposals

On 30/10/16 02:07, Peter F. Patel-Schneider wrote:
> Take two.
>
>
>
> Proposal A:
>
> ** Intuition/Outline
>
> Inject variable bindings into the top-level group graph pattern.
>
> ** Proposal
>
> Translate EXISTS{P} as  exists(t,translate(Initial(t) P'))
>   where P' is P fixed up to be syntactically suitable
> Translate Initial(t) as Initial(t)
> Given variable binding b,
>   exists(t,P) is true iff eval(D(G),P') is non-empty
>   where P' is P with Initial(t) replaced by b
>
>
> Proposal B:
>
> ** Intuition/Outline
>
> Ensure that the variables from the row being filtered are available, not
> just their values as in the spec with substitution.
>
> This is done by restricting the variable to the value of the row at the
> point where the variable is being bound.  This is in a BGP.   Use of BIND (
> e AS ?x) for ?x in the current row is not allowed : ?x is considered to be
> already set.
>
> ** Proposal
>
> Rewrite the pattern of the EXISTS filter at algebra evaluation time; do this
> in a deep fashion, while respecting scoping rules.
>
> Let V = in-scope variables at the point of the FILTER EXISTS.
>
> For each BGP in the pattern:
>         Build a VALUES with variables of V
>  still in-scope at this BGP. << This needs to be specified.
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Agreed.

>         This includes empty BGPs
>  Replace the BGP with Join(values,bgp)
>
>
> There are two differences here:
>
> 1/ Where to push the variable bindings.
> 2/ How to push them.
>
>
> The how is the less important difference.  Proposal A pushes the bindings to
> the "beginnings" of group graph patterns via the Initial(t) where they
> affect each construct on the group graph pattern.  Proposal B pushes the
> bindings to each BGP.  This ends up in effect pushing the bindings to the
> beginning of each group graph pattern because there is always a BGP (maybe
> empty) at the beginning of the translation of each group graph pattern

It's a bit confusing to talk about group graph patterns and proposal B 
because group graph pattern do not exist in the algebra.

Evaluation, being bottom up, always evaluates from BGPs upwards.  BGPs 
are the point at which patterns access the data - "GRAPH ?g" is the 
other place.

The results of BGPs are joined/filtered/union'ed so the idea of 
restricting the range of variable values is being applied at eachplace a 
variable is being bound to a value.

> (IF
> THE SIMPLIFICATION STEP in 18.2.2.8 IS NOT DONE!).  However, proposal B also
> pushes the bindings elsewhere in the translation of the group graph pattern,
> which doesn't appear to have any consequences.
>
> I don't see a significant difference between the two proposals in the how.
> Proposal A would do a little more at translate time and Proposal B would do
> a little more at algebra evaluation time.
>
> I think that if Proposal B was changed to only affect BGPs directly in the
> top-level group graph pattern then it would have the same results as
> Proposal A.

See below. It is intention to be not just top-level.  It is less change 
from the spec. The st important effect is that it keeps FILTERs inside 
non-top level elements working.

> The where is where the real difference between the two proposals lies.
> Proposal A only pushes to the beginning of the top-level group graph pattern
> so it makes an EXIST act very much like a VALUES at the beginning of the
> top-level group graph pattern.  Proposal B pushes into every BGP, including
> BGPs inside OPTIONAL, MINUS, FILTER, subqueries, etc.  Some of these extra
> changes make Proposal B work more like the current EXISTS, but some of them
> make changes from the current EXISTS.  For example, a MINUS with no common
> variables could be changed into a MINUS with common variables, flipping its
> meaning.  Proposal B might also have problems with EXISTS inside of EXISTS.
> (I haven't worked all the details out here so I'm not sure whether there is
> a problem.)

Making MINUS more usable is an intended change at least to keep the 
domains the same so that we don't hit the "no vars in common" case. 
There may be other effects.

> Proposal B also requires a new notion of variable scoping for SPARQL to
> control which variables are pushed into sub-queries.

Not really - the scoping rules already exist in the spec.  I don't see 
this as an new concept.  I agree it needs more definition.

Differences:

https://github.com/w3c/sparql-exists/tree/gh-pages/tests

The tests I put in were not intended to be too demanding but on one 
point they do differentiate the two proposals.

The issue is with GRAPH but it applies to UNION and other forms.

==== Test :exists-filter-1
PREFIX : <http://example/>

SELECT ?s {
   ?s :q ?v .
   FILTER EXISTS {
       :r :p ?z .
       FILTER(?z = ?v)
     }
}

==== Test :exists-graph-2
PREFIX : <http://example/>

SELECT ?s {
   ?s :q ?v .
   FILTER EXISTS {
       GRAPH :graph {
           :r :p ?z .
           FILTER(?z = ?v)
       }
     }
}

====


In :exists-filter-1, the FILTER in the EXISTS can access ?v from the 
outer part of the query

In :exists-graph-2, the FILTER in the EXISTS can't in proposal A - the 
insertion of the VALUES

   FILTER EXISTS {
       VALUES ?v ...
       GRAPH :graph {
           :r :p ?z .
           FILTER(?z = ?v)  # ?v is unbound here
       }
     }

When GRAPH/FILTER is executed, ?v is unbound.

It works in :exists-filter-1 because proposal A uses the fact that the 
filter acting over the whole {} - this is a syntax effect.

By proposal B: it is sort of like:

   FILTER EXISTS {
       { VALUES ?v ... }
       GRAPH :graph {
           { :r :p ?z . VALUES ?v ... }
           FILTER(?z = ?v) # ?v is bound here
       }
     }

Each BGP is replaces by { BGP VALUES } - it is not done this way - the 
syntax-based description here is just illustrative.

And it's the same as { BGP FILTER(sameTerm...) }

There are going to be differences if variables are allowed to range more 
than the current row.  It's not about scope : an OPTIONAL can be changed 
and the variable its RHS effectively hidden.  This is what happens with 
doubly nested optionals and why the top down reading gives the wrong 
answers.  However, such technical differences are less important. The 
FILTER examples are more likely to occur in real use.

I think restricting at the point of binding (proposal B) is much closer 
to SQL correlated subqueries.

 Andy

Received on Monday, 31 October 2016 14:51:47 UTC