[ISSUE-29] Negation: looking for equivalence


This is an exploration of the equivalence of NOT EXISTS and DIFF when the pattern for negation-as-failure is a basic graph pattern (BGP).

++ This is work-in-progress.

Note: write "DIFF" because there are several MINUS definitions floating around and this material only applies to the case of MINUS as anti-join (which is DIFF), although this can be extended to the other MINUS definitions.  I haven't done that yet.

== Summary

{ P NOT EXISTS Q } and { P DIFF Q } give the same results when it's possible to calculate join(P,Q) by substituting from P into Q then matching the more grounded pattern.  This is a lot of the time; it requires fairly odd or fairly complicated queries to not be true.


== Some existing terminology and definitions
  
Let P and Q be graphs patterns: combining eval and the operator definitions:

Definition: Diff (simplified, no expression part):

  Diff(P, Q) = { p | p in eval(P), such that for all q in eval(Q),  
                       p and q are not compatible}
                       
Definition: Join
                       
  Join(P, Q) = { merge(p, q) | p in eval(P), q in eval(Q) where p and q are compatible }

and so

  Diff(P, Q) = { p | for all q in eval(Q), there is no merge(p,q) in join(P,Q) }

"join" and "diff" split the space of pairs (p,q) into two disjoint partitions.
  
== New Notation

For pattern Q and solution mapping s, write Q[s] for the pattern formed by taking each variable v in dom(s) and replacing any occurrence of v in Q by the value s(v).  [Substitution]

Write eval(Q,p) for the result of evaluation pattern Q[p] where each q in Q[p] is merged with p.  eval(Q[p]) contains no variables in dom(p) because they have been replaced by the associated value.  eval(Q,p) adds the p bindings into the results of Q[p] (merge and union are equivalent as there is no common variables to worry about compatibility).

eval(Q,p) = { union(p,q) | q is a solution mapping of Q[p] }
          = { merge(p,q) | q is a solution mapping of Q[p] }

Definition: Substitution Join

The substitution join of patterns P and Q is:

subjoin(P,Q) = { x | for all p in eval(P), x in eval(Q, p) }
             = { merge(p, q) | p in eval(P), q in eval(Q[p]) }

== Discussion

In many cases, join(P,Q) = subjoin(P,Q).  In particular, if Q is a BGP.  Intuitively, this says that if Q uses a variable v, then the variable is defined in the result of evaluating Q.  There are no out-of-scope variables.  An example where this condition is not applicable:

{ { ?a :p ?v } { ?a :q ?w . FILTER(?v>23) } }

because ?v appears in Q (but ?v is not in eval(Q)) and substituting changes the FILTER.

Projections inside Q (sub-queries) would also have this feature.  Optionals also have an effect (see the doubly nested optional case).

In these examples, DIFF and NON EXISTS do not evalaute to the same thing.

==== Data
:a1 :age 1 .
:a1 :q :w1 .
:a2 :age 64 .
:a2 :q :w2 .
====

SELECT * { ?a :age ?v  NOT EXISTS { ?a :q ?w . FILTER(?v>23) } }

---------------------------
| v | a                   |
===========================
| 1 | <http://example/a1> |
---------------------------

SELECT * { { ?a :age ?v } DIFF { ?a :q ?w . FILTER(?v>23) } }

----------------------------
| v  | a                   |
============================
| 64 | <http://example/a2> |
| 1  | <http://example/a1> |
----------------------------

A FILTER involving bound() of an otherwise unmentioned variable in the RHS has the opposite effect.

If we used the full definition of DIFF (with expressions) then we could, like OPTIONAL/LeftJoin make this become the same but it does not cover the case of nested optionals.

-----------------

FOR THE CASES WHERE  join(P,Q) = subjoin(P,Q)  [**]           

Now consider DIFF and NOT EXISTS.

If eval(Q[p]) has no solutions then neither does eval(Q,p).
If eval(Q,p) has no solutions then neither does eval(Q[p]).

eval({ P NOT EXISTS Q })
is
  (result of P filtered by whether Q + bindings matches at all)
   = { p | p in eval(P) and eval(Q[p]) has no solutions }
   = { p | p in eval(P) and eval(Q,p) has no solutions }
        [**] Use the assumption at this step.
   = { p | p in eval(P) such that for all q in eval(Q), p and q are not compatible }
   = eval({ P DIFF Q })

Hence
NOT EXISTS and DIFF are the same where join(P,Q) = subjoin(P,Q)
which includes when Q is a BGP.   

What remains is to refine the conditions under which join(P,Q) = subjoin(P,Q)

It requires tracking scope or variables across optionals, filters and projections.

Received on Tuesday, 4 August 2009 15:19:07 UTC