- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Tue, 4 Aug 2009 15:18:01 +0000
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
This is an exploration of the equivalence of NOT EXISTS and DIFF when the pattern for negation-as-failure is a basic graph pattern (BGP).
++ This is work-in-progress.
Note: write "DIFF" because there are several MINUS definitions floating around and this material only applies to the case of MINUS as anti-join (which is DIFF), although this can be extended to the other MINUS definitions. I haven't done that yet.
== Summary
{ P NOT EXISTS Q } and { P DIFF Q } give the same results when it's possible to calculate join(P,Q) by substituting from P into Q then matching the more grounded pattern. This is a lot of the time; it requires fairly odd or fairly complicated queries to not be true.
== Some existing terminology and definitions
Let P and Q be graphs patterns: combining eval and the operator definitions:
Definition: Diff (simplified, no expression part):
Diff(P, Q) = { p | p in eval(P), such that for all q in eval(Q),
p and q are not compatible}
Definition: Join
Join(P, Q) = { merge(p, q) | p in eval(P), q in eval(Q) where p and q are compatible }
and so
Diff(P, Q) = { p | for all q in eval(Q), there is no merge(p,q) in join(P,Q) }
"join" and "diff" split the space of pairs (p,q) into two disjoint partitions.
== New Notation
For pattern Q and solution mapping s, write Q[s] for the pattern formed by taking each variable v in dom(s) and replacing any occurrence of v in Q by the value s(v). [Substitution]
Write eval(Q,p) for the result of evaluation pattern Q[p] where each q in Q[p] is merged with p. eval(Q[p]) contains no variables in dom(p) because they have been replaced by the associated value. eval(Q,p) adds the p bindings into the results of Q[p] (merge and union are equivalent as there is no common variables to worry about compatibility).
eval(Q,p) = { union(p,q) | q is a solution mapping of Q[p] }
= { merge(p,q) | q is a solution mapping of Q[p] }
Definition: Substitution Join
The substitution join of patterns P and Q is:
subjoin(P,Q) = { x | for all p in eval(P), x in eval(Q, p) }
= { merge(p, q) | p in eval(P), q in eval(Q[p]) }
== Discussion
In many cases, join(P,Q) = subjoin(P,Q). In particular, if Q is a BGP. Intuitively, this says that if Q uses a variable v, then the variable is defined in the result of evaluating Q. There are no out-of-scope variables. An example where this condition is not applicable:
{ { ?a :p ?v } { ?a :q ?w . FILTER(?v>23) } }
because ?v appears in Q (but ?v is not in eval(Q)) and substituting changes the FILTER.
Projections inside Q (sub-queries) would also have this feature. Optionals also have an effect (see the doubly nested optional case).
In these examples, DIFF and NON EXISTS do not evalaute to the same thing.
==== Data
:a1 :age 1 .
:a1 :q :w1 .
:a2 :age 64 .
:a2 :q :w2 .
====
SELECT * { ?a :age ?v NOT EXISTS { ?a :q ?w . FILTER(?v>23) } }
---------------------------
| v | a |
===========================
| 1 | <http://example/a1> |
---------------------------
SELECT * { { ?a :age ?v } DIFF { ?a :q ?w . FILTER(?v>23) } }
----------------------------
| v | a |
============================
| 64 | <http://example/a2> |
| 1 | <http://example/a1> |
----------------------------
A FILTER involving bound() of an otherwise unmentioned variable in the RHS has the opposite effect.
If we used the full definition of DIFF (with expressions) then we could, like OPTIONAL/LeftJoin make this become the same but it does not cover the case of nested optionals.
-----------------
FOR THE CASES WHERE join(P,Q) = subjoin(P,Q) [**]
Now consider DIFF and NOT EXISTS.
If eval(Q[p]) has no solutions then neither does eval(Q,p).
If eval(Q,p) has no solutions then neither does eval(Q[p]).
eval({ P NOT EXISTS Q })
is
(result of P filtered by whether Q + bindings matches at all)
= { p | p in eval(P) and eval(Q[p]) has no solutions }
= { p | p in eval(P) and eval(Q,p) has no solutions }
[**] Use the assumption at this step.
= { p | p in eval(P) such that for all q in eval(Q), p and q are not compatible }
= eval({ P DIFF Q })
Hence
NOT EXISTS and DIFF are the same where join(P,Q) = subjoin(P,Q)
which includes when Q is a BGP.
What remains is to refine the conditions under which join(P,Q) = subjoin(P,Q)
It requires tracking scope or variables across optionals, filters and projections.
Received on Tuesday, 4 August 2009 15:19:07 UTC