- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Tue, 4 Aug 2009 15:18:01 +0000
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
This is an exploration of the equivalence of NOT EXISTS and DIFF when the pattern for negation-as-failure is a basic graph pattern (BGP). ++ This is work-in-progress. Note: write "DIFF" because there are several MINUS definitions floating around and this material only applies to the case of MINUS as anti-join (which is DIFF), although this can be extended to the other MINUS definitions. I haven't done that yet. == Summary { P NOT EXISTS Q } and { P DIFF Q } give the same results when it's possible to calculate join(P,Q) by substituting from P into Q then matching the more grounded pattern. This is a lot of the time; it requires fairly odd or fairly complicated queries to not be true. == Some existing terminology and definitions Let P and Q be graphs patterns: combining eval and the operator definitions: Definition: Diff (simplified, no expression part): Diff(P, Q) = { p | p in eval(P), such that for all q in eval(Q), p and q are not compatible} Definition: Join Join(P, Q) = { merge(p, q) | p in eval(P), q in eval(Q) where p and q are compatible } and so Diff(P, Q) = { p | for all q in eval(Q), there is no merge(p,q) in join(P,Q) } "join" and "diff" split the space of pairs (p,q) into two disjoint partitions. == New Notation For pattern Q and solution mapping s, write Q[s] for the pattern formed by taking each variable v in dom(s) and replacing any occurrence of v in Q by the value s(v). [Substitution] Write eval(Q,p) for the result of evaluation pattern Q[p] where each q in Q[p] is merged with p. eval(Q[p]) contains no variables in dom(p) because they have been replaced by the associated value. eval(Q,p) adds the p bindings into the results of Q[p] (merge and union are equivalent as there is no common variables to worry about compatibility). eval(Q,p) = { union(p,q) | q is a solution mapping of Q[p] } = { merge(p,q) | q is a solution mapping of Q[p] } Definition: Substitution Join The substitution join of patterns P and Q is: subjoin(P,Q) = { x | for all p in eval(P), x in eval(Q, p) } = { merge(p, q) | p in eval(P), q in eval(Q[p]) } == Discussion In many cases, join(P,Q) = subjoin(P,Q). In particular, if Q is a BGP. Intuitively, this says that if Q uses a variable v, then the variable is defined in the result of evaluating Q. There are no out-of-scope variables. An example where this condition is not applicable: { { ?a :p ?v } { ?a :q ?w . FILTER(?v>23) } } because ?v appears in Q (but ?v is not in eval(Q)) and substituting changes the FILTER. Projections inside Q (sub-queries) would also have this feature. Optionals also have an effect (see the doubly nested optional case). In these examples, DIFF and NON EXISTS do not evalaute to the same thing. ==== Data :a1 :age 1 . :a1 :q :w1 . :a2 :age 64 . :a2 :q :w2 . ==== SELECT * { ?a :age ?v NOT EXISTS { ?a :q ?w . FILTER(?v>23) } } --------------------------- | v | a | =========================== | 1 | <http://example/a1> | --------------------------- SELECT * { { ?a :age ?v } DIFF { ?a :q ?w . FILTER(?v>23) } } ---------------------------- | v | a | ============================ | 64 | <http://example/a2> | | 1 | <http://example/a1> | ---------------------------- A FILTER involving bound() of an otherwise unmentioned variable in the RHS has the opposite effect. If we used the full definition of DIFF (with expressions) then we could, like OPTIONAL/LeftJoin make this become the same but it does not cover the case of nested optionals. ----------------- FOR THE CASES WHERE join(P,Q) = subjoin(P,Q) [**] Now consider DIFF and NOT EXISTS. If eval(Q[p]) has no solutions then neither does eval(Q,p). If eval(Q,p) has no solutions then neither does eval(Q[p]). eval({ P NOT EXISTS Q }) is (result of P filtered by whether Q + bindings matches at all) = { p | p in eval(P) and eval(Q[p]) has no solutions } = { p | p in eval(P) and eval(Q,p) has no solutions } [**] Use the assumption at this step. = { p | p in eval(P) such that for all q in eval(Q), p and q are not compatible } = eval({ P DIFF Q }) Hence NOT EXISTS and DIFF are the same where join(P,Q) = subjoin(P,Q) which includes when Q is a BGP. What remains is to refine the conditions under which join(P,Q) = subjoin(P,Q) It requires tracking scope or variables across optionals, filters and projections.
Received on Tuesday, 4 August 2009 15:19:07 UTC