problems with EXISTS

Thesis: The definition of EXISTS in SPARQL is so badly broken that it should
be replaced with a completely different mechanism.


The situation with respect to EXISTS is extraordinarily bad.  There are
several errors in the specification of substitution.  All of these have
"obvious" solutions, at least to any particular implementer.  Because every
implementer has had to fix this part of SPARQL, each gets cover for their
particular changes even when the implementations diverge.

Here is a (partial) list of problems with EXISTS.

SELECT ?x WHERE {
  ?x :a :b .
  FILTER EXISTS { VALUES (?x) { ( :c ) } }
}

(errata-query-10)
Obviously the ?x in the VALUES should not be substituted, as substitution
would result in an undefined construct.  Not substituting results in the
filter being true when the mapping of ?x is :c.

SELECT ?x WHERE {
  ?x :a :b .
  FILTER EXISTS { BIND ( :c AS ?x ) }
}

Obviously the ?x in the BIND should not be substituted, as substitution would
result in an undefined construct.  Not substituting results in the filter
always being true.
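
The undefined constructs in the VALUES and BIND examples can be seen with a
minimal Python sketch.  It is only an illustration: it substitutes over query
text rather than over the SPARQL algebra, and the function name
substitute_text and the outer solution ?x = :s are assumptions made for the
example, not anything taken from the specification.

```python
import re

def substitute_text(fragment, mu):
    """Replace every variable token by its value, ignoring context."""
    return re.sub(r"\?\w+", lambda m: mu.get(m.group(0), m.group(0)), fragment)

# Hypothetical solution coming from the outer pattern ?x :a :b .
mu = {"?x": ":s"}

values_after = substitute_text("VALUES (?x) { ( :c ) }", mu)
bind_after = substitute_text("BIND ( :c AS ?x )", mu)

print(values_after)  # VALUES (:s) { ( :c ) }
print(bind_after)    # BIND ( :c AS :s )
```

Both results put an IRI in a position where the grammar requires a variable,
so neither is a SPARQL construct with a defined meaning.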

SELECT ?x WHERE {
  ?x :a ?y .
  FILTER EXISTS { SELECT ?x WHERE { ?x :a :c . } }
}

Obviously the ?x in the inner SelectClause should not be substituted, as
substitution would result in an undefined construct.  Obviously the ?x in the
inner BGP should not be substituted, as that would change the meaning of the
inner SELECT.  Not substituting results in the filter being true whenever
there is any triple with predicate :a and object :c.

SELECT ?x WHERE {
  ?x :a ?y .
  FILTER EXISTS { SELECT ?z WHERE { ?z :a ?y . } }
}

(errata-query-8) (https://scirate.com/arxiv/1606.01441)
Obviously the ?y in the inner SELECT should not be substituted, because it is
a different variable from the ?y in the outer SELECT, even though the SPARQL
specification says to do the substitution and the result is well-defined.  The
SPARQL specification just produces the wrong answers.  Not substituting
results in the filter being true whenever there is any triple with predicate
:a, instead of only being true for mappings of ?x and ?y where there is a
triple with predicate :a and an object that is the mapping of ?y.

SELECT ?x WHERE {
  :s :p ?x .
  FILTER EXISTS { ?x :p _:a . }
}

Obviously blank nodes that are mappings for ?x should not be subject to RDF
instance mappings the way _:a is, even though the SPARQL specification says to
allow substitution and the result is well-defined.  The SPARQL specification
just produces the wrong answers.  One way to get the right answers is to add a
new kind of token to the SPARQL algebra.  A different, and better, way is to
go to a mapping-based definition.  Getting this right results in the filter
being false on the graph { :s :p _:b . } instead of being true.
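
The difference can be sketched in Python on the one-triple graph above.  This
is a minimal sketch under assumptions: terms are plain strings, the helper
names (match_bgp, substitute, exists_by_substitution, exists_by_mapping) are
hypothetical, and the mapping-based variant is just one possible reading of
what such a definition could look like.

```python
# Terms are strings: "?x" is a variable, "_:a" a blank node, ":s" an IRI.
GRAPH = {(":s", ":p", "_:b")}

def is_pattern_var(term):
    # Inside a basic graph pattern, blank nodes behave like variables.
    return term.startswith("?") or term.startswith("_:")

def match_bgp(patterns, graph, initial=None):
    """Solutions for a list of triple patterns. Values in `initial` are
    graph terms: they are compared for equality, never re-matched."""
    solutions = [dict(initial or {})]
    for pattern in patterns:
        extended = []
        for mu in solutions:
            for triple in graph:
                candidate = dict(mu)
                for p_term, g_term in zip(pattern, triple):
                    if is_pattern_var(p_term):
                        if candidate.setdefault(p_term, g_term) != g_term:
                            break
                    elif p_term != g_term:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions

def substitute(patterns, mu):
    # Replace variables by their values inside the pattern itself.
    return [tuple(mu.get(t, t) for t in p) for p in patterns]

def exists_by_substitution(patterns, mu, graph):
    return bool(match_bgp(substitute(patterns, mu), graph))

def exists_by_mapping(patterns, mu, graph):
    return bool(match_bgp(patterns, graph, initial=mu))

outer = match_bgp([(":s", ":p", "?x")], GRAPH)  # ?x is mapped to _:b
inner = [("?x", ":p", "_:a")]

by_subst = [exists_by_substitution(inner, mu, GRAPH) for mu in outer]
by_map = [exists_by_mapping(inner, mu, GRAPH) for mu in outer]
print(by_subst, by_map)
```

After substitution the graph's blank node _:b lands in the pattern, where it
acts like a variable and matches :s, so the substitution-based check succeeds.
The mapping-based check keeps _:b as a value and finds no triple with it as
subject.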

SELECT ?x WHERE {
  ?x :a :b .
  FILTER EXISTS { ?x :a :b . MINUS { ?x :a :b } }
}

Obviously the ?x should still be considered to be a shared variable for the
MINUS, because otherwise its meaning changes dramatically, even though the
SPARQL specification says that the meaning is to change and the result is
well-defined.  The SPARQL specification just produces the wrong answers.  To
get the right answers requires keeping track of which substitutions have
been applied in the SPARQL algebra, which really means going from a
substitution-based definition to a mapping-based definition.  Treating ?x as
shared results in the filter always failing instead of always succeeding.
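
The MINUS behavior can be reproduced with a small Python sketch.  It assumes a
two-triple example graph, string terms, and hypothetical helper names; the
mapping-based variant (seeding both sides of the MINUS with the outer mapping)
is one possible formulation, not the only one.

```python
GRAPH = {(":s", ":a", ":b"), (":t", ":a", ":b")}  # assumed example data

def match_bgp(patterns, graph, initial=None):
    """Solutions for a list of triple patterns, optionally seeded."""
    solutions = [dict(initial or {})]
    for pattern in patterns:
        extended = []
        for mu in solutions:
            for triple in graph:
                candidate = dict(mu)
                for p_term, g_term in zip(pattern, triple):
                    if p_term.startswith("?"):
                        if candidate.setdefault(p_term, g_term) != g_term:
                            break
                    elif p_term != g_term:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions

def compatible(m1, m2):
    return all(m2[v] == val for v, val in m1.items() if v in m2)

def minus(left, right):
    # Drop a left solution only if some right solution is compatible
    # with it AND shares at least one variable with it.
    return [m for m in left
            if not any(compatible(m, r) and set(m) & set(r) for r in right)]

def substitute(patterns, mu):
    return [tuple(mu.get(t, t) for t in p) for p in patterns]

LEFT = [("?x", ":a", ":b")]   # ?x :a :b .
RIGHT = [("?x", ":a", ":b")]  # MINUS { ?x :a :b }

def exists_by_substitution(mu, graph):
    left = match_bgp(substitute(LEFT, mu), graph)
    right = match_bgp(substitute(RIGHT, mu), graph)
    return bool(minus(left, right))  # no variables left, nothing removed

def exists_by_mapping(mu, graph):
    left = match_bgp(LEFT, graph, initial=mu)
    right = match_bgp(RIGHT, graph, initial=mu)
    return bool(minus(left, right))  # ?x is still shared, solution removed

outer = match_bgp([("?x", ":a", ":b")], GRAPH)
by_subst = [exists_by_substitution(mu, GRAPH) for mu in outer]
by_map = [exists_by_mapping(mu, GRAPH) for mu in outer]
print(by_subst, by_map)
```

Under substitution both sides of the MINUS produce the empty mapping, which
shares no variable, so nothing is removed and the filter always succeeds.
Keeping ?x in the mappings makes it a shared variable again and the filter
always fails.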

SELECT ?x WHERE {
  ?x :a ?y .
  FILTER EXISTS { SELECT ?x WHERE { ?x :a :c . } }
}

Obviously the ?x in the inner SelectClause should be limited to bindings for
?x in the mappings that go into the FILTER.  This keeps the intuitive meaning
much better than any substitution-based definition.  Doing so results in the
filter being true whenever there is a triple with predicate :a, object :c,
and subject the mapping of ?x.
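
One way to sketch this reading in Python is to evaluate the inner SELECT on
its own and then keep an outer mapping only if some inner solution is
compatible with it.  The graph, the helper names, and this particular
compatibility-based formulation are assumptions made for illustration.

```python
GRAPH = {(":s1", ":a", ":c"), (":s2", ":a", ":d")}  # assumed example data

def match_bgp(patterns, graph):
    """Solutions for a list of triple patterns over string terms."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for mu in solutions:
            for triple in graph:
                candidate = dict(mu)
                for p_term, g_term in zip(pattern, triple):
                    if p_term.startswith("?"):
                        if candidate.setdefault(p_term, g_term) != g_term:
                            break
                    elif p_term != g_term:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions

def select(variables, patterns, graph):
    # Projection: only the selected variables are visible outside.
    return [{v: m[v] for v in variables if v in m}
            for m in match_bgp(patterns, graph)]

def compatible(m1, m2):
    return all(m2[v] == val for v, val in m1.items() if v in m2)

def exists_by_mapping(inner_solutions, mu):
    return any(compatible(mu, m) for m in inner_solutions)

outer = match_bgp([("?x", ":a", "?y")], GRAPH)

# FILTER EXISTS { SELECT ?x WHERE { ?x :a :c . } }
inner_x = select(["?x"], [("?x", ":a", ":c")], GRAPH)
kept_x = sorted(mu["?x"] for mu in outer if exists_by_mapping(inner_x, mu))

# FILTER EXISTS { SELECT ?z WHERE { ?z :a ?y . } }
inner_z = select(["?z"], [("?z", ":a", "?y")], GRAPH)
kept_z = sorted(mu["?x"] for mu in outer if exists_by_mapping(inner_z, mu))

print(kept_x)  # [':s1']
print(kept_z)  # [':s1', ':s2']
```

The projected ?x correlates with the outer ?x, so only :s1 survives the first
filter.  In the second query the inner ?y is projected away and therefore does
not interact with the outer ?y, so the filter is true whenever there is any
triple with predicate :a.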


So not only do changes need to be made to substitute to eliminate situations
where the result cannot be determined, but changes also need to be made to fix
situations where the SPARQL specification is well-defined but wrong.
Further, these changes require modifications to the actual SPARQL algebra.


In summary, this is a complete and total mess that is causing divergence
between different implementations of SPARQL.  I really do think that these
problems are the result of using a substitution-based definition and that
trying to produce a better substitution-based definition, as in
https://scirate.com/arxiv/1606.01441, is not the right way to proceed.  As a
bonus, a mapping-based definition can be used to define pre-binding correctly.


Peter F. Patel-Schneider
Nuance Communications

Received on Friday, 17 June 2016 02:22:49 UTC