a catalog of problems with EXISTS

[This is a revised version of a message I sent to public-sparql-dev@w3.org.  I
have added a bit more explanation at the beginning, fixed up one example to
make it more on point, tied the examples more closely to the spec, and removed
comments about counterintuitiveness.]



Here are five separate problems that I see with the definition of EXISTS in
https://www.w3.org/TR/2013/REC-sparql11-query-20130321/.

The first two problems are situations where the evaluation of EXISTS hits
undefined areas.  The second of these has quite a few cases even just from a
syntactic viewpoint.  I do not know of any SPARQL implementations that
produce an error for any of these undefined situations.

The last three problems are situations where the evaluation of EXISTS is
well-defined but at least some implementations diverge from the spec.



Problem 1: Some uses of EXISTS are not defined during evaluation

The evalution of exists in 18.6 is only defined for graph patterns, but in
  SELECT ?x WHERE {
    ?x :p :c .
    FILTER EXISTS { SELECT ?y { ?y :q :c . } } }
the argument to exists ends up being a ToMultiSet, which is not listed under
"Graph Pattern" in the table of SPARQL algebra symbols in 18.2.

The argument to exists is not explicitly listed as a "Graph Pattern" when
the argument to EXISTS is a GroupGraphPattern containing just a SubQuery or
just an InlineData.  Here the join is simplified away by section 18.2.2.8
leaving a construct in the SPARQL algebra that is not listed as a graph
pattern symbol, ToMultiSet or a multiset, respectively.  An example of where
this happens (but not at the top level of an EXISTS) is the last example of
18.2.3.


Problem 2: Substitution happens where definitions are only for variables

In
  SELECT ?x WHERE {
    BIND ( :d AS ?x )
    FILTER EXISTS { BIND ( :e AS ?z ) { SELECT ?x { :b :p :c } } } }
the substitution from 18.6 ends up with a non-variable in the second argument
to Project
  Join ( Extend( BGP(), ?z, :e ) ,
         ToMultiSet( Project( ToList( BGP( :b :p :c )), { :d } ) ) )
However Project is only defined in 18.5 for variables in its second argument.

This also affects Extend, multisets, BOUND, and maybe other constructs.


Problem 3: Blank nodes substituted into BGPs act as variables

In
  SELECT ?x WHERE {
    ?x :p :d .
    FILTER EXISTS { ?x :q :b . } }
against the graph { _:c :p :d , :e :q :b }
the substitution from 18.6 ends up producing
  BGP(_:c :q :b)
when then matches against :e :q :b because the _:c can be mapped to :e by
the RDF instance mapping that is part of pattern instance mappings in
18.3.1.

Some implementations diverge from the spec here.


Problem 4: Substitution can flip MINUS to its disjoint-domain case

In
  SELECT ?x WHERE {
    ?x :p :c .
    FILTER EXISTS { ?x :p :c . MINUS { ?x :p :c . } } }
on the graph { :d :p :c }
the substitution from 18.6 ends up producing
  Minus( BGP( :d :p :c ), BGP( :d :p :c ) )
which produces a non-empty result because the two solution mappings for the
Minus have disjoint domains and 18.5 dictates that then the result is not
empty.

Some implementations diverge from the spec here.


Problem 5: Substitution affects disconnected variables

In
  SELECT ?x WHERE {
    BIND ( :d AS ?x )
    FILTER EXISTS { BIND ( :e AS ?z )
                    SELECT ?y WHERE { ?x :p :c } } }
the substitution from 18.6 ends up producing
  Join ( Extend( BGP(), ?z :e ),
         ToMultiSet( Project( ToList( BGP( :d :p :c ) ), { ?y } ) ) )

Some, but not all, implementations diverge from the spec here.



Peter F. Patel-Schneider
Nuance Communications

Received on Wednesday, 6 July 2016 14:51:04 UTC