- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Thu, 16 Jun 2016 19:22:18 -0700
- To: Jeremy J Carroll <jjc@syapse.com>, Gregory Williams <greg@evilfunhouse.com>
- Cc: public-sparql-dev@w3.org
Thesis: The definition of EXISTS is broken so bad in SPARQL that it should be replaced with a completely different mechanism. The situation is extraordinarily bad with respects to EXISTS. There are several errors in the specification of substitution. All of these have "obvious" solutions, at least to any particular implementer. As every implementer has had to fix this part of SPARQL they each get cover for their particular changes even when there is divergence. Here is a (partial) list of problems with EXISTS. SELECT ?x WHERE { ?x :a :b . FILTER EXISTS { VALUES (?x) { ( :c ) } } } (errata-query-10) Obviously the ?x in the VALUES should not be substituted as substitution would result in a undefined construct. This results in the filter being true when the mapping of ?x is :c. SELECT ?x WHERE { ?x :a :b . FILTER EXISTS { BIND ( :c AS ?x ) } } Obviously the ?x in the BIND should not be substituted as substitution would result in a undefined construct. This results in the filter always being true. SELECT ?x WHERE { ?x :a ?y . FILTER EXISTS { SELECT ?x WHERE { ?x :a :c . } } } Obviously the ?x in the inner SelectClause should not be substituted as substitution would result in a undefined construct. Obviously the ?x in the inner BGP should not be substituted as that would change the meaning of the inner SELECT. This results in the filter being true whenever there is any triple with predicate :a and object :c. SELECT ?x WHERE { ?x :a ?y . FILTER EXISTS { SELECT ?z WHERE { ?z :a ?y . } } } (errata-query-8) (https://scirate.com/arxiv/1606.01441) Obviously the ?y in the inner SELECT should not be substituted because it is a different variable from the ?y in the outer SELECT even though the SPARQL specification says to do the substitution and that is well-defined. The SPARQL specification just produces the wrong answers. This results in the filter being true whenever there is any triple with predicate :a instead of only being true for mappings of ?x and ?y when there is a triple predicate :a and object that is the mapping of ?y. SELECT ?x WHERE { :s :p ?x . FILTER EXISTS { ?x :p _:a . } } Obviously blank nodes that are mappings for ?x should not be subject to RDF instance mappings like _:a is, even though the SPARQL specification says to allow substitution and that is well-defined. The SPARQL specification just produces the wrong answers. One way to get the right answers is to add a new kind of token to the SPARQL algebra. A different, and better, way is to go to a mapping-based definition. This results in the filter being false on the graph { :s :p_:b . } instead of being true. SELECT ?x WHERE { ?x :a :b . FILTER EXISTS { ?x :a :b . MINUS { ?x :a :b } } } Obviously the ?x should still be considered to be a shared variable for the MINUS because otherwise its meaning changes dramatically, even though the SPARQL specification says that the meaning is to change and that is well-defined. The SPARQL specification just produces the wrong answers. To get the right answers requires keeping track of which substitutions have been applied in the SPARQL algebra, which really means going from a substitution-based definition to a mapping-based definition. This results in the filter always failing instead of always succeeding. SELECT ?x WHERE { ?x :a ?y . FILTER EXISTS { SELECT ?x WHERE { ?x :a :c . } } } Obviously the ?x in the inner SelectClause should be limited to bindings for ?x in the mappings that go into the FILTER. This keeps the intuitive meaning much better than any substituion-based definition. This results in the filter being true whenever there is a triple with predicate :a, object :c, and subject the mapping of ?x. So not only do changes need to be made to substitute to eliminate situations where the result cannot be determined but changes need to be made to fix situations where the SPARQL specification is well-defined but wrong. Further, these changes need modifications to the actual SPARQL algebra. In summary, this is a complete and total mess that is causing divergence between different implementations of SPARQL. I really do think that these problems are the result of using a substitution definition and that trying to produce a better substitution definition as in https://scirate.com/arxiv/1606.01441 is not the right way to proceed. As a bonus, a mapping-based definition can be used to define pre-binding correctly. Peter F. Patel-Schneider Nuance Communications
Received on Friday, 17 June 2016 02:22:49 UTC