- From: Fred Zemke <fred.zemke@oracle.com>
- Date: Fri, 22 Sep 2006 16:15:17 -0700
- To: public-rdf-dawg@w3.org
In general, the phrase "S is a solution of P" is vague because the specification does not discuss the domain of S. Specific problems: 1. Section 4.3 "Patterns solutions", formal definition of "pattern solution" says that a pattern solution is a total function from V, (the infinite set of variables), to the set of RDF terms. However, this is flatly contradicted in 6.3 "Unbound variables", which points out that UNION and OPTIONAL countenance that a solution might be a partial function. 2. The notion that a pattern solution is a total function on V is even problematic for basic graph patterns, if one uses the mapping technique described in 5.2 "SPARQL basic graph pattern matching" third para, where it recommends "...finding a mapping from blank nodes and variables in the basic graph pattern...". I believe that the domain of a solution to a basic graph pattern BGP is the set of variables in BGP. I believe that this should be explicitly stated in a formal definition, such as the definition of basic graph pattern E-matching in section 5.1 "General framework". The difference can be seen by considering SELECT ?x ?y WHERE { ?x :v :n2 } on data { ( :n1 :v :n2 ) }. The scoping set is { :n1, :v, :n2 }. If we say that solutions are total functions, then ?y needs to be bound and it would seem that, e.g., S: ?x -> :n1, ?y -> :v is a solution, at least according to the E-matching definition. I believe that the WHERE clause creates bindings only for ?x; the solutions all have domain { ?x } and so the only solution is S: ?x -> :n1, ?y not bound. 3. Turning to FILTER, I believe the intent is that a FILTER cannot create a binding, it can only weed out solutions. For example, SELECT ?x WHERE { FILTER (!BOUND(?x) || ?x = ?x) }. I interpret this as having an empty group pattern, which by section 6.2 "empty group pattern" has a single solution with no bindings. The FILTER returns true for that solution, so the solution sequence consists of a single solution, the empty solution. On the other hand, SELECT ?x WHERE { FILTER (?x = ?x) } produces an error in the FILTER because ?x is not bound, so there are no solutions, not even the empty solution. 4. Turning to UNION, consider SELECT ?x ?y WHERE { { ?x :v :n2 } UNION { ?y :v :n2 } } on the same data as #2. Question: is S: ?x -> :n1, ?y -> :n1 a solution? With the current laxity about what is the domain of a solution, one could argue that it is. However, I am arguing that we should tighten the definition of solution so that a pattern never has a solution with a binding on a variable not found in the pattern. In that case S is not a solution of { ?x :v :n2 } and it is not a solution of { ?y :v :n2 }. Thus there are only two solutions: S1: ?x -> :n1, ?y unbound S2: ?x unbound, ?y -> :n2 5. The most serious problem is with OPTIONAL. See http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0176.html for the provisional conclusion of an email thread on how to define OPTIONAL. However, I think the solution proposed there is still flawed, and in this message propose the following formal definition: Given patterns A and B, then S is a solution of OPT (A, B) if the domain of S is a subset of the variables found in A and B, and one of the following is true: a) There is a restriction of S that solves A and a restriction of S that solves B; or b). S is a solution of A and there is no extension of S that has a restriction that solves B. We need to look at this very carefully. I believe my words above capture the definition of the outer join operator in the Chilean paper. 6. In the case of group graph patterns, the formal definition in 6.1 "Group graph pattern" needs to change to something like "a solution of Group Graph Pattern GP on graph G is any solution S such that, for every element GPi of GP, S restricted to the variables of GPi is a solution of GPi". 7. Looking at GRAPH, consider the formal definition in 9.3 "querying the dataset". I believe that tightening the definition of pattern solution elsewhere in the spec will have no consequences for part 1 of this definition (which concerns a constant IRI as the graph selector). As for part 2, where the graph selector is a variable, it may happen that the variable does not appear in the pattern P. Thus we need to say that S is a pattern solution of GRAPH(g, P) if the domain of S includes g, the domain of S does not include any variable outside of g or P, and that S when restricted to the variables that appear in P is a solution of P. 8. I believe it follows from the above points that variables are bound only as a result of basic graph patterns and the selector variable of GRAPH patterns. Thus the domain of a solution of P is a subset of those variables of P that appear in basic graph patterns or in the graph selector of GRAPH patterns. It might be good to state this explicitly somewhere, though I believe that it could be deduced by induction. Fred
Received on Friday, 22 September 2006 23:16:10 UTC