ISSUE: domain of solutions from Fred Zemke on 2006-09-22 (public-rdf-dawg@w3.org from July to September 2006)

From: Fred Zemke <fred.zemke@oracle.com>
Date: Fri, 22 Sep 2006 16:15:17 -0700
To: public-rdf-dawg@w3.org
Message-ID: <45146E85.3030304@oracle.com>
In general, the phrase "S is a solution of P" is vague because
the specification does not discuss the domain of S. 

Specific problems:

1. Section 4.3 "Patterns solutions",
formal definition of "pattern solution" says that a pattern solution
is a total function from V, (the infinite set of variables), to the set
of RDF terms.  However, this is flatly contradicted in 6.3
"Unbound variables", which points out that UNION and OPTIONAL
countenance that a solution might be a partial function.

2. The notion that a pattern solution is a total function on V is
even problematic for basic graph patterns, if one uses the mapping
technique described in 5.2 "SPARQL basic graph pattern matching"
third para, where it recommends "...finding a mapping from blank nodes
and variables in the basic graph pattern...".  I believe that the domain
of a solution to a basic graph pattern BGP is the set of variables in BGP.
I believe that this should be explicitly stated in a formal definition,
such as the definition of basic graph pattern E-matching in section
5.1 "General framework".

The difference can be seen by considering
SELECT ?x ?y WHERE { ?x :v :n2 }
on data { ( :n1 :v :n2 ) }. 
The scoping set is { :n1, :v, :n2 }.
If we say that solutions are total functions, then ?y needs to be bound
and it would seem that, e.g., S: ?x -> :n1, ?y -> :v is a solution,
at least according to the E-matching definition.
I believe that the WHERE clause creates bindings only for ?x;
the solutions all have domain { ?x } and so the only solution is
S: ?x -> :n1, ?y not bound.

3. Turning to FILTER, I believe the intent is that a FILTER cannot
create a binding, it can only weed out solutions.  For example,
SELECT ?x WHERE { FILTER (!BOUND(?x) || ?x = ?x) }.
I interpret this as having an empty group pattern, which by section
6.2 "empty group pattern" has a single solution with no bindings.
The FILTER returns true for that solution, so the solution sequence
consists of a single solution, the empty solution.  On the other hand,
SELECT ?x WHERE { FILTER (?x = ?x) }
produces an error in the FILTER because ?x is not bound,
so there are no solutions, not even the empty solution.

4. Turning to UNION, consider
SELECT ?x ?y WHERE { { ?x :v :n2 } UNION { ?y :v :n2 } }
on the same data as #2.  Question: is S: ?x -> :n1, ?y -> :n1 a solution?
With the current laxity about what is the domain of a solution,
one could argue that it is.  However, I am arguing that we should
tighten the definition of solution so that a pattern never has a
solution with a binding on a variable not found in the pattern. 
In that case S is not a solution of { ?x :v :n2 } and it is not
a solution of { ?y :v :n2 }.  Thus there are only two solutions:
S1: ?x -> :n1, ?y unbound
S2: ?x unbound, ?y -> :n2

5. The most serious problem is with OPTIONAL.  See
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0176.html
for the provisional conclusion of an email thread on how to define
OPTIONAL.  However, I think the solution proposed there is
still flawed, and in this message propose the following formal definition:

Given patterns A and B, then S is a solution of OPT (A, B)
if the domain of S is a subset of the variables found in
A and B, and one of the following is true:

a) There is a restriction of S that solves A and a restriction
of S that solves B; or
b). S is a solution of A and there is no extension of S that
has a restriction that solves B.

We need to look at this very carefully.  I believe my words
above capture the definition of the outer join operator in the
Chilean paper.

6. In the case of group graph patterns, the formal
definition in 6.1 "Group graph pattern" needs to change to something
like "a solution of Group Graph Pattern GP on graph G is any
solution S such that, for every element GPi of GP, S restricted
to the variables of GPi is a solution of GPi". 

7. Looking at GRAPH, consider the formal definition in
9.3 "querying the dataset".  I believe that tightening the definition of
pattern solution elsewhere in the spec will have no consequences for
part 1 of this definition (which concerns a constant IRI as the graph
selector).  As for part 2, where the graph selector is a variable, it
may happen that the variable does not appear in the pattern P.
Thus we need to say that S is a pattern solution of GRAPH(g, P)
if the domain of S includes g, the domain of S does not include any
variable outside of g or P, and that S when restricted to the
variables that appear in P is a solution of P. 

8. I believe it follows from the above points that variables are
bound only as a result of basic graph patterns and the selector variable
of GRAPH patterns.  Thus the domain of a solution of P is a subset
of those variables of P that appear in basic graph patterns or in
the graph selector of GRAPH patterns.  It might be good to state
this explicitly somewhere, though I believe that it could be deduced
by induction.

Fred
Received on Friday, 22 September 2006 23:16:10 UTC