Comments on pattern solution definition

I believe the definition of Pattern Solution is incomplete. It says:

" A Pattern Solution of Graph Pattern GP on graph G is any substitution S
such that S(GP) is a subgraph of G."

which addresses basic patterns but not unions, optionals, graph patterns, or
constraints. Perhaps it should be more like this:


"A Pattern Solution of Graph Pattern GP on graph G is any substitution S
such that any of the following holds:

- GP is a Basic Pattern and S(GP) is a subgraph of G
- GP is a Graph Pattern Grouping {GP1...GPn} and for each i[1-n], S is a
Pattern Solution of GPi on G
- GP is a Union Pattern (GP1 UNION GP2) and S is a Pattern Solution of GP1
on G or S is a Pattern Solution of GP2 on G
- GP is an Optional Pattern (OPTIONAL GP1) and S is a Pattern Solution of
GP1 on G or S is not a Pattern Solution of GP1 on G
- GP is a Constraint Pattern (GP1 FILTER CP) and S is a Pattern Solution of
GP1 on G and CP is true for S
- GP is a Graph Pattern Solution (GRAPH G1 GP1) and S is a Pattern Solution
of GP1 on G1"

(Note that the current OPTIONAL definition treats OPTIONAL like a binary
operator while the grammar treats it as unary. It's treated as a unary op
above.) 

Neither this definition nor any in the existing draft (AFAICT) completely
addresses the question of what the valid substitutions are for a Graph
Pattern (as it relates to the question of unbound variables). A Substitution
is defined as a function from variables to RDF terms. In theory, I suppose
the set of RDF terms is infinite, but in practice it makes sense to consider
only the active domain for variables that are range restricted by the Query
Pattern (where the active domain is the set of RDF terms that occur in the
RDF Dataset and the Query Pattern - including any graph labels).
Difficulties can arise in the case of a union or an optional where there is
a disjunctive path through the query in which the variable does not occur in
a Basic Pattern (i.e. in which it's not range restricted). These are the
cases where we expect to get an unbound variable in the solution (as opposed
to getting solutions for each possible value of the variable.) If the query
is reduced to disjunctive normal form, it becomes clear that for each
disjunct any variable that is not range restricted should not be allowed to
range over the domain of all RDF Terms, but should instead just range over a
singular placeholder value representing the infinite possible values of that
variable. (I'm not sure how you'd say that for a query not in DNF.)

-Geoff

Received on Saturday, 2 April 2005 16:02:24 UTC