Comments on optional pattern matching (CR 6 apr 2006)

5.1 Optional pattern matching
last sentence of the largest paragraph says "The whole graph
pattern of an optional graph pattern must match for the
optional graph pattern to add to the query solution."
The term "query solution" is not defined, but it also occurs
in 10.1.3 "ORDER BY" where it refers to the solution sequence.
In that case, the phrase "add to the query solution" presumbably
means to add another solution to the solution sequence. However,
that meaning does not always make sense here, because, given
a pattern P of the form Pattern1 OPTIONAL { Pattern2 }, the
cardinality of the solution sequence of P can be the same as the
cardinality of the solution sequence of Pattern1, if
for every match to Pattern1, either Pattern2 has a unique match
or no match. 

Instead of using the verb "add", may I suggest the verb "widen"?
It is true that OPTIONAL may increase the number of results,
but the primary role of OPTIONAL is to widen the results with
additional variables and SPARQL blank nodes found in the second
pattern.

 

5.4 Optional matching - formal definition
The definition is problematic because it uses two terms,
"pattern solution" and "solution".  It is not clear whether
these are distinct concepts or the same concept. 

I believe most readers will think they are the same concept.
In that case, the definition does not work because it is logically
equivalent to "S is a solution of optional graph pattern if S is a
pattern solution of A".  Proof: Let S be a pattern solution of A.
Now either S is a pattern solution of B or not.
If S is a pattern solution of B, then S is a pattern solution
of A and B and meets the criterion.  If S is not a pattern
solution of B, then S is a pattern solution of A but not of B, so
S meets the "otherwise" part of the criterion.  Thus in either
case S satisfies the criterion.

If "pattern solution" and "solution" are separate concepts,
then we need to be more explicit about the distinction,
preferably by coming up with a two-word phrase for the latter
concept.

If this is our intent, it still does not rescue the definition
logically.  A close reading of the actual words shows that
we are defining "solution" as a recursive definition built
on "pattern solution".  For S to be a "solution" of Opt(A,B),
then the first possibility is that S is a "pattern solution"
of A and B.  However, the intent is that B might itself be
an optional graph pattern, and this close reading of the
definition leaves the notion of being a "pattern solution" of
an optional graph pattern undefined, so the recursion breaks
down when defining Opt (A, Opt (B, C)).

In addition, the definition does not work because the definition
of a pattern solution is a total function whose domain is all
variables.  For example, consider the query
{ ?x ?y ?z . OPTIONAL { ?x ?z ?w } } applied to the graph
with a single triple G = { (ex:a, ex:b, ex:c) }.
Here is a trial solution:
S(x) = ex:a, S(y) = ex:b, S(z) = ex:c, S(w) = ex:a.
Note that I have deliberately defined S as a total function on all
variables in the query.  Now let's try to apply the definition of
optional matching to this trial solution.  We see that
S is a solution for ?x ?y ?z and S is not a solution for ?x ?z ?w.
Thus it would seem that S qualifies as a solution because it
satisfies "S is a solution to A but not to A and B".
However, I don't think S should be regarded as a solution to
the query.  Instead, I think that when S fails as a solution to B,
then S should be undefined on any variables that occur only in B.
Thus S2 defined by S2(x) = ex:a, S2(y) = ex;b, S2(z) = ex:c, and
S2(w) undefined should be a solution to the example.

My tentative fix is:

given syntax pattern1 OPTIONAL { pattern2 }, call pattern1 the
mandatory pattern and pattern2 the optional pattern.  Any variable
that appears in the mandatory pattern is called a mandatory variable.
Any variable that appears in the optional pattern and not in the
mandatory pattern is called an optional variable.  A pattern solution S is
a partial function from the variables in the query to RDF terms
such that the following hold:

a) S restricted to the mandatory variables is a pattern solution of 
pattern1.

b) One of the following two cases is true:
   i) S is undefined on all optional variables, or
   ii) S is a pattern solution of pattern2

Fred

Received on Friday, 9 June 2006 05:40:11 UTC