Comments on the proper domain for solutions (CR 6 Apr 2006)

2.4 Pattern solutions
The definition defines "variable solution" as a partial function and
"pattern solution" as a total function.  Since the heading on the
box calls out "pattern solution", and only the definition of
"pattern solution" is in bold, and "variable solution" never appears
elsewhere in the document, the reader is led to believe that the
important notion is "pattern solution", the total function. 
However, focusing on total functions is a mistake, as shown by
the OPTIONAL and UNION syntaces, which explicitly require partial
functions as solutions to patterns.


2.4 Pattern solutions
It says that a pattern solution is a total function on V, an infinite
set.  As pointed out in a separate comment, solutions in general
are partial functions on V, when OPTIONAL and UNION are considered.
The issue to be raised in this comment is that V is not the
appropriate domain, even in the case of matching a triple pattern.
Consider SELECT ?a ?b ?c WHERE { ?a ?b ?c } evaluated on a graph
containing a single triple, (n:s n:v n:o).
Then, according to the definition, a pattern solution is a total
function mapping F:V -> {n:s, n:v, n:o} such that F(?a) = n:s,
F(?b) = n:v, F(?c) = n:o, and F(v) is unconstrained for all other
variables v.  There are a countable infinity of these total
functions.  Now assemble these pattern solutions into a solution
sequence and project to retain just the variables ?a, ?b and ?c. 
You still have an infinite sequence.  Thus the result of the query
appears to be an infinite sequence (whose every member is the
same function on { ?a, ?b, ?c }).

If one replies that the projection is to be done before assembling
the solution sequence, I observe first that that is not the
description in Section 10.1 "solution sequences and result forms",
but more importantly, that will produce the wrong cardinality on
other queries, for example, SELECT ?a ?b WHERE { ?a ?b ?c }
on a dataset with two triples (n:s n:v, n:o1), (n:s, n:v, n:o2).
In this example, projecting before assembling the solution
sequence will result in only a single solution, whereas there
should be two.

Instead, I believe that the correct algorithm is to look at
pattern solutions whose domain is the set of variables
that appear in the specific query to be evaluated (not the infinite
set V).  And, as pointed out in a separate comment, the focus
should be on partial functions rather than total functions.

Fred

Received on Friday, 9 June 2006 05:40:11 UTC