Comments on UNION matching (CR 6 Apr 2006)

From: Fred Zemke <fred.zemke@oracle.com>
Date: Fri, 09 Jun 2006 00:10:45 +0000
Message-ID: <4488BC47.4040908@oracle.com>

6. Matching alternatives
Second sentence says "If more than one of the alternatives
matches, all the possible pattern solutions will be found."
Does this mean that if a solution is a solution of both
patterns, then the solution occurs twice in the solution sequence?
There are no examples of solutions with multiple cardinality.
Such examples would be helpful.

6.2 Union matching - formal definition
The definition is unclear about whether there are any constraints
on the value of a solution on a variable that
appears in one pattern but not in the other.  Example: what is
the result of

SELECT ?x ?y WHERE { FILTER (?x = ?x) } UNION { FILTER (?y = ?y) }

Suppose there is only one RDF term in the graph, <http:a>.
There are all together four partial functions from the
set of variables in the query {?x, ?y} and the set of RDF terms,
namely:
S1 (?x) = <http:a>, S1(?y) = <http:a>
S2 (?x) = <http:a>, S2(?y) undefined
S3 (?x) undefined,  S3 (?y) = <http:a>
S4 (?x) undefined,  S4 undefined

I believe that the desired set of solutions is {S2, S3},
i.e., S1 is not a solution of this query.  However, arguably,
S1 is a solution of FILTER (?x = ?x), and therefore belongs in
the result set according to the definition as written.

My proposed fix is: let P be pattern1 UNION pattern2.
Then S is a solution of P if either of the following is true:
1. S is a solution of pattern1 and S is undefined on every
variable that is contained in pattern2 but not in pattern1; or
2. S is a solution of pattern2 and S is undefined on every
variable that is contained in pattern1 but not in pattern2.

6.2 Union matching - formal definition
The definition is unclear about duplicates.  If s is a solution
of GP1 and S is a solution of GP2, does the solution sequence
contain a copy of S for each of GP1 and GP2?  I believe the
answer should be that duplicates are maintained because they
might be meaningful to the user; if the user wishes to eliminate
duplicates, the user can specify DISTINCT.  In that case, the
definition proposed in a separate comment needs to be rewritten
because it would eliminate duplicates.  I think the best
approach would be to recognize that the UNION operator is
constructing a solution sequence from the solution sequences of
each operand.  The proposed rewording is then:

Let P be pattern1 UNION pattern2.  Let V1 be the set of variables
that appear in pattern1 and let V2 be the set of variables that
appear in pattern2.  S  = (S1, S2, ... Sn) be a sequence of all
partial functions on V1 that are solutions of pattern1.  Let
T = (T1, T2, ... Tm) be a sequence of all
partial functions on V2 that are solutions of pattern2.
Then a solution sequence of P is any permutation of the sequence
(S1, ..., Sn, T1, ..., Tm).

(Note: This definition involves a trick concerning partial
functions.  For example, each Si is a partial function on V1,
therefore it is a partial function on the set of all variables
in P that happens to be undefined on the variables that belong
only to V2.)

Fred
Received on Friday, 9 June 2006 05:40:11 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:00:51 UTC