formal semantics of OPT operator from Fred Zemke on 2006-06-23 (public-rdf-dawg@w3.org from April to June 2006)

From: Fred Zemke <fred.zemke@oracle.com>
Date: Fri, 23 Jun 2006 08:12:48 -0700
To: public-rdf-dawg@w3.org
Message-ID: <449C04F0.2070306@oracle.com>
Andy Seaborne in 
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0172.html
wrote concerning my paper sparql-semantics-draft.pdf attached to
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0170.html
the following:

3/ Section 3.18

The significant change here is where S is required not to be defined on the 
variables (variables introduced by ...) GGP.  This rules out the query:

"Get the name if there is one, else get the email address, else ..."

or
    { ... ?x .... }
    OPTIONAL { ?x foaf:name ?displayLabel }
    OPTIONAL { ?x foaf:mbox ?displayLabel }

because S may be defined on ?displayLabel yet not match (and hence not 
introduced new bindings for other variables) on the right hand side of the 
optional.  Have I understood correctly?

My reply:

Yes, you understand my proposal.  I was making it behave like a 
left outer join in SQL.  For the join criterion, I took it that 
we wanted to match all variables that are shared between the two
operands.  In that case, optional values can only be the variables 
that are found only in the second operand.

In SQL, what you are trying to do would be done by performing
two outer joins, obtaining the name optionally from one and 
the mailbox optionally from the other.  To reduce this to a single
value, the name if found, otherwise the mailbox, you would use a COALESCE
operation, which picks the first nonnull value, if any, from a list.

In the long run, I think SPARQL will want expression syntax, 
including conditionals like C's ? operator or SQL's CASE 
(from which COALESCE is defined as syntactic sugar).

Meanwhile, I tried redesigning my definition to accomodate the 
example with the semantics stated above.  Try the following 
definition:

Given patterns A and B, then S is a solution of OPT (A, B)
if one of the following is true:

1. S solves A and B, and there exists a restriction of S that 
solves A.

2. S solves A, S is undefined on the variables that are found 
in B but not in A, and there is no extension of S that solves 
A and B.

The difference from my prior definition is in item 1.
Formerly I proposed here "S solves A and B, and S restricted to the 
variables that appear in A solves A".  Thus the former definition 
looked at a particular restriction of S.  If there is a variable V
appearing in A and a solution of A in which V is not bound, then
that variable must remain unbound when extending to create a 
solution of B.  With the revised formulation, if there is a solution
with an unbound variable V appearing in A, then it is possible for
the extension that solves A and B to bind V.

I think this does what you want, but we need to look at some 
more test cases.  We should also look at examples using UNION
as the first operand, since that can also create solutions with
unbound variables.

Fred
Received on Friday, 23 June 2006 15:13:07 UTC