Re: Issues with evaluating optional: Commutativity of AND from Fred Zemke on 2006-10-13 (public-rdf-dawg@w3.org from October to December 2006)

From: Fred Zemke <fred.zemke@oracle.com>
Date: Fri, 13 Oct 2006 14:41:44 -0700
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <45300818.6010206@oracle.com>
Bijan Parsia wrote:

> This may correspond with some of Fred's issues and I apologize for  
> not reconciling with his texts and email.

No problem, I have trouble keeping up with all the email myself.

I am not replicating Bijan's message so that I can get to what
his message means to me.

His message mentions various operators, with uppercase names,
among which is AND.  I think that AND is an unfortunate name for
this operator.  Bijan also uses the term "join" in lowercase, which
I think is preferable.

When I first encountered SPARQL, I too thought of the
connective between elements of a group graph pattern
(rule [19] GroupGraphPattern) as AND.  This impression was
reinforced by the definition in 6.1 "Group graph pattern"
which defines solutions of a group graph pattern as
universal quantification over the constituent patterns.
Universal quantification of a finite list of predicates is the
same thing as AND of that list of predicates.  But I now
believe that that definition is flawed because it does not
consider the domains of solutions, which hides the fact that
it is actually a join operation.  Instead of saying

"S is a solution of a group graph pattern GGP if for every
element GPi of GGP, S is a solution of GPi"

it would be more accurate to say

"S is a solution of GGP if the domain of S is a subset of
the variables appearing in GGP and for every i, some
restriction of S is a solution of GPi"

or (closer to the Chilean approach)

"S is a solution of GGP if for every i, there exists a
solution Ti of GPi, such that S is the join of the Ti."

Now let us look at the specific query examples in Bijan's email.
I will state them in SPARQL syntax, changing the variables to
capitals (for convenience in the SQL that I will develop later)
and shortening the IRIs:

Q1: WHERE { ?X :n :p . ?Y :n :g OPTIONAL { ?X :c ?Z } }
Q2: WHERE { ?Y :n :g OPTIONAL { ?X :e ?Z } ?X :n :p }

These involve three triple patterns:

T1: ?X :n :p
T2: ?Y :n :g
T3: ?X :c ?Z

If we think of the connective in the group graph pattern as AND,
then one might expect that Q1 is equivalent to Q2.  All the user
has done is permute T1 from the front of the query to the back.
However, if we think of the connective
as JOIN, then the equivalence falls apart, at least from my
experience in SQL.  SQL users do not expect to be able to permute
JOIN and LEFT OUTER JOIN as shown between Q1 and Q2, because
LEFT OUTER JOIN is known not to have such behavior.

This point is seen by translating Q1 and Q2 into SQL.
I will use a simplification of my translation algorithm attached to
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006OctDec/0058.html
Let Graph be the table containing the default graph, with columns
called Subject, Verb and Object.  The three triples can be
expressed by

WITH T1 AS ( SELECT Subject AS X
             FROM Graph
             WHERE Verb = ':n' AND Object = ':p' ),
     T2 AS ( SELECT Subject AS Y
             FROM Graph
             WHERE Verb = ':n' AND Object = ':g' ),
     T3 AS ( SELECT Subject AS X, Object AS Z
             FROM Graph
             WHERE Verb = ':c' )

then the translation of Q1 will finish as follows:

SELECT T1.X, T2.Y, T3.Z
FROM (T1 JOIN T2 ON 1=1) LEFT OUTER JOIN T3 ON (T1.X = T3.X)

and the translation of Q2 will finish with:

SELECT T1.Y, T2.X, T2.Z
FROM (T2 LEFT OUTER JOIN T3 ON (1=1)) JOIN T1 ON (T1.X = T3.X)

and nobody expects the algebra of JOIN and LEFT OUTER JOIN
to make such an equivalentce

So, as I said, changing our operator name from AND to JOIN
is a step in the right direction.  That will bring us some
clarity, but what about our users?  Let's look at the
queries again:

Q1: WHERE { ?X :n :p . ?Y :n :g OPTIONAL { ?X :c ?Z } }
Q2: WHERE { ?Y :n :g OPTIONAL { ?X :e ?Z } ?X :n :p }

SPARQL queries have the advantage over SQL in compactness,
but I think the operator denoted by dot (.) or concatenation
of patterns will mislead our users when they throw OPTIONAL
into the mix.  Look at the efforts I have expended in trying
to specify what the first operand of OPTIONAL is. 

Andy Seaborne's recent grammar changes
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006OctDec/0055.html
are a step forward, but I think the real solution is to
make OPTIONAL into a bona fide binary operator, with a production
such as

OptionalPattern ::= GroupGraphPattern OPTIONAL GroupGraphPattern

The existing syntax as a postfix operator whose first argument
must be inferred just does not provide the clarity of a true
binary operator in the BNF.

If we did this, then the two queries would be written

Q1: WHERE { { ?X :n :p . ?Y :n :g } OPTIONAL { ?X :c ?Z } }
Q2: WHERE { { ?Y :n :g } OPTIONAL { ?X :e ?Z } ?X :n :p }

The revised SPARQL syntax clearly indicates the desired join order,
and I don't think our users will think that they can move
T1 out of the first operand in Q1 as shown in Q2.

I would welcome comments from the Chileans, since their
work sparked this thread.

While I am bitching about the SPARQL syntax, I will add
that the SELECT ... FROM ... WHERE ... syntax, borrowed
from SQL, is also misleading.  In SQL, rows are built
in the FROM clause and pruned in the WHERE clause.  In SPARQL,
solutions are built in the WHERE clause, and pruned
in the FILTER clause.  I think this difference will be a source
of ongoing confusion as users, many of whom will have an SQL
background, encounter SPARQL.  I would be happy if we changed
the keywords, especially WHERE, to something else. 
Maybe change WHERE to MATCH.

Fred
Received on Friday, 13 October 2006 21:43:15 UTC