- From: Fred Zemke <fred.zemke@oracle.com>
- Date: Fri, 13 Oct 2006 14:41:44 -0700
- CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Bijan Parsia wrote: > This may correspond with some of Fred's issues and I apologize for > not reconciling with his texts and email. No problem, I have trouble keeping up with all the email myself. I am not replicating Bijan's message so that I can get to what his message means to me. His message mentions various operators, with uppercase names, among which is AND. I think that AND is an unfortunate name for this operator. Bijan also uses the term "join" in lowercase, which I think is preferable. When I first encountered SPARQL, I too thought of the connective between elements of a group graph pattern (rule [19] GroupGraphPattern) as AND. This impression was reinforced by the definition in 6.1 "Group graph pattern" which defines solutions of a group graph pattern as universal quantification over the constituent patterns. Universal quantification of a finite list of predicates is the same thing as AND of that list of predicates. But I now believe that that definition is flawed because it does not consider the domains of solutions, which hides the fact that it is actually a join operation. Instead of saying "S is a solution of a group graph pattern GGP if for every element GPi of GGP, S is a solution of GPi" it would be more accurate to say "S is a solution of GGP if the domain of S is a subset of the variables appearing in GGP and for every i, some restriction of S is a solution of GPi" or (closer to the Chilean approach) "S is a solution of GGP if for every i, there exists a solution Ti of GPi, such that S is the join of the Ti." Now let us look at the specific query examples in Bijan's email. I will state them in SPARQL syntax, changing the variables to capitals (for convenience in the SQL that I will develop later) and shortening the IRIs: Q1: WHERE { ?X :n :p . ?Y :n :g OPTIONAL { ?X :c ?Z } } Q2: WHERE { ?Y :n :g OPTIONAL { ?X :e ?Z } ?X :n :p } These involve three triple patterns: T1: ?X :n :p T2: ?Y :n :g T3: ?X :c ?Z If we think of the connective in the group graph pattern as AND, then one might expect that Q1 is equivalent to Q2. All the user has done is permute T1 from the front of the query to the back. However, if we think of the connective as JOIN, then the equivalence falls apart, at least from my experience in SQL. SQL users do not expect to be able to permute JOIN and LEFT OUTER JOIN as shown between Q1 and Q2, because LEFT OUTER JOIN is known not to have such behavior. This point is seen by translating Q1 and Q2 into SQL. I will use a simplification of my translation algorithm attached to http://lists.w3.org/Archives/Public/public-rdf-dawg/2006OctDec/0058.html Let Graph be the table containing the default graph, with columns called Subject, Verb and Object. The three triples can be expressed by WITH T1 AS ( SELECT Subject AS X FROM Graph WHERE Verb = ':n' AND Object = ':p' ), T2 AS ( SELECT Subject AS Y FROM Graph WHERE Verb = ':n' AND Object = ':g' ), T3 AS ( SELECT Subject AS X, Object AS Z FROM Graph WHERE Verb = ':c' ) then the translation of Q1 will finish as follows: SELECT T1.X, T2.Y, T3.Z FROM (T1 JOIN T2 ON 1=1) LEFT OUTER JOIN T3 ON (T1.X = T3.X) and the translation of Q2 will finish with: SELECT T1.Y, T2.X, T2.Z FROM (T2 LEFT OUTER JOIN T3 ON (1=1)) JOIN T1 ON (T1.X = T3.X) and nobody expects the algebra of JOIN and LEFT OUTER JOIN to make such an equivalentce So, as I said, changing our operator name from AND to JOIN is a step in the right direction. That will bring us some clarity, but what about our users? Let's look at the queries again: Q1: WHERE { ?X :n :p . ?Y :n :g OPTIONAL { ?X :c ?Z } } Q2: WHERE { ?Y :n :g OPTIONAL { ?X :e ?Z } ?X :n :p } SPARQL queries have the advantage over SQL in compactness, but I think the operator denoted by dot (.) or concatenation of patterns will mislead our users when they throw OPTIONAL into the mix. Look at the efforts I have expended in trying to specify what the first operand of OPTIONAL is. Andy Seaborne's recent grammar changes http://lists.w3.org/Archives/Public/public-rdf-dawg/2006OctDec/0055.html are a step forward, but I think the real solution is to make OPTIONAL into a bona fide binary operator, with a production such as OptionalPattern ::= GroupGraphPattern OPTIONAL GroupGraphPattern The existing syntax as a postfix operator whose first argument must be inferred just does not provide the clarity of a true binary operator in the BNF. If we did this, then the two queries would be written Q1: WHERE { { ?X :n :p . ?Y :n :g } OPTIONAL { ?X :c ?Z } } Q2: WHERE { { ?Y :n :g } OPTIONAL { ?X :e ?Z } ?X :n :p } The revised SPARQL syntax clearly indicates the desired join order, and I don't think our users will think that they can move T1 out of the first operand in Q1 as shown in Q2. I would welcome comments from the Chileans, since their work sparked this thread. While I am bitching about the SPARQL syntax, I will add that the SELECT ... FROM ... WHERE ... syntax, borrowed from SQL, is also misleading. In SQL, rows are built in the FROM clause and pruned in the WHERE clause. In SPARQL, solutions are built in the WHERE clause, and pruned in the FILTER clause. I think this difference will be a source of ongoing confusion as users, many of whom will have an SQL background, encounter SPARQL. I would be happy if we changed the keywords, especially WHERE, to something else. Maybe change WHERE to MATCH. Fred
Received on Friday, 13 October 2006 21:43:15 UTC