RE: Applying the relational model to SPARQL

Quite a while ago, I argued against the notion of splitting a SPARQL
query into graph and filter components, but lost that battle.  One
of the major objections against that split was that the UNION
connective was unduly circumscribed in what could be expressed.
Since then, I have periodically scanned the "SPARQL Query Language for RDF"
document at  http://www.w3.org/TR/rdf-sparql-query/ to see if the
situation had improved.  When the change occurred (which it did), I
missed it.  Hence, I recently made an erroneous claim on 
public-rdf-dawg-comments@w3.org that SPARQL was (still) not 
completely expressive with respect to disjunction.

I'm still wondering if the change in SPARQL grammar that increased
its expressive power has been accompanied by the (I claim) necessary
philosophical shift to account for the difference.  The language
syntax has not been revised appropriately, which partially accounts
for my failure to detect the change.  Below, I comment on the syntax
as it relates to this shift:

RDF has been conceived as different from the predicate calculus in
that it deals with *graphs* rather than with arbitrary logical
expressions.  In the predicate calculus, one can make, for example,
non-graph-like disjunctive assertions; in RDF we can't.  

SPARQL was initially conceived, as nearly as I can tell, as a 
"graph query language".  The UNION operator was an "algebraic"
operation that took in two graphs and produced a third.  This is no
longer the case.  Here is a counterexample that I executed on the
SPARQLer website at http://www.sparql.org/query.html :

PREFIX books:   <http://example.org/book/>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
SELECT ?book ?title
WHERE 
  { ?book dc:title ?title .
    {
     FILTER (?title = "Harry Potter and the Half-Blood Prince")
    }
  UNION
    {
     FILTER (?title = "Harry Potter and the Order of the Phoenix")
    }
  }

This query unions two filter clauses, so clearly UNION is no longer
(just) a graph operator.

I was told (quite) a while back that UNION was not a disjunction
connective, i.e., it was not equivalent to the traditional "OR"
in the predicate calculus.  I didn't really understand the distinction
at the time, but later theorized that perhaps it was the
difference between a graph operator and an expression operator.
If so, that distinction is no longer valid.

The above query can also be expressed as follows:

PREFIX books:   <http://example.org/book/>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
SELECT ?book ?title
WHERE 
  { ?book dc:title ?title
    FILTER 
            (?title = "Harry Potter and the Half-Blood Prince" ||
             ?title = "Harry Potter and the Order of the Phoenix")           
  }
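
A minimal sketch of why the two forms give the same answers, written in
Python rather than SPARQL.  It assumes that each FILTER branch is applied
to the solutions of the shared "?book dc:title ?title" pattern (which is
how the first query is being read here) and that UNION is bag union.  The
sample rows, the book IRIs, and the helper names keep() and union() are
hypothetical, chosen only for illustration.

```python
# Hypothetical stand-in for the solutions of `?book dc:title ?title`.
# The book IRIs and the third title are made up for illustration.
solutions = [
    {"book": "books:book5", "title": "Harry Potter and the Order of the Phoenix"},
    {"book": "books:book6", "title": "Harry Potter and the Half-Blood Prince"},
    {"book": "books:book4", "title": "Harry Potter and the Goblet of Fire"},
]

HALF_BLOOD = "Harry Potter and the Half-Blood Prince"
PHOENIX = "Harry Potter and the Order of the Phoenix"


def keep(rows, predicate):
    """Model FILTER: keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def union(left, right):
    """Model UNION as bag union: concatenate the two result lists."""
    return left + right


# First form: UNION of two single-FILTER branches.
via_union = union(
    keep(solutions, lambda r: r["title"] == HALF_BLOOD),
    keep(solutions, lambda r: r["title"] == PHOENIX),
)

# Second form: one FILTER whose expression uses "||".
via_or = keep(
    solutions,
    lambda r: r["title"] == HALF_BLOOD or r["title"] == PHOENIX,
)


def by_book(rows):
    """Sort rows by book IRI so the comparison ignores result order."""
    return sorted(rows, key=lambda r: r["book"])


same_answers = by_book(via_union) == by_book(via_or)
print(same_answers)
```

In this sketch the two forms agree because the two titles are mutually
exclusive; if a row satisfied both branches, the concatenation in union()
would list it twice.
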

Once upon a time, the "||" operator was the only way to achieve
a disjunction within a FILTER expression.  Now, the "||" is
redundant; the language is equally expressive if we eliminate it
from the grammar.  The same is true for the "&&" operator.  The 
query

PREFIX books:   <http://example.org/book/>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
SELECT ?book ?title
WHERE 
  { ?book dc:title ?title
    FILTER (
            regex(?title, "Harry Potter") &&
            regex(?title, "Phoenix")
           )
  }

can also be expressed as

PREFIX books:   <http://example.org/book/>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
SELECT ?book ?title
WHERE 
  { ?book dc:title ?title .
    {
    FILTER regex(?title, "Harry Potter") 
    }
   .
    {
     FILTER regex(?title, "Phoenix")
    }
  }
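
The same point for "&&" can be sketched in Python: applying two filters
one after the other keeps exactly the rows that a single conjunctive
filter keeps.  The title list and the helper keep() are hypothetical, and
Python's re.search stands in for SPARQL's regex() only as an
approximation.

```python
import re

# Hypothetical titles; the last one is made up so that one regex matches
# it and the other does not.
titles = [
    "Harry Potter and the Half-Blood Prince",
    "Harry Potter and the Order of the Phoenix",
    "Phoenix Rising",
]


def keep(rows, predicate):
    """Model FILTER: keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def is_potter(title):
    return re.search("Harry Potter", title) is not None


def is_phoenix(title):
    return re.search("Phoenix", title) is not None


# One FILTER with "&&".
via_and = keep(titles, lambda t: is_potter(t) and is_phoenix(t))

# Two FILTERs applied in sequence.
via_sequence = keep(keep(titles, is_potter), is_phoenix)

print(via_and == via_sequence)
```
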

I would claim that a well-designed language should not include
redundant operators, and that the primary justification for 
including "||" and "&&" is now historical.  I would also claim
that the original justification for choosing the term "UNION" in
preference to the term "OR" is now gone (but perhaps there is
a wrinkle that I'm not aware of?).  And if I were king, I would
use the term "AND" in preference to ".".

What we have in SPARQL is a language that has evolved, but the
syntax has not uniformly evolved with it.

The term "FILTER" is also quite unfortunate.  I have been told
that there has been a deliberate decision to forbid a syntax
that allows query variables to be bound to literal values that do not
appear within the underlying RDF store.  Our own query processor does
not have that restriction, and we have any number of use cases in our
own applications that depend on the ability to generate synthetic
literals.  The SPARQL 'funcall' invokes a predicate that returns
a boolean value.  Our equivalent of funcall can return arbitrary
values.  In our library of ~40 system-defined funcall IRIs, the functions
outnumber the predicates by more than 3-to-1.

The notion of "filter" is in fact too limiting; the additional
power that accompanies the ability to synthesize new literal values
(and allowing them to be bound to SELECT variables) is huge, and
languages that go beyond the filter notion are going to dominate
those that don't.
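
To illustrate the difference in Python (this is a hypothetical sketch,
not our query processor and not SPARQL syntax): a filter can only drop
rows, while a value-returning function can bind a variable to a literal
that appears nowhere in the store.  The rows, the variable name "short",
and the helpers keep() and extend() are all assumptions made for the
example.

```python
# Hypothetical solutions of `?book dc:title ?title`.
solutions = [
    {"book": "books:book5", "title": "Harry Potter and the Order of the Phoenix"},
    {"book": "books:book6", "title": "Harry Potter and the Half-Blood Prince"},
]


def keep(rows, predicate):
    """Filter: a boolean predicate can only remove rows."""
    return [row for row in rows if predicate(row)]


def extend(rows, variable, function):
    """Function call: bind `variable` in every row to a computed value."""
    return [{**row, variable: function(row)} for row in rows]


# A predicate narrows the result but introduces no new values.
kept = keep(solutions, lambda r: "Phoenix" in r["title"])

# A function synthesizes a literal that is not stored anywhere.
extended = extend(
    solutions,
    "short",
    lambda r: r["title"].replace("Harry Potter and the ", ""),
)

for row in extended:
    print(row["book"], row["short"])
```
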

To summarize: the notion of SPARQL as a graph language, rather than
as a calculus language, is already obsolete.  The syntax of the language
has not been upgraded to accommodate that conceptual shift.  Instead,
there is an artificial syntactic barrier between the 
"graph" portion of the language and the non-graph portion.  SQL and
Common Logic provide examples of logic languages that did not make
that distinction, and as a result are much cleaner, and much more
readable.

Cheers, Bob

-----Original Message-----
From: public-rdf-dawg-comments-request@w3.org [mailto:public-rdf-dawg-comments-request@w3.org] On Behalf Of Eric Prud'hommeaux
Sent: Thursday, January 04, 2007 09:00
To: Bob MacGregor
Cc: Andrew Newman; public-rdf-dawg-comments@w3.org
Subject: Re: Applying the relational model to SPARQL

* Bob MacGregor <bmacgregor@siderean.com> [2006-11-10 08:55-0800]
> That brings us to SPARQL.  SPARQL is a major disappointment.  The most 
> grievous error is the distinction between the WHERE and FILTER clauses.
> The faceted navigation product that my company sells generates RDF 
> queries that cannot be expressed in SPARQL because they frequently use 
> an OR connective that includes both statements and filters within the 
> disjuncts.  In otherwords, the queries we routinely execute cannot be 
> processed by a SPARQL query engine.

Could you outline such a query in the syntax of your choice? I may require an english explanation of the query in order to understand it.

--
-eric

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than email address distribution.

Received on Friday, 19 January 2007 21:20:46 UTC