W3C home > Mailing lists > Public > public-rdf-dawg-comments@w3.org > February 2007

RE: Applying the relational model to SPARQL

From: Bob MacGregor <bmacgregor@siderean.com>
Date: Tue, 6 Feb 2007 19:06:36 -0800
To: "Seaborne, Andy" <andy.seaborne@hp.com>
CC: "Eric Prud'hommeaux" <eric@w3.org> , "Andrew Newman" <andrewfnewman@gmail.com> , "public-rdf-dawg-comments@w3.org" <public-rdf-dawg-comments@w3.org>
Message-ID: <20070206190636834.00000003896@bmacgregor1>

Andy,

Thanks for a detailed response.  I am not sure that I understand the motivation for focusing on an algebra, but let me guess, and then comment.  

It appears to me that you (the committee) is designing SPARQL semantics from an operational standpoint rather than from a declarative standpoint.  In other words, you are designing an (algebraic) query engine, and then retrofitting the language to mean whatever the query engine does.  I hope I'm wrong in this interpretation, but that is my best guess based on comments below and elsewhere.  

The comments below, indicating that very different things happen depending on exactly how curlies are wrapped, or when you substitute a union for an ||, suggest that the language is not at all user friendly, and seems to be unfriendly by design rather than by accident.  Contrast that with SQL, where they designed a language, ran it by user groups to test for ease of use, and then figured out how to implement the language.  In fact, it was several years before they were convinced that efficient implementations of SQL were indeed feasible.

With the exception of outer joins, the SQL language is fully declarative (to the best of my knowledge) AND it has an algebraic implementation.  Meaning that its quite possible to have a declarative language and still use an algebraic query processor, if the latter is dear to your heart.  So, I'm not sure if your motivation for bending the language around the algebra is because the committee can't figure out how to achieve a declarative semantics, or can't figure out how to implement it once it has one, or what.  But whatever the reason, the current result is quite unfortunate.  No reasonable query language has two AND operators and two OR operators.  The addition of delimiters (curlies) as in your example below should NOT have an effect on the semantics if the interpretation without them is unambiguous.

It would be helpful (to me at least) if the SPARQL documents include explicit mention of the *assumptions* that underlie the design of the language.  If I knew what they were, then for example, I could guess as to whether SPARQL might someday allow variables to bind to values not found in the database (a feature we rely on quite heavily with our own RDF query language) or if the committee is deliberately designing itself into a corner.  

Again, I'm hoping that I'm totally mistaken here in my interpretation of SPARQL semantics.

Regards, Bob

-----Original Message-----
From: Seaborne, Andy [mailto:andy.seaborne@hp.com] 
Sent: Friday, February 02, 2007 11:45
To: Bob MacGregor
Cc: Eric Prud'hommeaux; Andrew Newman; public-rdf-dawg-comments@w3.org
Subject: Re: Applying the relational model to SPARQL

Bob,

Thank you for the response.

There are some changes coming up that the working group is considering that will affect some of your examples although not necessarily the main line of your argument.

The change is to make the combination of graph patterns based on an algebra [*] with bottom-up construction of result set, rather than a declarative restriction on overall results.

An expression like
{ ?book dc:title ?title .
   FILTER (?title = "Harry Potter and the Half-Blood Prince") }

is evaluated as a restriction over the graph patterns in the group between {} (here, a simple basic graph pattern ?book dc:title ?title).

Then the results is these expression are combined with other query.

http://www.sparql.org/query.html does not yet provide these (draft,
unapproved) semantics.  There is a small scale implementation designed to directly implement the SPARQL algebra in ARQ, but it is not the default query engine.

Current details:
http://www.w3.org/2001/sw/DataAccess/rq23/rq25-algebra.html
although this text is due to be merged into the overall document http://www.w3.org/2001/sw/DataAccess/rq23/rq25.html

Looking at the translation from a SPARQL query string to an algebra expression:

----------
PREFIX books:   <http://example.org/book/>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
SELECT ?book ?title
WHERE
   { ?book dc:title ?title .
      FILTER (?title = "Harry Potter and the Half-Blood Prince")
   }
----------
becomes:

[OpProject ?book ?title
   [OpFilter ( ?title = "Harry Potter and the Half-Blood Prince" )
     [OpBGP ?book dc:title ?title]]]

where the filter restricts the basic graph pattern.

but writing:
{ FILTER (?title = "Harry Potter and the Half-Blood Prince") } is a restriction of the empty pattern, i.e. it evaluates to no solutions.  the {} make a difference in the algebra.

----------
PREFIX books:   <http://example.org/book/>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
SELECT ?book ?title
WHERE
   { ?book dc:title ?title .
     {
      FILTER (?title = "Harry Potter and the Half-Blood Prince")
     }
   }
-------
becomes:

[OpProject ?book ?title
   [OpJoin
     [OpBGP ?book dc:title ?title]
     [OpFilter ( ?title = "Harry Potter and the Half-Blood Prince" )
       (unit table)]
   ]]

the filter, enclosed in {} is evaluated in the context of the {} (the unit table - no bindings - the empty pattern) and so evaluates to false (?title not
bound))

Your combining FILTERs to get disjunction would need to be done in the "||" form:
    FILTER
        (?title = "Harry Potter and the Half-Blood Prince" ||
         ?title = "Harry Potter and the Order of the Phoenix")


	Andy


----------------

[*]
A few references for work in this area, as well as the work done within the working group:

A relational algebra for SPARQL
http://www.hpl.hp.com/techreports/2005/HPL-2005-170.html
Richarh Cyganiak

Semantics and Complexity of SPARQL
   http://arxiv.org/abs/cs.DB/0605124
Jorge Perez, Marcelo Arenas, Claudio Gutierrez

Semantics of SPARQL
   http://ing.utalca.cl/%7Ejperez/papers/sparql_semantics.pdf
Jorge Perez, Marcelo Arenas, Claudio Gutierrez

Preserving SPARQL-to-SQL Query Translation for Optional Graph Patterns
   http://www.cs.wayne.edu/%7Eartem/main/research/TR-DB-052006-CLJF.pdf
Artem Chebotko, Shiyong Lu, Hasan M. Jamil and Farshad Fotouhi


Bob MacGregor wrote:
 >
 >
 > Quite a while ago, I argued against the notion of splitting a SPARQ  > query into graph and filter components, but lost that battle.  One  > of the major objections against that split was that the UNION  > connective was unduly circumscribed in what could be expressed.
 > Since then, I have periodically scanned the "SPARQL Query Language for RDF"
 > document at  http://www.w3.org/TR/rdf-sparql-query/ to see if the  > situation had improved.  When the change occurred (which it did), I  > missed it.  Hence, I recently made an erroneous claim on  > public-rdf-dawg-comments@w3.org that SPARQL was (still) not  > completely expressive with respect to disjuction.
 >
 > I'm still wondering if the change in SPARQL grammar that increased  > its expressive power has been accompanied by the (I claim) necessary  > philosophical shift to account for the difference.  The language  > syntax has not been revised appropriately, which partially accounts  > for my failure to detect the change.  Below, I comment on the syntax  > as it relates to this shift:
 >
 > RDF has been conceived as different from the predicate calculus in  > that it deals with *graphs* rather than with arbitrary logical  > expressions.  In the predicate calculus, one can make, for example,  > non-graph-like disjunctive assertions; in RDF we can't.
 >
 > SPARQL was initially conceived, as nearly as I can tell, as a  > "graph query language".  The UNION operator was an "algebraic"
 > operation that took in two graphs and produced a third.  This is no  > longer the case.  Here is a counter example that I executed on the  > SPARQLer website at  http://www.sparql.org/query.html :
 >
 > PREFIX books:   <http://example.org/book/>
 > PREFIX dc:      <http://purl.org/dc/elements/1.1/>
 > SELECT ?book ?title
 > WHERE
 >   { ?book dc:title ?title .
 >     {
 >      FILTER (?title = "Harry Potter and the Half-Blood Prince")
 >     }
 >   UNION
 >     {
 >      FILTER (?title = "Harry Potter and the Order of the Phoenix")
 >     }
 >   }
 >
 > This query unions two filter clauses, so clearly UNION is no longer  > (just) a graph operator.
 >
 > I was told (quite) a while back that UNION was not a disjunction  > connective, i.e., it was not equivalent to the traditional "OR"
 > in the predicate calculus.  I didn't really understand the distinction  > at the time, but later theorized that perhaps it was this the  > difference between a graph operator and an expression operator.
 > If so, that distinction is no longer valid.
 >
 > The above query can also be expressed as follows:
 >
 > PREFIX books:   <http://example.org/book/>
 > PREFIX dc:      <http://purl.org/dc/elements/1.1/>
 > SELECT ?book ?title
 > WHERE
 >   { ?book dc:title ?title
 >     FILTER
 >             (?title = "Harry Potter and the Half-Blood Prince" ||
 >              ?title = "Harry Potter and the Order of the Phoenix")
 >   }
 >
 > Once upon a time, the "||" operator was the only way to achieve  > a disjunction within a FILTER expression.  Now, the "||" is  > redundant; the language is equally expressive if we eliminate it  > from the grammar.  The same is true for the "&&" operator.  The  > query  >
 > PREFIX books:   <http://example.org/book/>
 > PREFIX dc:      <http://purl.org/dc/elements/1.1/>
 > SELECT ?book ?title
 > WHERE
 >   { ?book dc:title ?title
 >     FILTER (
 >             regex(?title, "Harry Potter") &&
 >             regex(?title, "Phoenix")
 >            )
 >   }
 >
 > can also be expressed as
 >
 > PREFIX books:   <http://example.org/book/>
 > PREFIX dc:      <http://purl.org/dc/elements/1.1/>
 > SELECT ?book ?title
 > WHERE
 >   { ?book dc:title ?title .
 >     {
 >     FILTER regex(?title, "Harry Potter")
 >     }
 >    .
 >     {
 >      FILTER regex(?title, "Phoenix")
 >     }
 >   }
 >
 > I would claim that a well-designed language should not include  > redundant operators, and that the primary justification for  > including "||" and "&&" is now historical.  I would also claim  > that the original justification for choosing the term "UNION" in  > preference to the term "OR" is now gone (but perhaps there is  > a wrinkle that I'm not aware of?).  And if I were king, I would  > use the term "AND" in preference to ".".
 >
 > What we have in SPARQL is a language that has evolved, but the  > syntax has not uniformly evolved with it.
 >
 > The term "FILTER" is also quite unfortunate.  I have been told  > that there has been a deliberate decision to forbid a syntax  > that allows query variables to be bound to literal values that do not  > appear within the underlying RDF store.  Our own query processor does  > not have that restriction, and we have any number of use cases in our  > own applications that depend on the ability to generate synthetic  > literals.  The SPARQL 'funcall' invokes a predicate that returns  > a boolean value.  Our equivalent of funcall can return arbitrary  > values.  In our library of ~40 system-defined funcall IRIs, the functions  > outnumber the predicates by more than 3-to-1.
 >
 > The notion of "filter" is in fact too limiting; the additional  > power that accompanies the ability to synthesize new literal values  > (and allowing them to be bound to SELECT variables) is huge, and  > languages that go beyond the filter notion are going to dominate  > those that don't.
 >
 > Summarizing.  The notion of SPARQL as a graph language, rather than  > as a calculus language, is already obsolete.  The syntax of the language  > has not been upgraded to accomodate that conceptual shift.  Instead,  > there is an artificial syntactic barrier between the  > "graph" portion of the language and the non-graph portion.  SQL and  > Common Logic provide examples of logic languages that did not make  > that distinction, and as a result are much cleaner, and much more  > readable.
 >
 > Cheers, Bob
Received on Wednesday, 7 February 2007 03:12:39 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 8 January 2008 14:14:51 GMT