Re: [All] SPARQL Query Language Review from Enrico Franconi on 2005-10-20 (public-swbp-wg@w3.org from October 2005)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Thu, 20 Oct 2005 10:04:17 +0200
To: David Wood <dwood@softwarememetics.com>
Cc: public-swbp-wg@w3.org
Message-Id: <2AF75BDF-124C-4605-B99A-0F0D235DBA76@inf.unibz.it>
On 14 Oct 2005, at 20:31, David Wood wrote:
> 1)  SPARQL would appear not to have a complete model theoretic  
> base.  Although sections of the specification are described using  
> sets, nothing is presented which hangs all of the sections  
> together.  This is unfortunate and is, I think, the underlying  
> cause of some of the language features critiqued below.  (NB:  I tried  
> to get Simon Raboczi to complete and submit his model theory  
> unifying iTQL and SPARQL, but was unsuccessful in doing so.)

The WG is working on that. We have a real model theory (MT) for SPARQL.

> 2)  The language is not built with extensibility in mind.  That is,  
> it is not, in my opinion, sufficiently functional.  There are several  
> areas of functionality which we already know are of interest to  
> users of RDF query languages (e.g. iTQL's 'walk' and 'trans'  
> functions which perform generic graph walking and transitive  
> closure, respectively) and it is difficult to see how one might add  
> these commands to a later SPARQL version without making wholesale  
> changes to the language.

I believe that once SPARQL has a serious compositional MT,  
extensibility will come for free.

> 3)  The handling of blank nodes ("bnodes") is, again in my opinion,  
> the single greatest failure of the specification.  We have to admit  
> that RDF graphs contain bnodes and queries will run across them.   
> We also have to admit that a querier will (not 'may') often want to  
> subsequently find information connected to those bnodes.  SPARQL's  
> insistence that bnodes' true internal identities not be returned to  
> a querier (correct in and of itself) combined with the lack of  
> subquery capability ensures that many useful RDF queries routinely  
> performed in other languages simply cannot be written in SPARQL.   
> OPTIONAL addresses only part of that functionality.

Again, I believe that once SPARQL has a serious compositional  
MT, these issues will be solved. Btw, bnodes are the only new feature  
of SPARQL with respect to standard query languages.

> 4)  SPARQL contains a large number of top-level commands.  This  
> could be a result again of the lack of subqueries and an underlying  
> model theory.  It is an unfortunate design choice.

See above.

> 5)  Good language design would dictate that logical opposites (e.g.  
> conjunction and disjunction) be represented in syntactically  
> similar ways, even if one is generally implicit.  Therefore, since  
> UNION (disjunction) is present (as well it should be) along with  
> conjunction, then conjunction should have an optional equivalent  
> keyword even though it is implicit in most uses.

The 'join' (i.e., the 'dot') is the conjunction, exactly like in SQL.
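
To make the point concrete, here is a minimal Python sketch (not SPARQL, and not the spec's definition) in which a solution is a dict from variable names to values. The helper names match, join and union and the toy triples are assumptions made up for this illustration. The dot between two triple patterns corresponds to join; UNION corresponds to union.

```python
# Illustrative sketch only: a solution is a dict {variable: value}.
# match, join and union are hypothetical helpers, not part of any library.

def match(pattern, triples):
    """Return one solution per triple matching a (s, p, o) pattern.
    Terms starting with '?' are variables; everything else is a constant."""
    solutions = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable would get two different values
                binding[term] = value
            elif term != value:
                break  # constant does not match this triple
        else:
            solutions.append(binding)
    return solutions

def compatible(a, b):
    """Two solutions are compatible if they agree on all shared variables."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def join(left, right):
    """Conjunction: merge every compatible pair of solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def union(left, right):
    """Disjunction: keep the solutions of either side."""
    return left + right

triples = [
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("carol", "knows", "dave"),
]

# { ?x knows ?y . ?y name ?n }   (the dot is the join)
conj = join(match(("?x", "knows", "?y"), triples),
            match(("?y", "name", "?n"), triples))

# { ?x knows ?y } UNION { ?y name ?n }
disj = union(match(("?x", "knows", "?y"), triples),
             match(("?y", "name", "?n"), triples))
```

Only the alice/bob pair survives the join, because it is the only pair of solutions that agrees on ?y; the union simply keeps all three solutions.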

> 7)  The nulls generated by UNION and the nulls generated by  
> OPTIONAL may be distinct.  They correspond to logical true and  
> false, respectively.  That makes life a bit difficult for  
> implementors.  It may be that another form of 'null' should be  
> considered.  Thanks to Simon Raboczi for this analysis.

This was pointed out in the WG as well some time ago. The new MT  
fixes that.

> 8)  There does not seem to be any way to force a literal into the  
> variable position in a binding.  That is very useful when  
> attempting to create a result which must take a certain form (e.g.  
> be a set of triples) and occasionally mandatory if the result set  
> must be forced into triple form.

I'm not sure I understand you correctly, but a CONSTRUCT template can  
of course contain literals.
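
As a rough Python sketch of the idea (assuming solutions are dicts from variable names to values; the construct helper, the "status" predicate and the sample data are invented for this example), a template triple may carry a literal constant, which is copied unchanged into every instantiated triple:

```python
# Illustrative sketch only: construct is a hypothetical helper.
# Variables start with '?'; a literal is written here as a quoted string.

def construct(template, solutions):
    """Instantiate each template triple once per solution.
    Constants, including literals, are copied through unchanged.
    A template triple is skipped if one of its variables is unbound."""
    graph = set()
    for sol in solutions:
        for triple in template:
            if all(not t.startswith("?") or t in sol for t in triple):
                graph.add(tuple(sol.get(t, t) for t in triple))
    return graph

solutions = [{"?doc": "doc1"}, {"?doc": "doc2"}]

# The object position holds a fixed literal rather than a variable.
template = [("?doc", "status", '"reviewed"')]

graph = construct(template, solutions)
```

Each solution yields one triple of the forced shape, with the literal in the object position.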

> 2)  The form and content of DESCRIBE results are left to the data  
> publisher.  It would seem that such an open-ended conversation  
> would require a human consumer in the general case.  I am left  
> wondering why the DESCRIBE functionality is not left to a more  
> general SELECT query against a describing RDF container.

I agree on that.

> SCALABILITY CONCERNS:
>
> 1)  The evaluation of regular expressions after the binding of  
> graph patterns rules out a lot of potential join optimizations.

The semantics does not say that you have to implement it *after*; it  
is just a definition. As long as it complies with the overall  
semantics, any optimisation is fine.
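
A small Python sketch of this, under simplifying assumptions (solutions are dicts, the regular expression mentions only a variable bound on one side of the join, and the data and helper names are made up): applying the filter after the join, as in the definition, and pushing it below the join give the same answer.

```python
import re

# Illustrative data: two sets of solutions sharing the variable ?x.
people = [
    {"?x": "p1", "?name": "Alice"},
    {"?x": "p2", "?name": "Bob"},
]
emails = [
    {"?x": "p1", "?mail": "alice@example.org"},
    {"?x": "p2", "?mail": "bob@example.org"},
]

def join(left, right):
    """Merge every pair of solutions that agree on their shared variables."""
    return [{**a, **b} for a in left for b in right
            if all(a[v] == b[v] for v in a.keys() & b.keys())]

def name_matches(solution):
    """Stands in for a regex filter on ?name (pattern chosen for the example)."""
    return re.search("^A", solution["?name"]) is not None

# As the definition reads: join first, then filter.
late = [s for s in join(people, emails) if name_matches(s)]

# One possible optimisation: filter first, then join the smaller input.
early = join([s for s in people if name_matches(s)], emails)
```

Both orders produce the single solution for p1; the second just joins fewer rows.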

> 2)  OPTIONAL, in its entirety.  The concern is that querying very  
> large data sets using OPTIONAL would result in very large  
> intermediate results requiring joining.  Subqueries effectively  
> sidestep that problem by allowing further restrictions against a  
> smaller result set.

Like any algebraic operator with a well-defined semantics, OPTIONAL  
can be implemented in different ways, leaving all the room you need  
for optimisations.
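
For illustration, here is a Python sketch that treats OPTIONAL as a left outer join on a single shared variable (a deliberate simplification; the function names and data are assumptions for this example, and filters inside the optional part are ignored). The two implementations differ in strategy but return the same solutions.

```python
from collections import defaultdict

def left_join_nested(left, right, key):
    """Nested-loop strategy: scan the right side for every left solution."""
    out = []
    for a in left:
        matches = [b for b in right if b[key] == a[key]]
        if matches:
            out.extend({**a, **b} for b in matches)
        else:
            out.append(dict(a))  # keep the left solution, optional part unbound
    return out

def left_join_hashed(left, right, key):
    """Hash strategy: index the right side once, then probe it."""
    index = defaultdict(list)
    for b in right:
        index[b[key]].append(b)
    out = []
    for a in left:
        matches = index.get(a[key], [])
        if matches:
            out.extend({**a, **b} for b in matches)
        else:
            out.append(dict(a))
    return out

people = [
    {"?x": "p1", "?name": "Alice"},
    {"?x": "p2", "?name": "Bob"},
]
mails = [{"?x": "p1", "?mail": "alice@example.org"}]

# { ?x name ?name OPTIONAL { ?x mail ?mail } }
nested = left_join_nested(people, mails, "?x")
hashed = left_join_hashed(people, mails, "?x")
```

Bob has no mail, so his solution is kept with ?mail unbound in both results.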

cheers
--e.

Enrico Franconi                  - franconi@inf.unibz.it
Free University of Bozen-Bolzano - http://www.inf.unibz.it/~franconi/
Faculty of Computer Science      - Phone: (+39) 0471-016-120
I-39100 Bozen-Bolzano BZ, Italy  - Fax:   (+39) 0471-016-129
Received on Thursday, 20 October 2005 08:04:23 UTC