Re: [All] SPARQL Query Language Review from David Wood on 2005-10-19 (public-swbp-wg@w3.org from October 2005)

From: David Wood <dwood@softwarememetics.com>
Date: Wed, 19 Oct 2005 10:09:55 -0400
To: public-swbp-wg@w3.org
Message-Id: <2B962385-8071-4A1F-8DA7-E1672C43B962@softwarememetics.com>
An interesting review of SPARQL Query Language just popped up on the  
Kowari developers' mailing list (see [1]).  Andrae Muys was the  
author.  It is worth a read.

[1] https://sourceforge.net/mailarchive/message.php?msg_id=13496030

Regards,
Dave


On Oct 14, 2005, at 14:31, David Wood wrote:

>
> Hi all,
>
> I have an action item [1] to review and comment on the  
> specification for the SPARQL Query Language for RDF [2].
>
> I reviewed the 21 July 2005 Working Draft.
>
> Regards,
> Dave
>
> [1] http://www.w3.org/2005/10/03-swbp-minutes.html#action17
> [2] http://www.w3.org/TR/rdf-sparql-query/
> -----------------------------%<--------------------------------------------
>
> Review of SPARQL Query Language for RDF Working Draft
>
> OVERVIEW:
>
>
> LANGUAGE ISSUES:
>
> 1)  SPARQL would appear not to have a complete model theoretic  
> base.  Although sections of the specification are described using  
> sets, nothing is presented which hangs all of the sections  
> together.  This is unfortunate and is, I think, the underlying  
> cause of some of the language features critiqued below.  (NB:  I  
> tried to get Simon Raboczi to complete and submit his model theory  
> unifying iTQL and SPARQL, but was unsuccessful in doing so.)
>
> 2)  The language is not built with extensibility in mind.  That is,  
> it is not, in my opinion, sufficiently functional.  There are several  
> areas of functionality which we already know are of interest to  
> users of RDF query languages (e.g. iTQL's 'walk' and 'trans'  
> functions which perform generic graph walking and transitive  
> closure, respectively) and it is difficult to see how one might add  
> these commands to a later SPARQL version without making wholesale  
> changes to the language.
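
For readers unfamiliar with the operation, here is a minimal sketch of
what a transitive-closure function computes over a set of triples.  It
is written in Python purely for illustration; the function name, the
tuple representation of triples and the example data are assumptions
made for this sketch and are not iTQL syntax or Kowari code.

```python
from collections import deque

def transitive_closure(triples, predicate, start):
    """Return every node reachable from `start` by following `predicate` edges."""
    # Index the graph: subject -> set of objects, for the chosen predicate only.
    successors = {}
    for s, p, o in triples:
        if p == predicate:
            successors.setdefault(s, set()).add(o)

    # Breadth-first walk; `reached` also guards against cycles.
    reached = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached

# Hypothetical example data: a small subclass chain plus an unrelated triple.
triples = {
    ("ex:a", "rdfs:subClassOf", "ex:b"),
    ("ex:b", "rdfs:subClassOf", "ex:c"),
    ("ex:c", "rdfs:subClassOf", "ex:d"),
    ("ex:a", "rdfs:label", "A"),
}
closure = transitive_closure(triples, "rdfs:subClassOf", "ex:a")
```

The walk has no fixed depth, which is why it cannot be written as a
fixed number of triple patterns joined together.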
>
> 3)  The handling of blank nodes ("bnodes") is, again in my opinion,  
> the single greatest failure of the specification.  We have to admit  
> that RDF graphs contain bnodes and queries will run across them.   
> We also have to admit that a querier will (not 'may') often want to  
> subsequently find information connected to those bnodes.  SPARQL's  
> insistence that bnodes' true internal identities not be returned to  
> a querier (correct in and of itself) combined with the lack of  
> subquery capability ensures that many useful RDF queries routinely  
> performed in other languages simply cannot be written in SPARQL.   
> OPTIONAL addresses only part of that functionality.
>
> 4)  SPARQL contains a large number of top-level commands.  This  
> could be a result again of the lack of subqueries and an underlying  
> model theory.  It is an unfortunate design choice.
>
> 5)  Good language design would dictate that logical opposites (e.g.  
> conjunction and disjunction) be represented in syntactically  
> similar ways, even if one is generally implicit.  Therefore, since  
> UNION (disjunction) is present (as well it should be) along with  
> conjunction, then conjunction should have an optional equivalent  
> keyword even though it is implicit in most uses.
>
> 6)  I was initially concerned about the definitions of subjects as  
> including literals in triple patterns, until Andy Seaborne pointed  
> out the differences between a triple and a triple matching pattern.
>
> 7)  The nulls generated by UNION and the nulls generated by  
> OPTIONAL may be distinct.  They correspond to logical true and  
> false, respectively.  That makes life a bit difficult for  
> implementors.  It may be that another form of 'null' should be  
> considered.  Thanks to Simon Raboczi for this analysis.
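
To make the point concrete, here is a rough sketch of how the two kinds
of unbound value arise.  It models solutions as Python dicts, which is
a deliberate simplification and not the semantics given in the Working
Draft; the helper names and example data are hypothetical.

```python
def union(left, right):
    """Disjunction: keep the solutions of both branches side by side."""
    return [dict(row) for row in left] + [dict(row) for row in right]

def compatible(a, b):
    """Two solutions agree if every shared variable has the same value."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def optional(left, right):
    """Left outer join: extend a left solution where possible, else keep it as is."""
    out = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(matches if matches else [dict(l)])
    return out

def project(rows, variables):
    """Flatten solutions into tuples; a missing variable shows up as None."""
    return [tuple(row.get(v) for v in variables) for row in rows]

# Hypothetical solution sets for two graph patterns.
names = [{"x": "ex:alice", "name": "Alice"}]
mboxes = [{"x": "ex:bob", "mbox": "mailto:bob@example.org"}]

columns = ["x", "name", "mbox"]
union_rows = project(union(names, mboxes), columns)
optional_rows = project(optional(names, mboxes), columns)
```

In this model the row for ex:alice has no mbox in either result.  In
the UNION case the variable was never asked about in that branch; in
the OPTIONAL case it was asked about and nothing matched.  Once the
results are flattened into a table the two cases look identical.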
>
> 8)  There does not seem to be any way to force a literal into the  
> variable position in a binding.  That is very useful when  
> attempting to create a result which must take a certain form (e.g.  
> be a set of triples) and occasionally mandatory if the result set  
> must be forced into triple form.
>
>
> INTEROPERABILITY ISSUES:
>
> 1)  There is, to the best of my knowledge, no way for a SPARQL user  
> to command the creation of an RDF container within a data store.   
> The lack of an explicit command will encourage implementors to  
> create their own, thereby hurting interoperability.  Similarly for  
> deletion requests.
>
> 2)  The form and content of DESCRIBE results are left to the data  
> publisher.  It would seem that such an open-ended conversation  
> would require a human consumer in the general case.  I am left  
> wondering why the DESCRIBE functionality is not left to a more  
> general SELECT query against a describing RDF container.
>
>
> SCALABILITY CONCERNS:
>
> 1)  The evaluation of regular expressions after the binding of  
> graph patterns rules out a lot of potential join optimizations.
>
> 2)  OPTIONAL, in its entirety.  The concern is that querying very  
> large data sets using OPTIONAL would result in very large  
> intermediate results requiring joining.  Subqueries effectively  
> sidestep that problem by allowing further restrictions against a  
> smaller result set.
>
> 3)  UNSAID would have been of great concern with regard to  
> scalability.  I mention it just in case it comes back :)
>
>
> SPECIFICATION ISSUES:
>
> 1)  It would be really nice if the grammar were ordered for easier  
> reading, perhaps alphabetically.
>
>
>
>
Received on Wednesday, 19 October 2005 14:10:18 UTC