[All] SPARQL Query Language Review

Hi all,

I have an action item [1] to review and comment on the specification  
for the SPARQL Query Language for RDF [2].

I reviewed the 21 July 2005 Working Draft.

Regards,
Dave

[1] http://www.w3.org/2005/10/03-swbp-minutes.html#action17
[2] http://www.w3.org/TR/rdf-sparql-query/
-----------------------------%<--------------------------------------------

Review of SPARQL Query Language for RDF Working Draft

OVERVIEW:


LANGUAGE ISSUES:

1)  SPARQL would appear not to have a complete model theoretic base.
Although sections of the specification are described using sets,
nothing is presented which hangs all of the sections together.  This
is unfortunate and is, I think, the underlying cause of some of the
language features critiqued below.  (NB:  I tried to get Simon
Raboczi to complete and submit his model theory unifying iTQL and
SPARQL, but was unsuccessful in doing so.)

2)  The language is not built with extensibility in mind.  That is,
it is not, in my opinion, sufficiently functional.  There are several
areas of functionality which we already know are of interest to users
of RDF query languages (e.g. iTQL's 'walk' and 'trans' functions
which perform generic graph walking and transitive closure,
respectively) and it is difficult to see how one might add these
commands to a later SPARQL version without making wholesale changes
to the language.

3)  The handling of blank nodes ("bnodes") is, again in my opinion,  
the single greatest failure of the specification.  We have to admit  
that RDF graphs contain bnodes and queries will run across them.  We  
also have to admit that a querier will (not 'may') often want to  
subsequently find information connected to those bnodes.  SPARQL's  
insistence that bnodes' true internal identities not be returned to a  
querier (correct in and of itself) combined with the lack of subquery  
capability ensures that many useful RDF queries routinely performed  
in other languages simply cannot be written in SPARQL.  OPTIONAL  
addresses only part of that functionality.
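
To make the bnode point concrete, here is a minimal Python sketch (not
SPARQL, and not any real store's API).  The data, the BNode class and
the select() helper are all hypothetical.  It assumes a store that
relabels bnodes per result set, and shows that a label returned by one
query is of no use for addressing the same node in a second query.

```python
import itertools


class BNode:
    """Store-internal blank node; its identity is never exposed to clients."""


# Hypothetical data: a person whose address is a blank node.
addr = BNode()
GRAPH = {
    ("ex:dave", "ex:address", addr),
    (addr, "ex:city", "Washington"),
}


def select(pattern):
    """Match one (s, p, o) pattern against GRAPH; None is a wildcard.

    Blank nodes in the results are replaced by labels that are only
    meaningful within this one result set.
    """
    counter = itertools.count()
    labels = {}

    def export(term):
        if isinstance(term, BNode):
            if term not in labels:
                labels[term] = "_:b%d" % next(counter)
            return labels[term]
        return term

    rows = []
    for triple in GRAPH:
        if all(want is None or want == got
               for want, got in zip(pattern, triple)):
            rows.append(tuple(export(t) for t in triple))
    return rows


# First query: find Dave's address.  The client only sees a label.
first = select(("ex:dave", "ex:address", None))
label = first[0][2]

# Second query: the label does not identify the node inside the store,
# so the city hanging off the bnode cannot be reached this way.
second = select((label, "ex:city", None))
```

In this sketch the second query simply matches nothing.  As I
understand it, a bnode label inside a real SPARQL query behaves like a
variable rather than a constant, so the detail differs, but in neither
case does the label address the original node.  Without subqueries the
follow-up has to be folded into the first query.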

4)  SPARQL contains a large number of top-level commands.  This could
again be a result of the lack of subqueries and of an underlying model
theory.  It is an unfortunate design choice.

5)  Good language design would dictate that logical opposites (e.g.  
conjunction and disjunction) be represented in syntactically similar  
ways, even if one is generally implicit.  Therefore, since UNION  
(disjunction) is present (as well it should be) along with  
conjunction, then conjunction should have an optional equivalent  
keyword even though it is implicit in most uses.

6)  I was initially concerned about the definitions of subjects as  
including literals in triple patterns, until Andy Seaborne pointed  
out the differences between a triple and a triple matching pattern.

7)  The nulls generated by UNION and the nulls generated by OPTIONAL  
may be distinct.  They correspond to logical true and false,  
respectively.  That makes life a bit difficult for implementors.  It  
may be that another form of 'null' should be considered.  Thanks to  
Simon Raboczi for this analysis.
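
A small Python sketch of where the two kinds of null come from.  The
solution sets and the union() and optional() helpers are hypothetical
and are not taken from the specification; unbound variables are
represented as None.

```python
# Hypothetical solution sets: each solution maps variable names to values.
names = [{"x": "ex:a", "name": "Alice"}, {"x": "ex:b", "name": "Bob"}]
mboxes = [{"x": "ex:a", "mbox": "mailto:alice@example.org"}]


def union(left, right, variables):
    """Concatenate two branches.

    A variable that a branch never mentions comes out as None.
    """
    return [{v: row.get(v) for v in variables} for row in left + right]


def optional(left, right, variables):
    """Left join on shared variables.

    When the optional part was tried and did not match, its variables
    come out as None.
    """
    out = []
    for l in left:
        matches = [r for r in right
                   if all(l[k] == r[k] for k in l.keys() & r.keys())]
        for r in matches or [{}]:
            merged = {**l, **r}
            out.append({v: merged.get(v) for v in variables})
    return out


VARS = ["x", "name", "mbox"]
union_rows = union(names, mboxes, VARS)
optional_rows = optional(names, mboxes, VARS)
```

In union_rows the None values mark variables the branch never
attempted to bind; in optional_rows the None marks a pattern that was
attempted and failed.  Both are flattened to the same None here, which
is the loss of information an implementor has to work around.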

8)  There does not seem to be any way to force a literal into the  
variable position in a binding.  That is very useful when attempting  
to create a result which must take a certain form (e.g. be a set of  
triples) and occasionally mandatory if the result set must be forced  
into triple form.


INTEROPERABILITY ISSUES:

1)  There is, to the best of my knowledge, no way for a SPARQL user  
to command the creation of an RDF container within a data store.  The  
lack of an explicit command will encourage implementors to create  
their own, thereby hurting interoperability.  Similarly for deletion  
requests.

2)  The form and content of DESCRIBE results are left to the data  
publisher.  It would seem that such an open-ended conversation would  
require a human consumer in the general case.  I am left wondering  
why the DESCRIBE functionality is not left to a more general SELECT  
query against a describing RDF container.


SCALABILITY CONCERNS:

1)  The evaluation of regular expressions after the binding of graph  
patterns rules out a lot of potential join optimizations.
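
A rough Python sketch of the concern, using made-up data and a naive
join() helper (both hypothetical).  It compares applying a regular
expression after the join with applying it before the join, and counts
the rows each approach has to carry.

```python
import re

# Hypothetical data: (subject, title) and (subject, author) pairs.
titles = [("ex:doc%d" % i, "Report %d" % i) for i in range(1000)]
titles.append(("ex:spec", "SPARQL Query Language for RDF"))
authors = [(s, "ex:author%d" % (i % 10)) for i, (s, _) in enumerate(titles)]

pattern = re.compile(r"^SPARQL")


def join(left, right):
    """Naive hash join on the first column."""
    index = {}
    for s, a in right:
        index.setdefault(s, []).append(a)
    return [(s, t, a) for s, t in left for a in index.get(s, [])]


# Filter after binding: join everything, then apply the regex.
late_intermediate = join(titles, authors)
late = [row for row in late_intermediate if pattern.search(row[1])]

# Filter pushed below the join: apply the regex first, then join.
early_intermediate = [(s, t) for s, t in titles if pattern.search(t)]
early = join(early_intermediate, authors)
```

Both orders give the same answer, but the late filter carries 1001
intermediate rows through the join where the early filter carries one.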

2)  OPTIONAL, in its entirety.  The concern is that querying very  
large data sets using OPTIONAL would result in very large  
intermediate results requiring joining.  Subqueries effectively  
sidestep that problem by allowing further restrictions against a  
smaller result set.
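
The same effect can be sketched for OPTIONAL in Python.  The data and
the left_join() helper are hypothetical; the second half imitates what
a subquery would allow, namely restricting first and extending only
the smaller result.

```python
# Hypothetical data: many people, few of whom live in the city of interest.
people = [
    {"p": "ex:p%d" % i, "city": "Washington" if i % 100 == 0 else "Elsewhere"}
    for i in range(10000)
]
mboxes = {"ex:p%d" % i: "mailto:p%d@example.org" % i
          for i in range(0, 10000, 3)}


def left_join(rows, lookup):
    """OPTIONAL-style join: keep every row, add mbox where one exists."""
    return [{**row, "mbox": lookup.get(row["p"])} for row in rows]


# OPTIONAL over the whole data set, restriction applied afterwards.
wide = left_join(people, mboxes)
wide_result = [r for r in wide if r["city"] == "Washington"]

# Subquery style: restrict first, then extend only the smaller result.
narrow_input = [r for r in people if r["city"] == "Washington"]
narrow_result = left_join(narrow_input, mboxes)
```

The results are identical, but the first form builds 10000
intermediate rows and the second builds 100.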

3)  UNSAID would have been of great concern with regard to
scalability, just in case it comes back :)


SPECIFICATION ISSUES:

1)  It would be really nice if the grammar were ordered for easier
reading, perhaps alphabetically.

Received on Friday, 14 October 2005 18:31:22 UTC