- From: David Wood <dwood@softwarememetics.com>
- Date: Fri, 14 Oct 2005 14:31:13 -0400
- To: public-swbp-wg@w3.org
Hi all,

I have an action item [1] to review and comment on the specification for the SPARQL Query Language for RDF [2]. I reviewed the 21 July 2005 Working Draft.

Regards,
Dave

[1] http://www.w3.org/2005/10/03-swbp-minutes.html#action17
[2] http://www.w3.org/TR/rdf-sparql-query/

-----------------------------% <--------------------------------------------

Review of SPARQL Query Language for RDF Working Draft

OVERVIEW:

LANGUAGE ISSUES:

1) SPARQL would appear not to have a complete model-theoretic base. Although sections of the specification are described using sets, nothing is presented which hangs all of the sections together. This is unfortunate and is, I think, the underlying cause of some of the language features critiqued below. (NB: I tried to get Simon Raboczi to complete and submit his model theory unifying iTQL and SPARQL, but was unsuccessful in doing so.)

2) The language is not built with extensibility in mind. That is, it is not, in my opinion, sufficiently functional. There are several areas of functionality which we already know are of interest to users of RDF query languages (e.g. iTQL's 'walk' and 'trans' functions, which perform generic graph walking and transitive closure, respectively), and it is difficult to see how one might add these commands to a later SPARQL version without making wholesale changes to the language.

3) The handling of blank nodes ("bnodes") is, again in my opinion, the single greatest failure of the specification. We have to admit that RDF graphs contain bnodes and queries will run across them. We also have to admit that a querier will (not 'may') often want to subsequently find information connected to those bnodes. SPARQL's insistence that bnodes' true internal identities not be returned to a querier (correct in and of itself), combined with the lack of subquery capability, ensures that many useful RDF queries routinely performed in other languages simply cannot be written in SPARQL. OPTIONAL addresses only part of that functionality.

4) SPARQL contains a large number of top-level commands. This could again be a result of the lack of subqueries and of an underlying model theory. It is an unfortunate design choice.

5) Good language design would dictate that logical opposites (e.g. conjunction and disjunction) be represented in syntactically similar ways, even if one is generally implicit. Therefore, since UNION (disjunction) is present (as well it should be) along with conjunction, conjunction should have an optional equivalent keyword even though it is implicit in most uses.

6) I was initially concerned about the definitions of subjects as including literals in triple patterns, until Andy Seaborne pointed out the differences between a triple and a triple matching pattern.

7) The nulls generated by UNION and the nulls generated by OPTIONAL may be distinct. They correspond to logical true and false, respectively. That makes life a bit difficult for implementors. It may be that another form of 'null' should be considered. Thanks to Simon Raboczi for this analysis.

8) There does not seem to be any way to force a literal into the variable position in a binding. That is very useful when attempting to create a result which must take a certain form (e.g. be a set of triples), and occasionally mandatory if the result set must be forced into triple form.

INTEROPERABILITY ISSUES:

1) There is, to the best of my knowledge, no way for a SPARQL user to command the creation of an RDF container within a data store. The lack of an explicit command will encourage implementors to create their own, thereby hurting interoperability. The same applies to deletion requests.

2) The form and content of DESCRIBE results are left to the data publisher. It would seem that such an open-ended conversation would require a human consumer in the general case. I am left wondering why the DESCRIBE functionality is not left to a more general SELECT query against a describing RDF container.
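
To make language issue 7 a little more concrete, here is a minimal Python sketch (not SPARQL, and not any implementation's actual algorithm) of where the two kinds of unbound value come from. The tiny triple list, the `match`, `union` and `optional` helpers, and the use of `None` for an unbound variable are all assumptions made up for illustration.

```python
# Illustrative only: a toy pattern matcher over an in-memory list of triples.
# None stands in for an unbound variable; every name here is hypothetical.

TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "mbox", "alice@example.org"),
    ("bob", "name", "Bob"),
]


def match(pattern, triples=TRIPLES):
    """Return one binding dict per triple matching a (s, p, o) pattern.

    Pattern terms starting with '?' are variables; others must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must take the same value each time.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def union(left, right, variables):
    """Concatenate two solution lists; variables a branch never binds are None."""
    return [{v: row.get(v) for v in variables} for row in left + right]


def optional(left, right, variables):
    """Left outer join: keep every left row, extending it when a right row agrees."""
    out = []
    for l in left:
        extended = [
            {**l, **r} for r in right
            if all(l[k] == r[k] for k in l.keys() & r.keys())
        ]
        # If nothing on the right matched, keep the left row on its own.
        for row in extended or [l]:
            out.append({v: row.get(v) for v in variables})
    return out


names = match(("?s", "name", "?name"))
mboxes = match(("?s", "mbox", "?mbox"))
VARS = ["?s", "?name", "?mbox"]

union_rows = union(names, mboxes, VARS)
optional_rows = optional(names, mboxes, VARS)
```

In `union_rows` a variable is unbound because that branch never mentions it; in `optional_rows` it is unbound because a match was attempted and failed. Both surface as the same `None` in this sketch, which is one way of reading the ambiguity raised in item 7.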

SCALABILITY CONCERNS:

1) The evaluation of regular expressions after the binding of graph patterns rules out a lot of potential join optimizations.

2) OPTIONAL, in its entirety. The concern is that querying very large data sets using OPTIONAL would result in very large intermediate results requiring joining. Subqueries effectively sidestep that problem by allowing further restrictions against a smaller result set.

3) UNSAID would have been of great concern with regard to scalability, just in case it comes back :)

SPECIFICATION ISSUES:

1) It would be really nice if the grammar were ordered for easier reading, perhaps alphabetically.
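
Scalability concern 1 above can be illustrated with a toy example. This is only a sketch under stated assumptions: the data, the `hash_join` helper and the two "strategies" are invented here, and no real query engine is being described. It simply compares how many rows are materialised when a regular expression is applied after a join versus before it, in a case where the reordering happens to be safe.

```python
import re

# Hypothetical data: (person, name) and (person, document) bindings.
names = [(f"p{i}", f"user{i}") for i in range(1000)]
docs = [(f"p{i}", f"doc{i}-{j}") for i in range(1000) for j in range(5)]

pattern = re.compile(r"^user99\d$")  # matches user990 through user999


def hash_join(left, right):
    """Join two lists of (key, value) pairs on key; return (key, lval, rval)."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]


# Strategy A: join everything, then apply the regular expression.
joined = hash_join(names, docs)
late = [row for row in joined if pattern.match(row[1])]

# Strategy B: apply the regular expression first, then join the survivors.
selected = [(k, n) for k, n in names if pattern.match(n)]
early = hash_join(selected, docs)

intermediate_late = len(joined)     # rows materialised before filtering
intermediate_early = len(selected)  # rows fed into the join
```

Both strategies produce the same final rows, but strategy A builds every joined row before discarding most of them, whereas strategy B only joins the names that survive the filter. If the regular expression must be evaluated after the graph pattern has been bound, an optimizer has less freedom to choose something like strategy B.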
Received on Friday, 14 October 2005 18:31:22 UTC