- From: David Wood <dwood@softwarememetics.com>
- Date: Fri, 14 Oct 2005 14:31:13 -0400
- To: public-swbp-wg@w3.org
Hi all,
I have an action item [1] to review and comment on the specification
for the SPARQL Query Language for RDF [2].
I reviewed the 21 July 2005 Working Draft.
Regards,
Dave
[1] http://www.w3.org/2005/10/03-swbp-minutes.html#action17
[2] http://www.w3.org/TR/rdf-sparql-query/
-----------------------------%<--------------------------------------------
Review of SPARQL Query Language for RDF Working Draft
OVERVIEW:
LANGUAGE ISSUES:
1) SPARQL does not appear to have a complete model-theoretic basis.
Although sections of the specification are described using sets,
nothing is presented which ties all of the sections together. This
is unfortunate and is, I think, the underlying cause of some of the
language features critiqued below. (NB: I tried to get Simon
Raboczi to complete and submit his model theory unifying iTQL and
SPARQL, but was unsuccessful in doing so.)
2) The language is not built with extensibility in mind. That is,
it is not, in my opinion, sufficiently functional. There are several
areas of functionality which we already know are of interest to users
of RDF query languages (e.g. iTQL's 'walk' and 'trans' functions
which perform generic graph walking and transitive closure,
respectively) and it is difficult to see how one might add these
commands to a later SPARQL version without making wholesale changes
to the language.
3) The handling of blank nodes ("bnodes") is, again in my opinion,
the single greatest failure of the specification. We have to admit
that RDF graphs contain bnodes and queries will run across them. We
also have to admit that a querier will (not 'may') often want to
subsequently find information connected to those bnodes. SPARQL's
insistence that bnodes' true internal identities not be returned to a
querier (correct in and of itself) combined with the lack of subquery
capability ensures that many useful RDF queries routinely performed
in other languages simply cannot be written in SPARQL. OPTIONAL
addresses only part of that functionality.
4) SPARQL contains a large number of top-level commands. This could
be a result again of the lack of subqueries and an underlying model
theory. It is an unfortunate design choice.
5) Good language design would dictate that logical opposites (e.g.
conjunction and disjunction) be represented in syntactically similar
ways, even if one is generally implicit. Therefore, since UNION
(disjunction) is present (as well it should be) along with
conjunction, then conjunction should have an optional equivalent
keyword even though it is implicit in most uses.
6) I was initially concerned about the definitions of subjects as
including literals in triple patterns, until Andy Seaborne pointed
out the differences between a triple and a triple matching pattern.
7) The nulls generated by UNION and the nulls generated by OPTIONAL
may be distinct. They correspond to logical true and false,
respectively. That makes life a bit difficult for implementors. It
may be that another form of 'null' should be considered. Thanks to
Simon Raboczi for this analysis.
8) There does not seem to be any way to force a literal into the
variable position in a binding. That is very useful when attempting
to create a result which must take a certain form (e.g. be a set of
triples) and occasionally mandatory if the result set must be forced
into triple form.
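
To illustrate language issue 2 above: the following is a minimal sketch, in Python rather than in any query language, of the kind of transitive closure that iTQL's 'trans' is described as performing. The tuple representation of triples, the example data and the function name transitive_closure are assumptions made for illustration only. This is not iTQL or SPARQL semantics.

```python
from collections import deque

def transitive_closure(triples, start, predicate):
    """Return every node reachable from `start` by following `predicate` edges."""
    # Index subject -> objects for the chosen predicate only.
    edges = {}
    for s, p, o in triples:
        if p == predicate:
            edges.setdefault(s, set()).add(o)

    # Breadth-first walk; `reached` also guards against cycles.
    reached = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached

# Hypothetical data: a three-step subclass chain plus one unrelated triple.
triples = [
    ("ex:a", "rdfs:subClassOf", "ex:b"),
    ("ex:b", "rdfs:subClassOf", "ex:c"),
    ("ex:c", "rdfs:subClassOf", "ex:d"),
    ("ex:a", "rdfs:label", "A"),
]
closure = transitive_closure(triples, "ex:a", "rdfs:subClassOf")
```

The walk follows an arbitrary number of edges, whereas a graph pattern of fixed size can only follow a fixed number of them. That is why this kind of query seems to need either a dedicated operator or some form of recursion in the language.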
INTEROPERABILITY ISSUES:
1) There is, to the best of my knowledge, no way for a SPARQL user
to command the creation of an RDF container within a data store. The
lack of an explicit command will encourage implementors to create
their own, thereby hurting interoperability. Similarly for deletion
requests.
2) The form and content of DESCRIBE results are left to the data
publisher. It would seem that such an open-ended conversation would
require a human consumer in the general case. I am left wondering
why the DESCRIBE functionality is not left to a more general SELECT
query against a describing RDF container.
SCALABILITY CONCERNS:
1) The evaluation of regular expressions after the binding of graph
patterns rules out a lot of potential join optimizations.
2) OPTIONAL, in its entirety. The concern is that querying very
large data sets using OPTIONAL would result in very large
intermediate results requiring joining. Subqueries effectively
sidestep that problem by allowing further restrictions against a
smaller result set.
3) UNSAID would have been of great concern with regard to
scalability. I mention it just in case it comes back :)
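
The concern in scalability item 1 can be sketched with a toy join in Python. The data, the regular expression and the join helper are all hypothetical, and this is not a description of how any particular SPARQL engine evaluates a query. It only compares how many rows take part in the join when the regular expression is applied after the join and when it is applied before it.

```python
import re

# Hypothetical data: (person, name) and (person, mbox) pairs.
people = [(f"ex:p{i}", f"name{i}") for i in range(1000)]
emails = [(f"ex:p{i}", f"p{i}@example.org") for i in range(1000)]
pattern = re.compile(r"^name99")

def join(left, right):
    """Hash join on the first column; returns (key, left_val, right_val) rows."""
    index = {}
    for key, val in right:
        index.setdefault(key, []).append(val)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, ())]

# Filter after the join: the full join result is materialised first.
joined_all = join(people, emails)
late = [row for row in joined_all if pattern.search(row[1])]

# Filter before the join: only matching names take part in the join.
filtered = [(k, n) for k, n in people if pattern.search(n)]
early = join(filtered, emails)
```

Both orders give the same final rows, but the first builds the full intermediate result before discarding most of it. An implementation that is required to evaluate the regular expression only after the graph patterns are bound is restricted to the first order.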
SPECIFICATION ISSUES:
1) It would be really nice if the grammar were ordered for easier
reading, perhaps alphabetically.
Received on Friday, 14 October 2005 18:31:22 UTC