Objections to current SPARQL specification from Andrew Newman on 2007-11-01 (public-rdf-dawg-comments@w3.org from November 2007)

From: Andrew Newman <andrewfnewman@gmail.com>
Date: Fri, 2 Nov 2007 07:11:02 +1000
To: public-rdf-dawg-comments@w3.org
Message-ID: <2db5a5c40711011411r9b11471g8a19e71534115aed@mail.gmail.com>

Here is a summary of the objections that I've had with the SPARQL
specification over the years.  Some of my objections are no longer
relevant and some are more important given the direction the
specification went.  I also have previously struggled with OPTIONAL
but am unable to provide a better solution even though I tried to
develop one using full outer join rather than left outer join.

I've attempted to get my previous views added to the formal list of
objections but haven't had much success.

My objection rests on the view that SPARQL is an RDF query
language and should therefore use an RDF data model throughout.  This
is one property of what is considered good design for query
languages (for RDF query languages see [1], but it has also been covered
elsewhere in criticism of other query languages such as SQL).

One feature that SPARQL lacks is closure.  Having closure on all
operations means that intermediate results and answers are always tied
to an RDF graph.  It means that in each step of the query evaluation
you are dealing with valid subsets of RDF graphs.  The current
specification, however, reverts to SQL-style multisets of variable
bindings, which are not compatible with the RDF model.
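
To make the closure point concrete, here is a minimal sketch in Python.
It is only an illustration under my own assumptions: the graph is a
plain set of triples, and the names match() and bindings() and the
ex: data are made up for this example, not taken from the specification
or from any library.  match() returns a subgraph, so its output can be
fed straight into another match(); bindings() returns a list of rows,
which is no longer a graph and can carry duplicates.

```python
# Sketch only: a graph is a frozenset of (subject, predicate, object) triples.

def match(graph, s=None, p=None, o=None):
    """Return the subgraph of triples matching the pattern (None is a wildcard)."""
    return frozenset(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

def bindings(graph, p):
    """Bindings-style result: one row per matching triple, keeping only ?s.

    The result is a list (a multiset), not a graph.
    """
    return [{"s": t[0]} for t in sorted(graph) if t[1] == p]

# Hypothetical example data.
g = frozenset({
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:knows", "ex:carol"),
})

# Closed: the result of one operation is a graph and can be queried again.
knows = match(g, p="ex:knows")
alice_knows = match(knows, s="ex:alice")

# Not closed: rows of bindings, with ex:alice appearing twice.
rows = bindings(g, "ex:knows")
```

Here alice_knows is still a subset of g, whereas rows has lost its tie
to the graph and needs something like DISTINCT to get back to a set.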

To summarize, my objections include [2][3][4][5]:
* Lack of closure.
* Inconsistencies between SPARQL triples and the currently defined RDF
standard (requiring special handling of, say, CONSTRUCT when there are
literals as subjects).  If SPARQL were defined in terms of RDF, then
SPARQL would naturally change whenever RDF changed.  The way the
specification is currently written seems to allow a difference between
the language and the data it's querying.
* The use of multiset semantics instead of semantics consistent with
RDF (set based semantics).
* The multiple uses of unbound - you cannot distinguish between a
variable left unbound by an OPTIONAL operation and a variable that is
not used in the query.  Without retaining the original query, it is
impossible to tell what unbound means.
* Existence of DISTINCT and REDUCED (set based semantics don't have duplicates).
* Existence of CONSTRUCT (should just be a projection of all columns).
* Existence of ASK (should just be a projection of no columns giving
DEE or DUM, T/F).
* Lack of n-adic operators (JOIN, UNION and possibly OPTIONAL).
* Lack of SUMMARIZE (set based aggregate function).
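
The points about DISTINCT and ASK can be sketched the same way.  This
is a rough illustration under my own assumptions, not a proposal for
syntax: a relation is modelled as a set of rows, each row a frozenset
of (attribute, value) pairs, and project() and row() are hypothetical
helpers written for this example.  With set semantics a projection
never needs DISTINCT, and projecting onto no columns yields DEE (one
empty row, i.e. true) or DUM (no rows, i.e. false).

```python
# Sketch only: a relation is a frozenset of rows; a row is a frozenset
# of (attribute, value) pairs.

def row(**values):
    """Build a row from keyword arguments (hypothetical helper)."""
    return frozenset(values.items())

def project(relation, attrs):
    """Keep only the named attributes; duplicate rows collapse automatically."""
    return frozenset(
        frozenset((a, v) for a, v in r if a in attrs) for r in relation
    )

DEE = frozenset({frozenset()})  # one row with no attributes: "true"
DUM = frozenset()               # no rows at all: "false"

# Hypothetical query results.
matches = frozenset({
    row(s="ex:alice", o="ex:bob"),
    row(s="ex:alice", o="ex:carol"),
})
no_matches = frozenset()

# ASK as a projection onto no columns.
ask_true = project(matches, set())
ask_false = project(no_matches, set())

# Projection under set semantics: ex:alice appears once, no DISTINCT needed.
subjects = project(matches, {"s"})
```

In this sketch ask_true equals DEE, ask_false equals DUM, and subjects
contains a single row for ex:alice.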

[1] J. Bailey et al., "Web and Semantic Web Query Languages: A Survey,"
in Norbert Eisinger and Jan Maluszynski (eds.), LNCS 3564, 2005.
[2] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Nov/0006.html
[3] http://jrdf.sourceforge.net/thesis/2006/RelationalBasedSPARQL.html
[4] http://www.xml.com/lpt/a/1695
[5] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2004Nov/0001.html

Received on Thursday, 1 November 2007 21:11:12 UTC