Comments on SPARQL 17-Feb-2005 draft from Graham Klyne on 2005-04-08 (public-rdf-dawg-comments@w3.org from April 2005)

From: Graham Klyne <gk@ninebynine.org>
Date: Fri, 08 Apr 2005 09:52:13 +0100
To: public-rdf-dawg-comments@w3.org
Message-Id: <5.1.0.14.2.20050408085640.00bb67b8@127.0.0.1>
w.r.t:
   http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050217/

Now that I think I've just about internalized the main features of this draft, I 
have some more feedback.  Essentially, I fear that it's overcomplicated, 
and could use some streamlining.  Specific features that I feel could be 
removed or simplified are:
   Multiple-graph queries
   DESCRIBE result type
   Required SPARQL operators
(more details below)

I don't object to these being defined possibilities, but I feel they 
represent an onerous implementation burden relative to their utility, and 
thus should not be part of the mandatory SPARQL core in their present form, 
to the extent that the same results can be achieved (modulo efficiency 
concerns) without the full capabilities currently described.  If they are 
retained as optional features of SPARQL, I would prefer to see them 
specified in separate documents, or in appendices to the core specification.


1. Multiple-graph queries

I think these should be removed from the basic SPARQL core, since I feel 
they add a fair deal of implementation complexity and an application can 
achieve the same result by submitting multiple queries, possibly to 
different query processors.
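
As a rough illustration of the client-side approach, here is a minimal 
sketch in Python.  It assumes each single-graph query returns its solutions 
as a list of variable bindings; the function name, the binding 
representation and the sample data are all hypothetical, and nothing here 
comes from the draft.

```python
from typing import Dict, Iterable, List

# A solution is a mapping from variable name to value (assumed representation).
Binding = Dict[str, str]


def join_bindings(left: Iterable[Binding], right: Iterable[Binding]) -> List[Binding]:
    """Combine two solution sets, keeping pairs that agree on shared variables."""
    right_rows = list(right)
    joined: List[Binding] = []
    for l_row in left:
        for r_row in right_rows:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[var] == r_row[var] for var in shared):
                joined.append({**l_row, **r_row})
    return joined


# Stand-ins for results of two separate queries, possibly sent to
# different query processors (hypothetical data).
names = [
    {"person": "ex:alice", "name": "Alice"},
    {"person": "ex:bob", "name": "Bob"},
]
mailboxes = [
    {"person": "ex:alice", "mbox": "mailto:alice@example.org"},
]

combined = join_bindings(names, mailboxes)
```

This is a nested-loop join, so it makes no attempt at the efficiencies an 
intelligent query processor could achieve.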

I also feel it would be premature to standardize an approach to multi-graph 
querying ahead of there being a consensus/standard for something like RDF 
named graphs.

I recognize that there are certain efficiencies which can be achieved only 
by allowing a client to submit such a query to an intelligent query 
processor, and think it should be possible to extend the basic SPARQL 
language cleanly to accommodate such queries.


2. DESCRIBE result type

I think the desired effect could be achieved using standard SPARQL queries 
with some special RDF vocabulary, or maybe in conjunction with something 
like Larry Masinter's tdb: URI scheme proposal 
(http://larry.masinter.net/duri.html).  Therefore, I think it has NO PLACE 
in the base SPARQL specification, since it adds to implementer burden 
without creating any otherwise-unavailable functionality.

I will almost certainly not include this in my implementation.


3. Required SPARQL operators

This is less clear-cut to me, but I think some of the specified operators 
don't need to be mandatory in every implementation, there being an 
extension mechanism to fall back on.

I think a minimal set of operators might be string and numeric comparisons, 
plus the SPARQL tests BOUND, ISURI, ISBLANK, ISLITERAL.

The XQuery connectives appear to duplicate the union and intersection 
(group) graph patterns.

Although date/time tests are clearly useful for a significant class of 
applications, I don't think they are needed for all applications, and could 
be optional.

Similarly, I can see regex matching is useful but not always essential, and 
it does in some cases impose a significant implementation burden if the 
right form of regex library happens not to be available -- I think this 
feature should be optional.

...

While considering value-testing options:

Concerning extensibility of value testing operations, I think there should 
be an option to treat unrecognized tests as "True" rather than failures, 
this resulting in additional query results being returned that can be 
filtered by the client (thus allowing a client application to work with 
different query processors with differing support for value testing functions).
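
A minimal sketch of this idea in Python, assuming value tests are 
identified by name and looked up in a table of supported functions.  The 
test names, the lookup table and the crude prefix match standing in for 
stemming are all hypothetical.

```python
from typing import Callable, Dict, List

Binding = Dict[str, str]
Test = Callable[[Binding], bool]


def evaluate_test(name: str, binding: Binding, supported: Dict[str, Test]) -> bool:
    """Apply a named value test; an unrecognized test is treated as True."""
    test = supported.get(name)
    if test is None:
        return True  # unknown test: keep the result so the client can filter it
    return test(binding)


def run_filters(bindings: List[Binding], test_names: List[str],
                supported: Dict[str, Test]) -> List[Binding]:
    """Keep bindings that pass every named test (tests are ANDed together)."""
    return [b for b in bindings
            if all(evaluate_test(n, b, supported) for n in test_names)]


# Hypothetical tests: the processor knows only one, the client knows both.
all_tests: Dict[str, Test] = {
    "min_length": lambda b: len(b["title"]) >= 4,
    "stem_run": lambda b: b["title"].startswith("run"),  # crude stand-in
}
processor_tests = {"min_length": all_tests["min_length"]}

data = [{"title": "running shoes"}, {"title": "run"}, {"title": "walking"}]
wanted = ["min_length", "stem_run"]

from_processor = run_filters(data, wanted, processor_tests)  # over-returns
final = run_filters(from_processor, wanted, all_tests)       # client re-filters
```

Note that this only over-returns safely when the tests are combined 
conjunctively; under negation, treating an unknown test as "True" could 
drop results instead.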

I also think that it would be useful if there is a mechanism to discover 
what value testing operations are supported by a query processor, maybe as 
part of the protocol, or simply using some special RDF vocabulary to make 
queries about the processor itself (I favour the latter).
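
To make the vocabulary option concrete, here is a small Python sketch in 
which a processor describes itself with a few triples and a client matches 
a simple pattern against them.  The namespace, the "supportsTest" property 
and the test identifiers are invented for illustration; no such vocabulary 
is defined anywhere that I know of.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical vocabulary for describing a query processor.
PROC = "http://example.org/processor#"

SELF_DESCRIPTION: List[Triple] = [
    (PROC + "this", PROC + "supportsTest", PROC + "stringCompare"),
    (PROC + "this", PROC + "supportsTest", PROC + "numericCompare"),
    (PROC + "this", PROC + "supportsTest", PROC + "regex"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def supported_tests(triples: List[Triple]) -> Set[str]:
    """Collect the value tests a processor claims to support."""
    return {obj for (_, _, obj) in match(triples, p=PROC + "supportsTest")}


tests = supported_tests(SELF_DESCRIPTION)
has_regex = PROC + "regex" in tests
has_dates = PROC + "dateCompare" in tests
```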

...

In our research group, we have a strong requirement to perform stemmed 
string matching in RDF queries, which is handled natively by the underlying 
database.  (This project is using Jena on Postgres, and to achieve this 
functionality we currently have to perform an evil hack which involves 
manipulating Jena's underlying database schema, so we can use Postgres' 
text-stemming search directly.  Ugh!!)

Extensible value-tests in SPARQL seem to be the right way to address this 
kind of problem, so that a query processor supporting this feature can make 
maximal use of the underlying capabilities.  I mention this here simply so 
that the requirement can be noted as a desideratum for SPARQL.

...

I also have some further comments about details of the specification, some 
substantive, but they'll have to wait for another message.

#g



------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Friday, 8 April 2005 08:56:00 UTC