Dave Beckett comments on SPARQL 1.1 Query Language W3C WD 2010-10-14

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

These are my personal comments (not speaking for any past or current
employer) on:

SPARQL 1.1 Query Language
W3C Working Draft 14 October 2010
http://www.w3.org/TR/2010/WD-sparql11-query-20101014/


My comments are based on the work I did to add some SPARQL 1.1 query
and update support to my Rasqal rdf query library (engine and API)
  http://librdf.org/rasqal/
in version 0.9.21 just released 2010-12-04 as announced at:
  http://lists.w3.org/Archives/Public/semantic-web/2010Dec/0055.html

Some background to my work is given in a blog post at
http://journal.dajobe.org/journal/posts/2010/10/24/writing-an-rdf-query-engine-twice/


I. General comments

I felt the specification introduced more optional features bundled
together, where it was not entirely clear what the combination of
those features would do.  For example a query with no aggregate
expression but has a GROUP BY and HAVING is allowed by the syntax
and the main document doesn't say if it's allowed or what it means.

I found it hard to assemble all the pieces from the mathematical
explanations into something I could code.

The spec has several terms in the grammar not in the query document.
After asking, these turned out to be federated query (BINDINGS), or
update (LOAD, ...) but these are not pointed out or linked to clearly
although there is mention of the documents in the status section.  Please
make these more clear.

I decided to concentrate on the new Aggregates feature since I had
already implemented SELECT expressions, leaving Subqueries and
Negation to later.  Property paths should be in the list of new
features in the status section at the of the document.

"SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs"
is rather a long title; what does 'Uniform' or 'HTTP' add?  SOAP is dead.
suggest "SPARQL 1.1 RDF Graph Management Protocol"
or RDF dataset

With all the additions especially property paths (a new query
language), update (data management language) and federated query
(remote query support) and I understand ~30 additional keywords are
being added beyond this draft for functions and operators, I see this
as a major change to SPARQL 1.0, more of a SPARQL 2.  You should
consider renaming it.


II. Aggregates

Found the math in the aggregation and grouping sections rather
hard to understand so I also looked what MySQL and SQLite did, and
wrote my own diagram based on the data flow
  http://www.dajobe.org/2009/11/sparql11/
so for me it was easier to see the individual components/stages
(which roughly correspond to SPARQL algebra terms).

I had to make several of my own tests with my guess on what the
answers should be.  With all the pieces for aggregate expressions:
grouping, aggregate expression, distinct, having, counting (count *
vs count(expr)) there needs to be several tests with good coverage.

I felt aggregate functions can be broken down into these parts
1. selecting of aggregate function value
2. grouping of results - optional; explicit, implicit when agg func present
3. execution of aggregate functions - optional; with some special cases
4. filtering of group results with having - optional
(following my diagram above)

As it is clear they are all optional, it probably is worth explaining
what it means when they are absent, such as group by + having with
no aggregate expression as mentioned above.


III. Bindings is a new syntax

BINDINGS essentially gives a new way to write down a variable
bindings result set.  Even though it is discussed in the federated
query spec about using it for SERVICE, it's not restricted to that by
the grammar or specifications.

BINDINGS in the query grammar:
  http://www.w3.org/TR/2010/WD-sparql11-query-20101014/#rBindingsClause
I previously asked about on 2010-10-15 at:
  http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2010Oct/0044.html
and this comment is an extension of that comment.

So as I read it this is a valid 'query' which does no real execution
but just returns a set.

SELECT * WHERE BINDINGS ?var1 ?var2 {
  ( "var1-value1"  "var2-value1" )
  ( "var1-value2"  "var2-value2" )
}

or if you really must you can leave out the WHERE:
SELECT * BINDINGS {
  ( "var1-value1"  "var2-value1" )
  ( "var1-value2"  "var2-value2" )
}

My question is to ask if this is correct and to clarify in the spec
the intended use, whether or not it is intended for use with SERVICE
only.


IV. Section-by-section comments

Section: Status of this Document
Should mention property paths as new since that is a major addition
after SPARQL 1.0

Please link to the documents in the status, these are just text.


Sections 1-8
Skipped, they are same as SPARQL 1.0 I hope


9 Property Paths
I am unlikely to ever implement any of this, it's a second query
language inside SPARQL.  How many systems implemented this before
the SPARQL 1.1 work was started?


10 Aggregates

I took all the examples in this section and turned them into test
cases where possible.

10.2
The explanation of errors and ListEvalE is rather opaque.  It is
still not clear to me what is done with errors in GROUP BY, HAVING
and arguments to aggregate expressions.  Some are skipped, some
are ignored and return NULL.  Examples and tests will enable checking
this but the spec needs to be clearer.

Definition: Group and Aggregation were hard for me to understand.
The input to Aggregation being a 'scalar' meaning actually a set of
key:value pairs was confusing.  It is not also not clear if those
are a set or an ordered set of parameters.  This is only used today
for the 'separator' with GROUP_CONCAT.


10.2.1 HAVING
What happens when there is an expression error?

What variables and expressions can be used here and what is their scope?


10.2.2 Set Functions

Another confusing section.  I mostly ignored this and did what SQL did.

None of the functions that I can tell, ever use 'err'.


10.2.3 Mapping from Abstract Syntax to Algebra

scalarvals argument is used here - I think this is called 'scalar' earlier.

Un-numbered Section after 10.2.3: Joining Aggregate Values

Never figured out what this was trying to define but my code executes
the example.


11. Subqueries
(Ignored in my current work)


12 RDF Dataset

(Same as SPARQL 1.0 I assume so no comments)


13 Basic Federated Query

Yes, please merge in the text here.


14 Solution Sequences and Modifiers

( Aside: This is one of those SPARQL parts where everything mentioned is
optional. Otherwise this section has no change from SPARQL 1.0, I am just
mentioning it as a pointer of a trend. )


15. Query Forms
No comments.


16. Testing Values

16.3 Operator Mapping

Is it worth noting the new operators in SPARQL 1.1?

Operators: implemented isNUMERIC()

16.4 Operators Definitions

My current state of implementation of new to SPARQL 1.1 expressions
16.4.16 IF - implemented
16.4.17 IN - implemented
16.4.18 NOT IN - implemented
16.4.19 IRI - implemented
16.4.20 URI - implemented
16.4.21 BNODE - implemented
16.4.22 STRDT - implemented
16.4.23 STRLANG - implemented

No comments on the above

16.4.24 NOT EXISTS and EXISTS
I am lumping these together with sub-SELECT to implement.
My concern here is that the syntax gets super-complex since all the graph
pattern syntax can now appear inside any expression syntax.

[[There is a filter operator "exists" that ...]]
Does this imply these can only appear in FILTER expressions?  Please clarify.


17 Definition of SPARQL

I looked at the 17.2.3 for aggregate queries and it was more helpful
than the math earlier.  The pseudo code in Step 4 is a bit too
unclear.  Is that an example implementation or the required one?

17.6 Extending SPARQL Basic Graph Matching

Ignored.


18 SPARQL Grammar

Clearly this is not complete; there are lots of notes to update it.


19 Conformance

If property paths are not removed, please add a conformance level
that includes SPARQL 1.1 without property paths.

Does SPARQL 1.1 Query require implementation of the dependent specs -
federated query and update?  Looks to me that protocol may also be
dependent?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (Darwin)

iD8DBQFM/CsdQ+ySUE9xlVoRAuwIAJ9M2eMHwpyQWR+/9PdCGGKXx3elmQCfWP/g
g5FTvQ9knkA04+7PkzvtIQI=
=HqTf
-----END PGP SIGNATURE-----

Received on Monday, 6 December 2010 00:16:00 UTC