- From: Arjohn Kampman <arjohn.kampman@aduna.biz>
- Date: Fri, 18 Mar 2005 17:13:34 +0100
- To: public-rdf-dawg-comments@w3.org
- Cc: Jeen Broekstra <jeen@aduna.biz>
Dear SPARQL-editors and -enthusiasts,
The following are comments on the editors' working draft, revision 1.256
(2005/03/17), for "SPARQL Query Language for RDF". The feedback is
inspired by our experience with the development and use of a number of
query languages in Sesame[1], most notably SeRQL[2]. Apologies for
coming up with such a long list of comments this late in the process,
but we honestly haven't been able to find the time for a thorough review
of the document until now. A number of editorial comments can be found
at the end of this e-mail.
Arjohn Kampman
Jeen Broekstra
[1] http://www.openrdf.org/
[2] http://www.openrdf.org/doc/SeRQLmanual.html
General comments (in no specific order)
---------------------------------------
- We are not very fond of the SELECT-WHERE-FILTER construction. Considering
that the FROM keyword is no longer used for specifying datasets, how
about adopting the SQL-style SELECT-FROM-WHERE construction instead?
It could prevent confusion among people coming from the database world
who expect the WHERE-clause to contain boolean constraints.
- The document suggests that (parts of) queries can only be evaluated on
a specific graph: either the background graph or a named graph. We
would have expected that, when no specific graph label is specified,
the query would be evaluated on the union of all graphs. The grammar
mentions a "GRAPH * ..." construction, which might be related to this
but which is not explained in the document.
- Named graphs are identified by URIs; bnodes or literals cannot be used
for this purpose. This forces application developers to generate URIs
when a simple string would be sufficient. Supporting literals as graph
names would allow developers to use simple strings or datatyped dates
to tag specific sets of statements. Would this be useful?
- The definition of DESCRIBE is very loose: maybe too loose to be useful
in practice? An application developer would likely want a guarantee as
to whether the mechanism yields the info that is needed. As it is now,
the mechanism could very well result in the development of several
"DESCRIBE-dialects", which offer this guarantee for specific use
cases. We think a fixed definition like "it returns the bnode closure
for the URIs concerned" would be more useful.
- SeRQL offers default bindings for the often used prefixes 'rdf',
'rdfs' and 'xsd'. If not specified in the query itself, these prefixes
map to the standard RDF, RDF Schema and XML Schema namespaces. This
has proved to be very convenient. Is this a feature that should be
added to SPARQL too? We noted that the comment for version 1.244 of
the document mentions: "Removed text for default prefixes for rdf:
rdfs: owl: xsd:", but we were unable to find a reason for this in the
mailing list archives.
- The current specification allows only variables to be specified in the
SELECT-clause. However, on some occasions it can be very convenient
to be able to specify constants or functions in the projection. For
example:
* When an application fires two queries, one of them specifying a
default value (a constant) for tuples where that specific column
does not get a value from the graph. This becomes even more useful
when the UNION operator operates on queries instead of on graph
patterns (see later comments also).
* When an application is interested in the sum, product, etc. of two
or more fields, e.g. when converting from one currency to another.
- Concerning the remark in section 3.2:
"Open: whether to allow "foo"@?v or ?v@fr or ?v^^xsd:integer or
"foo"^^?v".
If functions like STR(A) and LANG(A) were also allowed in the
projection (see previous comment), they would offer a good alternative
to the above constructions.
- The current specification describes a UNION operator that can be
applied to graph patterns, instead of to queries as is done in SQL.
This affects the expressivity of the query language when constants
and/or functions would be allowed in the projection. The following
example query, an alternative to the queries described in section 6.1,
illustrates this by using a constant in the projection:
PREFIX ...
SELECT ?title "1.0"
WHERE { ?book dc10:title ?title }
UNION
SELECT ?title "1.1"
WHERE { ?book dc11:title ?title }
The expected result of this query being:
title | version
----------------------------------|--------
"SPARQL Protocol Tutorial" | "1.1"
"SPARQL Query Language Tutorial" | "1.0"
- There is a strong demand from the Sesame community to add ORDER BY and
GROUP BY/COUNT functionality to SeRQL. It's good to see that the
former has already been added to the editor's draft. However, we feel
that the latter is just as important. Having to transmit complete
query results only to be able to count specific rows adds a lot of
unnecessary network traffic and can really hurt performance.
- Section 2.1 mentions:
"Prefixes apply to the query after they are defined; redefining a
prefix causes the new defintion to be used from that point in the
syntax."
The fact that prefixes apply to the query after they are defined is
trivial as prefixes must be defined at the start of a query (according
to the grammar). Allowing prefixes to be redefined doesn't seem to
make much sense in the context of SPARQL (in contrast to Turtle).
Rather, it is more than likely that duplicate prefix declarations are
caused by sloppiness on the part of the query writer (e.g.
copy-paste errors). This type of error is often very hard to detect;
therefore it would be wise to disallow redefinition of prefixes and flag
any occurrence of it as an error.
- We have strong doubts about allowing blank nodes to be used as a kind
of anonymous variables. People that are new to the query language will
probably assume that specific bnodes can be specified in queries,
causing confusion when they find out that it doesn't work like that.
Also, the extra notation for variables doesn't appear to add any
expressive power to SPARQL and seems to be a purely syntactic thing.
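To make the comment on Section 2.1 above more concrete, the following is a minimal sketch of a check that flags redefined prefixes before a query is evaluated. It assumes declarations of the form `PREFIX name: <uri>` as used in the draft's examples; the regular expression, the function name `find_redefined_prefixes` and the sample query are illustrative assumptions, not part of the draft's grammar or of any existing implementation.

```python
import re

# Assumption: declarations look like `PREFIX name: <uri>`. This pattern is a
# simplification for illustration and does not implement the draft's grammar
# (for example, it would also match the text "PREFIX x: <y>" inside a literal).
PREFIX_DECL = re.compile(r"\bPREFIX\s+([A-Za-z][\w-]*)?:\s*<([^>]*)>", re.IGNORECASE)


def find_redefined_prefixes(query):
    """Return the prefix names declared more than once, in order of first repeat."""
    seen = set()
    redefined = []
    for match in PREFIX_DECL.finditer(query):
        name = match.group(1) or ""  # the empty string stands for the default prefix
        if name in seen and name not in redefined:
            redefined.append(name)
        seen.add(name)
    return redefined


# Hypothetical query with a copy-paste error: 'dc10' is declared twice.
query = """
PREFIX dc10: <http://purl.org/dc/elements/1.0/>
PREFIX dc11: <http://purl.org/dc/elements/1.1/>
PREFIX dc10: <http://purl.org/dc/elements/1.1/>
SELECT ?title WHERE { ?book dc10:title ?title }
"""

print(find_redefined_prefixes(query))  # prints ['dc10']
```

A query processor following the suggestion would reject the query whenever this list is non-empty, instead of silently using the later definition.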
Editorial comments
------------------
Section 2.1:
* typo: "...causes the new defintion to be..."
* The query in "Data descriptions used in this document" is said to be
equivalent to the previous query, which is not true: this query
has a variable as subject, whereas the previous query has a URI.
Section 2.4:
* typo: "...where each of the tripe patterns matches..."
Section 3.1:
* All but the first query use ?v in the SELECT-clause without binding it
in the WHERE-clause.
Section 3.2:
* The query is said to be using a blank node as a variable, which is not
true.
* typo: "A patten may be...". Also, the sentence in question appears to
be formulated incorrectly.
* "Note that a constraint can be considered to be a triple with a
special predicate." -- Superfluous remark? Why is this mentioned when
constraints cannot be written down as such?
Section 4:
* The definition of Graph Pattern includes Graph Pattern itself. Is this
correct?
* typo: "A Basic Graph Patterns..."
* typo: "...is, as described above, is..."
* The second query uses the ';' character at the end of a triple pattern
but continues with another full triple pattern.
Section 5:
* typo: "...to be added to solution where..."
Section 5.5:
* The query is missing the ?mbox variable in the SELECT-clause.
Section 6:
* typo: "...provides a means combining..."
* The queries in the subsections map the 'dc10' prefix to the DC 1.1
namespace and the 'dc11' prefix to the DC 1.0 namespace. This is not
logical and even makes the second query incorrect (when compared to
the described result).
Section 7:
* typo: "...hold a multiple RDF graphs..."
* typo: "G is a called the..."
* typo: "...does not need to described..."
Section 8.1:
* The 'data' prefix is defined but not used in the query.
Section 8.3:
* typo: "...whether in about GRAPH clause..."
* typo: "...in one part of a querym..."
* typo: "...as foudn in..."
* typo: "...to a particualr..."
Section 8.4:
* typo: "...a aggregator has found read in a..."
* The 'data' prefix is defined but not used in the query.
Section 10.2:
* This section covers serialization issues, specifically elaborating on
the fact that results can be serialized into XML or an RDF graph. We
feel that this part is a bit off-topic and that it would be better to
replace it with a simple reference to the SPARQL protocol WD. After
all, the work on the protocol isn't finished yet and it _might_ come
up with another solution.
* "If both DISTINCT and LIMIT are specified, then duplicates are
eliminated before the limit is applied." -- OFFSET should also
be mentioned in this context.
Section 10.3:
* The first paragraph still mentions the "CONSTRUCT * ..." option.
Section 11.1.1:
* typo: "...considers the the following..."
* typo: "...any r:Literal may be is cast to..."
Section 11.2.0.1:
* typo: "...takes a boolean arguement..."
* Table 11.1 documents the result type of the LANG(A) operator to be
rdf:uri. This should probably be xs:string?
Section A:
* We have a number of remarks concerning the grammar, which is ambiguous
or at least needs unnecessarily large look-aheads in a number of rules.
However, we're not sure if the grammar is considered to be final
enough for this kind of comment. Please let us know if you're
interested.
Section B:
* Given the large number of similarities between SPARQL and SeRQL, it's
hard to imagine that SeRQL was not used as a reference language. If it
was used, we would really appreciate it if a reference to SeRQL was added
to this section.
Received on Friday, 18 March 2005 16:13:38 UTC