- From: Arjohn Kampman <arjohn.kampman@aduna.biz>
- Date: Fri, 18 Mar 2005 17:13:34 +0100
- To: public-rdf-dawg-comments@w3.org
- Cc: Jeen Broekstra <jeen@aduna.biz>
Dear SPARQL-editors and -enthusiasts,

The following are comments on the editor's working draft, revision 1.256 (2005/03/17), of "SPARQL Query Language for RDF". The feedback is inspired by our experience with the development and use of a number of query languages in Sesame[1], most notably SeRQL[2].

Apologies for coming up with such a long list of comments this late in the process, but we honestly haven't been able to find the time for a thorough review of the document until now.

A number of editorial comments can be found at the end of this e-mail.

Arjohn Kampman
Jeen Broekstra

[1] http://www.openrdf.org/
[2] http://www.openrdf.org/doc/SeRQLmanual.html


General comments (in no specific order)
---------------------------------------

- We are not very fond of the SELECT-WHERE-FILTER construction. Considering that the FROM keyword is no longer used for specifying datasets, how about adopting the SQL-style SELECT-FROM-WHERE construction instead? It could prevent confusion among people coming from the database world, who expect the WHERE-clause to contain boolean constraints.

- The document suggests that (parts of) queries can only be evaluated on a specific graph: either the background graph or a named graph. We would have expected that, when no specific graph label is specified, the query would be evaluated on the union of all graphs. The grammar mentions a "GRAPH * ..." construction, which might be related to this but which is not explained in the document.

- Named graphs are identified by URIs; bnodes or literals cannot be used for this purpose. This forces application developers to generate URIs when a simple string would be sufficient. Supporting literals as graph names would allow developers to use simple strings or datatyped dates to tag specific sets of statements. Would this be useful?

- The definition of DESCRIBE is very loose: maybe too loose to be useful in practice?
  An application developer would likely want a guarantee as to whether the mechanism yields the information that is needed. As it is now, the mechanism could very well result in the development of several "DESCRIBE-dialects", each offering this guarantee for specific use cases. We think a fixed definition like "it returns the bnode closure for the URIs concerned" would be more useful.

- SeRQL offers default bindings for the often used prefixes 'rdf', 'rdfs' and 'xsd'. If not specified in the query itself, these prefixes map to the standard RDF, RDF Schema and XML Schema namespaces. This has proved to be very convenient. Is this a feature that should be added to SPARQL too? We noted that the comment for version 1.244 of the document mentions: "Removed text for default prefixes for rdf: rdfs: owl: xsd:", but we were unable to find a reason for this in the mailing list archives.

- The current specification allows only variables to be specified in the SELECT-clause. However, on some occasions it can be very convenient to be able to specify constants or functions in the projection. For example:

    * When an application fires two queries, one of them specifying a default value (a constant) for tuples where that specific column does not get a value from the graph. This becomes even more useful when the UNION operator operates on queries instead of on graph patterns (see later comments also).

    * When an application is interested in the sum, product, etc. of two or more fields, e.g. when converting from one currency to another.

- Concerning the remark in section 3.2: "Open: whether to allow "foo"@?v or ?v@fr or ?v^^xsd:integer or "foo"^^?v". If functions like STR(A) and LANG(A) were also allowed in the projection (see previous comment), this would give a good alternative to the above constructions.

- The current specification describes a UNION operator that is applied to graph patterns rather than to queries, as is done in SQL.
  This affects the expressivity of the query language if constants and/or functions were allowed in the projection. The following example query, an alternative to the queries described in section 6.1, illustrates this by using a constant in the projection:

    PREFIX ...
    SELECT ?title "1.0"
    WHERE { ?book dc10:title ?title }
    UNION
    SELECT ?title "1.1"
    WHERE { ?book dc11:title ?title }

  The expected result of this query being:

    title                             | version
    ----------------------------------|--------
    "SPARQL Protocol Tutorial"        | "1.1"
    "SPARQL Query Language Tutorial"  | "1.0"

- There is a strong demand from the Sesame community to add ORDER BY and GROUP BY/COUNT functionality to SeRQL. It's good to see that the former has already been added to the editor's draft. However, we feel that the latter is just as important. Having to transmit complete query results only to be able to count specific rows adds a lot of unnecessary network traffic and can really hurt performance.

- Section 2.1 mentions: "Prefixes apply to the query after they are defined; redefining a prefix causes the new defintion to be used from that point in the syntax." The fact that prefixes apply to the query after they are defined is trivial, as prefixes must be defined at the start of a query (according to the grammar). Allowing prefixes to be redefined doesn't seem to make much sense in the context of SPARQL (in contrast to Turtle). Rather, it is more than likely that duplicate prefix declarations are caused by sloppiness on the part of the query writer (e.g. copy-paste errors). This type of error is often very hard to detect; therefore it would be wise to disallow redefinition of prefixes and to flag any occurrence of it as an error.

- We have strong doubts about allowing blank nodes to be used as a kind of anonymous variable. People who are new to the query language will probably assume that specific bnodes can be specified in queries, causing confusion when they find out that it doesn't work like that.
  Also, the extra notation for variables doesn't appear to add any expressive power to SPARQL and seems to be a purely syntactic thing.


Editorial comments
------------------

Section 2.1:
  * typo: "...causes the new defintion to be..."
  * The query in "Data descriptions used in this document" is said to be equivalent to the previous query, which is not true: this query has a variable as subject, whereas the previous query has a URI.

Section 2.4:
  * typo: "...where each of the tripe patterns matches..."

Section 3.1:
  * All but the first query use ?v in the SELECT-clause without binding it in the WHERE-clause.

Section 3.2:
  * The query is said to be using a blank node as a variable, which is not true.
  * typo: "A patten may be...". Also, the sentence in question appears to be formulated incorrectly.
  * "Note that a constraint can be considered to be a triple with a special predicate." -- Superfluous remark? Why is this mentioned when constraints cannot be written down as such?

Section 4:
  * The definition of Graph Pattern includes Graph Pattern itself. Is this correct?
  * typo: "A Basic Graph Patterns..."
  * typo: "...is, as described above, is..."
  * The second query uses the ';' character at the end of a triple pattern but continues with another full triple pattern.

Section 5:
  * typo: "...to be added to solution where..."

Section 5.5:
  * The query is missing the ?mbox variable in the SELECT-clause.

Section 6:
  * typo: "...provides a means combining..."
  * The queries in the subsections map the 'dc10' prefix to the DC 1.1 namespace and the 'dc11' prefix to the DC 1.0 namespace. This is not logical and even makes the second query incorrect (when compared to the described result).

Section 7:
  * typo: "...hold a multiple RDF graphs..."
  * typo: "G is a called the..."
  * typo: "...does not need to described..."

Section 8.1:
  * The 'data' prefix is defined but not used in the query.

Section 8.3:
  * typo: "...whether in about GRAPH clause..."
  * typo: "...in one part of a querym..."
  * typo: "...as foudn in..."
  * typo: "...to a particualr..."

Section 8.4:
  * typo: "...a aggregator has found read in a..."
  * The 'data' prefix is defined but not used in the query.

Section 10.2:
  * This section covers serialization issues, specifically elaborating on the fact that results can be serialized into XML or an RDF graph. We feel that this part is a bit off-topic and that it would be better to replace it with a simple reference to the SPARQL protocol WD. After all, the work on the protocol isn't finished yet and it _might_ come up with another solution.
  * "If both DISTINCT and LIMIT are specified, then duplicates are eliminated before the limit is applied." -- OFFSET should also be mentioned in this context.

Section 10.3:
  * The first paragraph still mentions the "CONSTRUCT * ..." option.

Section 11.1.1:
  * typo: "...considers the the following..."
  * typo: "...any r:Literal may be is cast to..."

Section 11.2.0.1:
  * typo: "...takes a boolean arguement..."
  * Table 11.1 documents the result type of the LANG(A) operator to be rdf:uri. This should probably be xs:string?

Section A:
  * We have a number of remarks concerning the grammar, which is ambiguous or at least needs unnecessarily large look-aheads in a number of rules. However, we're not sure if the grammar is considered to be final enough for this kind of comment. Please let us know if you're interested.

Section B:
  * Given the large number of similarities between SPARQL and SeRQL, it's hard to imagine that SeRQL was not used as a reference language. If it was used, we would really appreciate it if a reference to SeRQL were added to this section.
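
Illustration for the general comment on section 2.1 (prefix redefinition): the check we have in mind could look roughly like the sketch below. It is only a sketch under stated assumptions. The names collect_prefixes and DuplicatePrefixError are hypothetical and belong to no SPARQL or Sesame API, and the regular expression is a simplification of the draft grammar that would also match the word PREFIX inside a string literal.

```python
import re

# Simplified pattern for a prefix declaration such as
#   PREFIX dc10: <http://purl.org/dc/elements/1.0/>
# This is an assumption for illustration, not the draft grammar itself.
PREFIX_DECL = re.compile(r"\bPREFIX\s+([A-Za-z][\w-]*)?:\s*<([^>]*)>",
                         re.IGNORECASE)


class DuplicatePrefixError(ValueError):
    """Raised when a query declares the same prefix more than once."""


def collect_prefixes(query):
    """Return a dict of prefix -> namespace; reject any redefinition."""
    prefixes = {}
    for match in PREFIX_DECL.finditer(query):
        name = match.group(1) or ""  # empty string for the default prefix
        namespace = match.group(2)
        if name in prefixes:
            # Treat redefinition as an error instead of silently
            # letting the later declaration win.
            raise DuplicatePrefixError(
                "prefix '%s' declared more than once" % name)
        prefixes[name] = namespace
    return prefixes
```

Rejecting the query at this point surfaces a copy-paste mistake immediately, rather than letting a later declaration quietly change the meaning of the names that follow it.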
Received on Friday, 18 March 2005 16:13:38 UTC