SPARQL Query review from Polleres, Axel on 2011-03-17 (public-rdf-dawg@w3.org from January to March 2011)

From: Polleres, Axel <axel.polleres@deri.org>
Date: Thu, 17 Mar 2011 01:29:24 +0000
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <5CC16855-936A-4D8A-A1AB-BE2868D62AB8@deri.org>
Sorry for my silence over the last two weeks, was traveling and subsequently stuck in backlog. 

At least, here's (part 1 of) my review for SPARQL query
 http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml
most of which based on a version from March 8th (which I read offline, so some remarks may 
already be outdated, I also didn't manage to check against the other query reviews yet, apologies).


I am still missing (for part 2 of this review):
 - Section 17
 - in Section 18, i have only partially reviewed 18.4 and 18.5 so far.
Anyways, in case I don't manage to send more comments, please consider this review done for LC.

Axel

----------------------------------------------------------------

* Abstract

 "The results of SPARQL queries can be results sets or RDF graphs."

  s/results sets/result sets/

* Status of this Document

 "The new features are:"

 I'd add here to make it clear

 "The new features in SPARQL 1.1 Query are:"

* Section 1.1

 "Section 17 defines SPARQL's extensible value testing framework"

suggestion:

 "Section 17 defines SPARQL's extensible function library for creating values and value testing"


* Section 2.3.1

 not sure why, but the table for the second solution doesn't seem to have a right border in my browser.

* Section 2.5

 shall "expressions in the SELECT clause" link to Section 16.1.2 ?

* Section 4

 "The full grammar  is given in appendix A."

should be

 "The full grammar  is given in section 19."


* Section 9.1

 In the table, may I suggest not to use italic font for '|' in order to avoid possible confusion with "/"

* Section 9.3


 "whereas if the query were written out to include the intermediate variable (?a), 
	no rows in the results are duplicates:" 

 is not a sentence, suggest just to write:

 "If the query were written out to include the intermediate variable (?a), no rows in the results are duplicates:" 

* Section 11.1

- 
 "In aggregate queries and sub-queries variables that appear in the query
pattern, but are not in the GROUP BY clause cannot be projected nor used in project
expressions." 

 I'd suggest to add a forward pointer to Section 11.4 here, which explains this aspect again in more detail, i.e.:

 "In aggregate queries and sub-queries variables that appear in the query
pattern, but are not in the GROUP BY clause cannot be projected nor used in project
expressions; in order to project arbitrary expressions the SAMPLE
aggregate may be used. For details, cf. <a href=...>Section 11.4</a> below."

- 
 "It should be noted that as per functions, aggregate expressions MUST be aliased in order to project them from queries or subqueries."

at this stage ofd the doc it's not clear what "aliased" means, so I suggest to say it explicitly:

 "It should be noted that as per functions, aggregate expressions MUST be aliased (again, similar to the BIND clause, using the keyword 'AS') in order to project them from queries or subqueries."

* Section 11.2

 - In the example:  
 
  in Section 18.5 Definition: Evaluation of Aggregation, Aggregation is defined as  
     Aggregation(exprlist, func, scalarvals, P) where I assume P   is a pattern. 
  However, in the example here, you write:  Aggregation((?y), Avg, {}, G) where G is a Group.
 
  Also Avg() is a function over a multiset, so I don't really understand what (2) -> Avg({(3), (5)}, 0, {}), means (I suppose that's a leftover from when we wanted to count errors?)


 - Next:

 "Note that the empty set is used in some aggregates to pass scalar arguments to the set function."

 I am afraid, at this point, the reader doesn't know nor understand what *scalar arguments* are, so I am afraid this sentence is not understandable here, unless we add an example which uses scalar arguments and explains them. I'd suggest to add that.

* Section 11.3

 An example for HAVING should be added. 


* Section 11.4

 An example using SAMPLE would help here in the end. Ideally, all the examples in this section should also have data and results.

* Section 11.5 

- It seems we use "scalar value" and "scalar argument" interchangeably, but not uniformly, the temriniology should be consistent (or if that are two different things it should be explained before it is used the first time)

- "Definitions may be found in the SPARQL Algebra section."
 suggest to number references to different sections, i.e. 
  "Definitions may be found in  section 18.4 SPARQL Algebra, below."

- "Note that, unless the ; separator is used this requires the parser to know whether some IRI refers to a function, cast, or aggregate before it can determine if there are any errors in a query where aggregates are used."

I am again unsure whether this sentence is - at this stage - without having seen an example with a scalar or ';' separator in it, makes any sense to the reader.

* Section 11.6 

 this section looks a bit displaced here... I think this may make sense further down to explain the functioning of the algebra, but not here.
 In general, I think section 11 should introduce Aggregates and their features, by examples giving data, query and results, 
  whereas their evaluation and examples refering to the algebra should go to section 18.

* Section 11.7

"Note that the bindings for the <y> group is not included in the results as the evaluation of Avg({1, _:b2, 3, 4}), and (_:b2 + 4) / 2 is an error, removing the bindings from the solution."
 -->
"Note that the bindings for the :y group is not included in the results as the evaluation of Avg({1, _:b2, 3, 4}), and (_:b2 + 4) / 2 is an error, removing the bindings from the solution."

* Section 12

 "Note that only variables projected out of the subquery will be visible, or in scope to the outer query." 

please link to definition of in-scope, i.e.
 
 "Note that only variables projected out of the subquery will be visible, or <href="#variableScope">in-scope</a> to the outer query." 

* Example in Section 12

- "Subqueries require one additional algebra operator, ToMultiset, which takes lists and returns multisets."

Again, at this point we haven;t talked about Algebra operators, so I suggest to leave out those details at this point, or add clearer forward pointers. e.g.

 "Subqueries return sequences, i.e. lists of solution, whereas patterns return multisets of solution bindings. In order to handle this, the    
  SPARQL algebra, cf. Section 18.4 below, defines an algebra operators ToMultiset, which takes lists and returns multisets."

- Likewise, we may either drop the following sentence, or add more explanation: 

"In general, GroupGraphPatternSub is evaluated and then the resulting multiset is projected with the Project function, and
handled as per the Converting Solution Modifiers section. The resulting sequence is converted back to a multiset with ToMultiset."

- I'd suggest to change the final Example in section 12 to something including ORDER BY and demonstrate, along with Data and a result that the order is dropped, i.e.

"As a consequence the ordering from any ORDER BY expressions is not propagated outside the subquery.

Example: 

Data:
  @prefix : <http://example.com/data/#> .
  :s :p :a . 
  :s :p :b . 

Query:
 PREFIX : <http://example.com/data/#>
 SELECT ?z WHERE {
  SELECT ?z WHERE {
   ?x ?y ?z .
  } ORDER BY ?z
 }

Result:
  |?z|
  ----
  |:b|
  |:a|

This is a valid result, since the order from the subquery may be dropped, as the subquery becomes  in the SPARQL algebra  ToMultiset(Project(OrderBy(ToList(BGP(?x ?y ?z)),?z), {?z})).
"

* Section 14

 Should this section mention something about the status of this feature, 
 i.e. mention that it is optional? (cf. closed ISSUE-1, i.e. by leaving the grammar in, we haven't really 
 specified it in two separate documents, but have the syntax fully in SPARQL1.1 Query, whereas we've only "outsourced" 
 the semantics of the feature.

* Section 16.2.2

"where app:customDate identified an extension function to turn the data format into an xsd:dateTime RDF term."
-suggestion->
"where app:customDate identified an extension function to turn an application-specific custom date format into an xsd:dateTime 
RDF term."

* Section 18.4

 "Aggregation, a function which calculates a scalar value as an output of the aggregate expression in the SELECT clause, and in the HAVING evaluation process."

 this is not a full sentence, seems there's something missing here? Admittedly, I still don't really get how Aggregation() calls Group() ... etc.  i.e. how i get from lists of functions from keys to solution multisets to the final solution multisets.

* Section 18.5

 In Definition: Evaluation of Aggregation
 "It produces a single value for each key and partition for that key (key, X)."
 
 I am afraid I don't understand. What is (key, X) here? what is particularly X?






* A.2 Other References

 "[TURTLE] Turtle - Terse RDF Triple Language, Dave Beckett."

  as we refer to the Team submission, we should add Tim Berners-Lee as author.

----------------------------------------------------------------
Received on Thursday, 17 March 2011 01:30:00 UTC