Re: SPARQL Query review from Steve Harris on 2011-03-28 (public-rdf-dawg@w3.org from January to March 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Mon, 28 Mar 2011 12:31:11 +0100
To: "Polleres, Axel" <axel.polleres@deri.org>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <43851853-4E40-44EA-ACB2-B05AF805CB8E@garlik.com>
On 2011-03-17, at 01:29, Polleres, Axel wrote:
> * Section 11.1
> 
> - 
> "In aggregate queries and sub-queries variables that appear in the query
> pattern, but are not in the GROUP BY clause cannot be projected nor used in project
> expressions." 
> 
> I'd suggest to add a forward pointer to Section 11.4 here, which explains this aspect again in more detail, i.e.:
> 
> "In aggregate queries and sub-queries variables that appear in the query
> pattern, but are not in the GROUP BY clause cannot be projected nor used in project
> expressions; in order to project arbitrary expressions the SAMPLE
> aggregate may be used. For details, cf. <a href=...>Section 11.4</a> below."

Makes sense, done.

> - 
> "It should be noted that as per functions, aggregate expressions MUST be aliased in order to project them from queries or subqueries."
> 
> at this stage ofd the doc it's not clear what "aliased" means, so I suggest to say it explicitly:
> 
> "It should be noted that as per functions, aggregate expressions MUST be aliased (again, similar to the BIND clause, using the keyword 'AS') in order to project them from queries or subqueries."

Agreed.

> * Section 11.2
> 
> - In the example:  
> 
>  in Section 18.5 Definition: Evaluation of Aggregation, Aggregation is defined as  
>     Aggregation(exprlist, func, scalarvals, P) where I assume P   is a pattern. 
>  However, in the example here, you write:  Aggregation((?y), Avg, {}, G) where G is a Group.

That's already been fixed.

>  Also Avg() is a function over a multiset, so I don't really understand what (2) -> Avg({(3), (5)}, 0, {}), means (I suppose that's a leftover from when we wanted to count errors?)

Yes, thanks, fixed.

> - Next:
> 
> "Note that the empty set is used in some aggregates to pass scalar arguments to the set function."
> 
> I am afraid, at this point, the reader doesn't know nor understand what *scalar arguments* are, so I am afraid this sentence is not understandable here, unless we add an example which uses scalar arguments and explains them. I'd suggest to add that.

OK, done.

> * Section 11.3
> 
> An example for HAVING should be added. 

Agreed, done. Could someone who has a parsing implementation check the syntax please?

> * Section 11.4
> 
> An example using SAMPLE would help here in the end. Ideally, all the examples in this section should also have data and results.

I'm reluctant to add data and results everywhere, as the document is already huge. The testcases ought to cover this. Is there a precedent for pointing to specific queries in the testcase document? Is it stable enough to do that?

> * Section 11.5 
> 
> - It seems we use "scalar value" and "scalar argument" interchangeably, but not uniformly, the temriniology should be consistent (or if that are two different things it should be explained before it is used the first time)

I think this has already been fixed.

> - "Definitions may be found in the SPARQL Algebra section."
> suggest to number references to different sections, i.e. 
>  "Definitions may be found in  section 18.4 SPARQL Algebra, below."

I don't like referring to numbered sections, the numbering isn't completely stable.

> - "Note that, unless the ; separator is used this requires the parser to know whether some IRI refers to a function, cast, or aggregate before it can determine if there are any errors in a query where aggregates are used."
> 
> I am again unsure whether this sentence is - at this stage - without having seen an example with a scalar or ';' separator in it, makes any sense to the reader.

I've added an example.

> * Section 11.6 
> 
> this section looks a bit displaced here... I think this may make sense further down to explain the functioning of the algebra, but not here.
> In general, I think section 11 should introduce Aggregates and their features, by examples giving data, query and results, 
>  whereas their evaluation and examples refering to the algebra should go to section 18.

Without this, the AggregateJoin algebra serves no obvious purpose.

> * Section 11.7
> 
> "Note that the bindings for the <y> group is not included in the results as the evaluation of Avg({1, _:b2, 3, 4}), and (_:b2 + 4) / 2 is an error, removing the bindings from the solution."
> -->
> "Note that the bindings for the :y group is not included in the results as the evaluation of Avg({1, _:b2, 3, 4}), and (_:b2 + 4) / 2 is an error, removing the bindings from the solution."

Fixed.

> * Section 12
> 
> "Note that only variables projected out of the subquery will be visible, or in scope to the outer query." 
> 
> please link to definition of in-scope, i.e.
> 
> "Note that only variables projected out of the subquery will be visible, or <href="#variableScope">in-scope</a> to the outer query."

That's been done by the look of it.

- Steve

> * Example in Section 12
> 
> - "Subqueries require one additional algebra operator, ToMultiset, which takes lists and returns multisets."
> 
> Again, at this point we haven;t talked about Algebra operators, so I suggest to leave out those details at this point, or add clearer forward pointers. e.g.
> 
> "Subqueries return sequences, i.e. lists of solution, whereas patterns return multisets of solution bindings. In order to handle this, the    
>  SPARQL algebra, cf. Section 18.4 below, defines an algebra operators ToMultiset, which takes lists and returns multisets."
> 
> - Likewise, we may either drop the following sentence, or add more explanation: 
> 
> "In general, GroupGraphPatternSub is evaluated and then the resulting multiset is projected with the Project function, and
> handled as per the Converting Solution Modifiers section. The resulting sequence is converted back to a multiset with ToMultiset."
> 
> - I'd suggest to change the final Example in section 12 to something including ORDER BY and demonstrate, along with Data and a result that the order is dropped, i.e.
> 
> "As a consequence the ordering from any ORDER BY expressions is not propagated outside the subquery.
> 
> Example: 
> 
> Data:
>  @prefix : <http://example.com/data/#> .
>  :s :p :a . 
>  :s :p :b . 
> 
> Query:
> PREFIX : <http://example.com/data/#>
> SELECT ?z WHERE {
>  SELECT ?z WHERE {
>   ?x ?y ?z .
>  } ORDER BY ?z
> }
> 
> Result:
>  |?z|
>  ----
>  |:b|
>  |:a|
> 
> This is a valid result, since the order from the subquery may be dropped, as the subquery becomes  in the SPARQL algebra  ToMultiset(Project(OrderBy(ToList(BGP(?x ?y ?z)),?z), {?z})).
> "
> 
> * Section 14
> 
> Should this section mention something about the status of this feature, 
> i.e. mention that it is optional? (cf. closed ISSUE-1, i.e. by leaving the grammar in, we haven't really 
> specified it in two separate documents, but have the syntax fully in SPARQL1.1 Query, whereas we've only "outsourced" 
> the semantics of the feature.
> 
> * Section 16.2.2
> 
> "where app:customDate identified an extension function to turn the data format into an xsd:dateTime RDF term."
> -suggestion->
> "where app:customDate identified an extension function to turn an application-specific custom date format into an xsd:dateTime 
> RDF term."
> 
> * Section 18.4
> 
> "Aggregation, a function which calculates a scalar value as an output of the aggregate expression in the SELECT clause, and in the HAVING evaluation process."
> 
> this is not a full sentence, seems there's something missing here? Admittedly, I still don't really get how Aggregation() calls Group() ... etc.  i.e. how i get from lists of functions from keys to solution multisets to the final solution multisets.
> 
> * Section 18.5
> 
> In Definition: Evaluation of Aggregation
> "It produces a single value for each key and partition for that key (key, X)."
> 
> I am afraid I don't understand. What is (key, X) here? what is particularly X?
> 
> 
> 
> 
> 
> 
> * A.2 Other References
> 
> "[TURTLE] Turtle - Terse RDF Triple Language, Dave Beckett."
> 
>  as we refer to the Team submission, we should add Tim Berners-Lee as author.
> 
> ----------------------------------------------------------------
> 
> 
> 
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Monday, 28 March 2011 11:31:47 UTC