Re: Comment on SPARQL CR from Seaborne, Andy on 2006-05-15 (public-rdf-dawg-comments@w3.org from May 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 15 May 2006 17:22:26 +0100
To: Olivier Corby <Olivier.Corby@sophia.inria.fr>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <4468AAC2.6040404@hp.com>
[All: Apologies for delays - other commitments and conferences mean I'm not
back to full editing until the end of May - Andy]

Olivier Corby wrote:
> I have some comments on the SPARQL CR.
> 
> Best regards,
> 
> Olivier

Thank you very much for your careful review -- changes have been made in the
editors' working draft as described below.

	Andy

> 
> In the table of content, this link fails :
> 4.3 Evaluation Order

Fixed.

> In the Introduction, RDFS is not mentionned at all, one may wonder why.

The charter talks in terms of RDF query.

> 2.1.8
> The term "binding" is used as a descriptive term to refer to a pair of 
> (variable, RDF term).

The term "binding" is used elsewhere in the document so it is useful to 
describe it here.

> I would say (removing the of) :    refer to a pair  (variable, RDF term).

It is a pair - but with the emphasis on associating the value with the
variable in one solution.

>   2.5
>   An E-entailment regime is a binary relation between subsets of RDF graphs.
> 
>   Why not just sets of RDF graphs ?

An E-entailment regime does not necessarily give a relationship between any two
graphs - it only relates some subset of all graphs to another subset of
all graphs.

>   2.5.4
>   with the scope of the blank node label being the basic graph pattern.
> 
>   This sentence is misleading because in general the scope of a blank 
> node label is the whole query.

The scope of a blank node label is the basic graph pattern.  This was
clarified quite late, so if you find text elsewhere to the contrary, then that
text is wrong.

>   2.5.7
>   Blank nodes in the results of a query are identical to those occurring 
> in the dataset graphs, but this information cannot be used by an 
> application or client which receives these results, since all blank 
> nodes in subsequent queries are treated as being local to that query.
> 
>   I would say (but I am not sure) :
> 
>   since all blank nodes in subsequent query results are treated as being 
> local to each query result.

It's the use of blank nodes in queries and their relationship to entailment
that really stops the blank node being used as a constant of the query (aside
from issues about local scope of labels in results). The current text is
trying to reflect that.

> 
>   2.8.4 RDF Collections
> 
>   The list pattern generates a closed list which systematically ends 
> with rdf:nil
> 
>   It could be possible to consider open list and closed list patterns.
> 
>   A closed list pattern could end with . (new syntax) and generate a 
> list that ends with rdf:nil, as in the spec :
>   (1 ?x 3 4 .)
> 
> an open list pattern (1 ?x 3 4) does not end with rdf:nil.

An RDF list does end in an rdf:nil element by definition of RDF collections.

The SPARQL syntax is taken from N3 / Turtle so variations might be considered
confusing.  I can see value in a syntax that only describes part of a list.
Just describing the list head is only part of any requirements in a design
(e.g. why not membership of a list, or the tail, or a subsection?).

The WG postponed the issue of list access:
http://www.w3.org/2001/sw/DataAccess/issues#accessingCollections
for a later working group.

Some systems are exploring SPARQL-compatible ways of doing it:

e.g. A triple pattern like:   ?list list:member ?member
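
As a rough illustration of why a list pattern always ends in rdf:nil, and of
what a membership-style property could compute, here is a minimal Python
sketch.  The helper names (collection_triples, members) and the blank node
labels are hypothetical, and this is not how any particular system implements
list access.

```python
# Sketch only: terms are plain strings, not a real RDF library.
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def collection_triples(items):
    """Expand (item1 item2 ...) into rdf:first/rdf:rest triples.

    Returns (head, triples); the last rdf:rest always points at rdf:nil.
    """
    if not items:
        return RDF_NIL, []
    nodes = [f"_:l{i}" for i in range(len(items))]  # hypothetical labels
    triples = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.append((nodes[i], RDF_FIRST, item))
        triples.append((nodes[i], RDF_REST, rest))
    return nodes[0], triples

def members(head, triples):
    """Walk the list from its head to rdf:nil, collecting each rdf:first."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    found = []
    node = head
    while node != RDF_NIL:
        found.append(first[node])
        node = rest[node]
    return found

head, triples = collection_triples(["1", "?x", "3", "4"])
print(triples[-1])             # the closing rdf:rest triple
print(members(head, triples))  # what a list:member-style lookup would range over
```

An "open" list in the sense suggested above would simply omit the final
rdf:rest triple, which is why it would no longer be an RDF collection.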


> 
> 5 Optionnal pattern
> 
> Suppose the following case :
> 
> Data:
> 
> _:a foaf:name "Jules"
> _:a foaf:friend _:b
> _:b foaf:mbox "jules@nowhere.com"
> 
> _:c foaf:mbox "jim@somewhere.com"
> 
> 
> 
> Query:
> 
> select ?x ?m where {
> ?x foaf:name ?n .
> { optional { ?x foaf:friend ?y } } .
> { optional { ?y foaf:mbox ?m } }
> }
> 
> 
> 
> It is not completely clear to me from the spec whether this is a solution :
> 
>   ?x          ?m
> _:a    "jim@somewhere.com"
> 
> 
> On one hand, the first optional binds ?y to _:b and hence the second 
> optional has only one solution. But on another hand, the first optional 
> is optional, hence the second optional can have its chance as the spec 
> says that "There is no implied order of graph patterns within a Group 
> Graph Pattern."

This query is syntactically a group of one element: OPTIONAL is defined as a
binary operator, so the query above has the second OPTIONAL acting on the first
optional.

  ( P1 OPT P2 ) OPT P3

In earlier working drafts it was different, but this way any query has a
consistent meaning, including cases that had to be excluded when OPTIONAL was
not binary.  The WG decided on the current definition of OPTIONAL and removed
the restrictions on variable use (here the ?y) because it always has a
consistent meaning.

In this case, the first optional is evaluated to get ?y = _:b, which becomes
the left hand side (the fixed part) of the second optional, so ?x = _:a, ?m =
"jim@somewhere.com" is not a solution.
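
The evaluation just described can be sketched in a few lines of Python.  This
is only an illustration of the left-associative reading, treating OPTIONAL as
a left join over the example data; the function names (match, compatible,
optional) are made up for the sketch and are not taken from the specification.

```python
# Sketch only: a solution is a dict from variable name to RDF term (strings).
DATA = [
    ("_:a", "foaf:name", "Jules"),
    ("_:a", "foaf:friend", "_:b"),
    ("_:b", "foaf:mbox", "jules@nowhere.com"),
    ("_:c", "foaf:mbox", "jim@somewhere.com"),
]

def match(pattern, data):
    """All solutions for one triple pattern; variables start with '?'."""
    solutions = []
    for triple in data:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.get(p, t) != t:  # same variable, different term
                    break
                binding[p] = t
            elif p != t:                    # constant does not match
                break
        else:
            solutions.append(binding)
    return solutions

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[v] == a[v] for v in a if v in b)

def optional(left, right):
    """Left join: extend each left solution if possible, else keep it as is."""
    out = []
    for sol in left:
        extended = [{**sol, **r} for r in right if compatible(sol, r)]
        out.extend(extended or [sol])
    return out

p1 = match(("?x", "foaf:name", "?n"), DATA)
p2 = match(("?x", "foaf:friend", "?y"), DATA)
p3 = match(("?y", "foaf:mbox", "?m"), DATA)

# ( P1 OPT P2 ) OPT P3, evaluated left to right
result = optional(optional(p1, p2), p3)
projected = [(s.get("?x"), s.get("?m")) for s in result]
print(projected)
```

Because ?y is already bound to _:b when the second left join runs, the
solution with ?y = _:c is incompatible and is never merged in.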


> 6.2
> 
> Query results involving a pattern containing GP1 and GP2 will include 
> separate solutions for each match where GP1 and GP2 give rise to 
> different sets of bindings.
> 
> If sets of bindings are not different, what happens ?

There may be duplicates - but it depends on the implementation.

Solutions can be forced to be unique by using the DISTINCT keyword in the
SELECT clause (it makes no difference in CONSTRUCT etc).


A SPARQL pattern yields something that is to be interpreted as a set of
solutions.  Because of implementation feedback, it was decided to allow
either duplicates or suppressed duplicates, without requiring either model.
Some implementations running over an entailment engine may produce very many
duplicates and so will typically suppress them; a straight lookup and project
from a table of triples would produce far fewer, and it may be useful to allow
them through.
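
A small Python sketch of the two permitted behaviours, assuming GP1 and GP2
are combined as alternatives and happen to produce one identical solution.
The sample solutions and the function names are invented for illustration.

```python
# Sketch only: GP1 and GP2 stand for the two alternative patterns.
gp1_solutions = [{"?x": "_:a"}, {"?x": "_:b"}]
gp2_solutions = [{"?x": "_:a"}]

def union(left, right):
    """Concatenate both branches; identical solutions are not merged."""
    return left + right

def distinct(solutions):
    """Drop repeated solutions, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for s in solutions:
        key = frozenset(s.items())
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique

with_duplicates = union(gp1_solutions, gp2_solutions)
without_duplicates = distinct(with_duplicates)
print(len(with_duplicates), len(without_duplicates))
```

An implementation that lets duplicates through behaves like union alone; one
that suppresses them, or a query using DISTINCT, behaves like distinct(union).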

> 
> 8.4
> 
> of two, named graphs -> two named graphs

Changed.

> 10
> 
> The query forms are -> The query result forms are

Fixed.

> SELECT
> Returns all, or a subset of, the variables bound in a query pattern match.
> 
> A variable may be not bound in the result ...
> 
> 
> 10.1
> 
> The elements of a sequence of solutions can be modified by:
> 
>     1. ORDER BY: put the solutions in order
>     2. Projection
>     3. DISTINCT: ensure solutions in the sequence are unique.
>     4. LIMIT: restrict the number of solutions processed for query results
>     5. OFFSET: control where the solutions processed start from in the 
> overall sequence of solutions.
> 
> applied in the order given by the list.
> 
> 
> I would put OFFSET before LIMIT in the list if they are to be applied in 
> the order.
> 

Firstly note the order is:

Projection modifier
Distinct modifier
Order modifier
Limit modifier
Offset modifier

LIMIT affects the total to be returned and OFFSET the start point; they appear
that way round in the SPARQL syntax and in SQL.

I agree that it could be the other way round.
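
To make the point concrete, here is a Python sketch of one reading of the two
modifiers: OFFSET picks the start point and LIMIT caps how many solutions are
returned from there, so the listing order does not change the result.  The
window function and the sample sequence are hypothetical.

```python
def window(solutions, limit=None, offset=0):
    """Skip `offset` solutions, then return at most `limit` of the rest."""
    end = None if limit is None else offset + limit
    return solutions[offset:end]

# An already ordered sequence of solutions (placeholders).
ordered = ["s1", "s2", "s3", "s4", "s5"]

page = window(ordered, limit=2, offset=1)
print(page)
```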


> 10.1.1
> 
>   project(S, VS) = { (project(Si, VS) | i = 1,2, . . . n }
> 
>   a space is missing between 1 and 2

Fixed.

> 
> 10.1.3
> 
>   An ordering condition can be a variable or a function call.
> 
> The grammar says that it can also be a BrackettedExpression

Changed to add " or an expression".

> Definition:  Ordered Solution Sequence
> for Si, SJ
> 
> the J is in cap letter, should be Sj

Fixed.

> 11.1
> 
> datatype IRI (corresponds to the Concepts and Abstract Syntax term 
> "datatype IRI")
> 
> Isn't it "datatype URI" (second occurrence)

Yes - fixed.

> 
> 11.4.8, 11.4.9
> 
> the the (twice)

Fixed (twice :)
Received on Monday, 15 May 2006 16:22:48 UTC