SPARQL Query 1.1 review from Birte Glimm on 2011-02-11 (public-rdf-dawg@w3.org from January to March 2011)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Fri, 11 Feb 2011 12:36:03 +0000
To: SPARQL Working Group <public-rdf-dawg@w3.org>, Andy Seaborne <andy.seaborne@epimorphics.com>, Steve Harris <steve.harris@garlik.com>
Message-ID: <AANLkTim=JJmeKEPwYtRXWVPez9jqtHeimHoAvUVngQju@mail.gmail.com>
Hi Andy, Steve, others,
here is my review for SPARQL Query 1.1 (apart from Sections 11 and
18), although I mention some typos that I noticed while skimming over
Section 18 and I have a general comment for one subsection too.

First the things that are more substantial in my opinion:

In general, I was a bit confused about what simple literals are. Is
that the same as plain literals? The spec uses "RDF literals", which
as I understand it can be any type of literal, and then "plain
literal" and "simple literal". Is plain the same as simple?  Can the
notation either be unified or if these are different, can there be a
definition of what is what?

Another big concern is whether the implementation of property paths is
optional or not. So far only the evaluation of BGPs (Bgp(...) in the
algebra) is used to actually compute bindings. All other operations
then work on solution sequences, which is really nice. Some property
path features require, however, an algebra extension that introduces
other forms of computing solutions, e.g., the evaluation of
:s :p+ ?o
yields a solution sequence by an extended form of BGP matching, but
this is not defined for entailment regimes and cannot be defined by
means of entailment. Thus, queries with certain property paths cannot
make use of the BGP matching extension point, which introduces, in my
opinion, an unfortunate incompatibility that is not mentioned or
discussed in the document.
I am not against property paths, but I think this incompatibility has
to be mentioned and I would also like to see property paths being an
optional feature, i.e., a SPARQL 1.1 conformant system can but does
not have to support property paths.
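
To make the point about :s :p+ ?o concrete, here is a minimal Python
sketch. The toy data and the helper names match_triple and one_or_more
are hypothetical and not taken from the spec, and the sketch ignores
multiplicities. It only illustrates that a single triple pattern is
answered by one lookup over the data, whereas p+ needs a reachability
computation on top of that lookup.

```python
from collections import deque

# Hypothetical toy graph as a set of (subject, predicate, object) triples.
TRIPLES = {
    (":s", ":p", ":a"),
    (":a", ":p", ":b"),
    (":b", ":p", ":s"),   # cycle back to :s
    (":a", ":q", ":c"),   # different predicate, must be ignored by :p+
}

def match_triple(triples, s, p):
    """Plain triple-pattern matching for (s, p, ?o): one lookup over the data."""
    return {o for (s2, p2, o) in triples if s2 == s and p2 == p}

def one_or_more(triples, s, p):
    """Evaluate (s, p+, ?o) as breadth-first reachability over p-edges."""
    reached = set()
    queue = deque(match_triple(triples, s, p))
    while queue:
        node = queue.popleft()
        if node in reached:
            continue  # already visited, which also stops cycles
        reached.add(node)
        queue.extend(match_triple(triples, node, p))
    return reached

print(sorted(match_triple(TRIPLES, ":s", ":p")))
print(sorted(one_or_more(TRIPLES, ":s", ":p")))
```

The second call returns nodes that no single triple with subject :s
accounts for, which is the part that falls outside the BGP matching
extension point.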

Similarly, I am concerned about FILTER EXISTS and FILTER NOT
EXISTS. MINUS is a proper algebra operator, which combines
solution sequences, but having EXISTS and NOT EXISTS defined as
filters is not in line with how filters previously worked, i.e., by
working on RDF terms that result from applying given solutions to the
variables followed by evaluating the filter expression (see also
description in the itemise of 17.2). FILTER EXISTS and FILTER NOT
EXISTS require the evaluation of BGPs, so that is quite a different
thing and would require an algebra translation that somewhere has a
Bgp(...) element in it, which does not seem to be the case at the
moment. This might just be the case because Section 18 is not yet
ready, but even though FILTER [NOT] EXISTS uses the FILTER keyword, it
might be better to not treat them in the same way as other filters or
not even use the FILTER keyword at all. I would rather like to see
them as first class operators like MINUS or OPTIONAL.
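
The difference can be sketched in Python as follows. The data, the
two example filters and all function names are hypothetical and only
serve as an illustration. An ordinary filter is a function of the
solution mapping alone, whereas an EXISTS-style filter also needs
access to the data in order to match a pattern.

```python
# Hypothetical toy data as (subject, predicate, object) triples.
TRIPLES = {
    (":alice", ":knows", ":bob"),
    (":bob", ":name", "Bob"),
    (":carol", ":knows", ":dave"),
}

def substitute(pattern, solution):
    """Replace variables bound in the solution by their RDF terms."""
    return tuple(solution.get(term, term) for term in pattern)

def matches(pattern, triple):
    """Check whether a triple pattern (possibly with variables) matches a triple."""
    binding = {}
    for p_term, d_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if binding.setdefault(p_term, d_term) != d_term:
                return False
        elif p_term != d_term:
            return False
    return True

def ordinary_filter(solution):
    # Like FILTER(?x != :alice): looks only at the terms in the solution.
    return solution["?x"] != ":alice"

def exists_filter(solution, triples):
    # Like FILTER EXISTS { ?y :name ?n }: has to match a pattern on the data.
    pattern = substitute(("?y", ":name", "?n"), solution)
    return any(matches(pattern, t) for t in triples)

solutions = [
    {"?x": ":alice", "?y": ":bob"},
    {"?x": ":carol", "?y": ":dave"},
]
print([s["?x"] for s in solutions if ordinary_filter(s)])
print([s["?x"] for s in solutions if exists_filter(s, TRIPLES)])
```

Note that exists_filter takes the triples as a second argument, while
ordinary_filter does not. That extra argument is what an algebra
translation would have to account for.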


Here are the mostly minor things:

Status, The new features are: ...
The link for "Expressions in the SELECT clause" is not working

1.1 Document Outline
...
Sections 11 incorproated <- incorp*or*ated

1.2.1 Namespaces
Entry for sfn: http://www.w3.org/ns/sparql#
has @@(process) Ensure page populated
The page is actually there and has an entry for #bound. Is more
needed? If not remove @@...

2.5 Creating Values with Expressions
Why is the SELECT clause in the query example indented?

4.1.1.1 Prefixed *N*ames (for consistency)

4.1.4 Syntax for Blank Nodes
"Blank nodes in graph patterns act as non-distinguished variables, not
as references to specific blank nodes in the data being queried."
I suggest just removing "non-distinguished", as it is confusing and not
really what SPARQL does.

5.1.2 Extending Basic Graph Pattern Matching
"SPARQL is defined for matching RDF graphs with simple
entailment. SPARQL can be extended to other forms of entailment given
certain conditions as described below."
This is the only place where simple entailment is mentioned and it
might come out of context here. I also think it would be good to
reference the entailment regimes document here.
How about:
"SPARQL evaluates basic graph patterns using subgraph matching, which
can be defined using simple entailment. SPARQL can be extended to
other forms of entailment given certain conditions as described below
and <a href="http://www.w3.org/TR/sparql11-entailment/">SPARQL 1.1
Entailment Regimes</a> do this for several entailment relations."

9.1 Property Path Expressions
In this section URI is used, but shouldn't it be IRI?
In the fourth row (negated property set) I find !^uri not explained by
the Matches column. Should it just be !uri? I can figure out what it
is supposed to match, but the Matches text only explains the second
pattern.
Text below the table:
in a negated property sets <- in a negated property *set*
Binary operators / and ^: There is no example of ^ as a binary
operator in the table as far as I can see. Is the binary usage meant
to be as in Example: Inverse Path Sequence below?
In the first example (Example: Alternatives), I couldn't at first figure
out what is the subject, what is the predicate and what is the object. I
suggest removing the spaces around | and/or adding brackets.

9.3 Cycles and Duplicates
First example data should be:
:x  :p :z1 .
:x  :p :z2 .
:z1 *:q* :y .
:z2 *:q* :y .
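
As a small check of the corrected data, here is a Python sketch. It
assumes that the example is meant to show duplicates arising from a
sequence path such as :p/:q; the path expression is an assumption,
only the data is quoted above, and sequence_path is a hypothetical
helper that keeps one solution per distinct route.

```python
# First example data with the corrected :q predicates.
TRIPLES = [
    (":x", ":p", ":z1"),
    (":x", ":p", ":z2"),
    (":z1", ":q", ":y"),
    (":z2", ":q", ":y"),
]

def sequence_path(triples, first, second):
    """Evaluate ?s first/second ?o, keeping one solution per distinct route."""
    return [
        (s, o)
        for (s, p1, mid) in triples if p1 == first
        for (mid2, p2, o) in triples if p2 == second and mid2 == mid
    ]

# Two routes from :x to :y (via :z1 and via :z2), hence two solutions.
print(sequence_path(TRIPLES, ":p", ":q"))
# With :p as the second step there is no route at all in this data.
print(sequence_path(TRIPLES, ":p", ":p"))
```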

Second example data should be:
:x  :p *:y* .
*:y*  :p :x .

10 Assignment
Second sentence: "The new variable must not have been used in the
query up to that point. "
Is that true? Later in
19.8 Grammar, Notes:
11. The variable assigned in a BIND clause must not be already
in-scope.

10.1 BIND: Assigning to *V*ariables (for consistency)
The BIND form allows *a* value (not a*n* value)
"Use of BIND is a separate element of a group graph pattern and it ends
any basic graph pattern, including ending the scope
of any filters."
I find "including ending the scope of any filters." confusing. In the
example below, the filter applies to the whole group (as usual), but
does this note mean that if the filter were positioned before the
BIND, then it would just apply to the elements before? For example,
would:
{  ?x ns:price ?p .
   ?x ns:discount ?discount
   FILTER(?p < 20)
   BIND (?p*(1-?discount) AS ?price)
   ?x dc:title ?title .
}
be translated somehow such that
Filter((?p < 20), Bgp(?x ns:price ?p . ?x ns:discount ?discount))
is then extended according to the BIND part and then joined with the
last triple pattern?
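
To make the question concrete, the reading asked about can be sketched
in Python. This is only one possible reading and not a claim about
what the spec intends. The data values and the helpers eval_bgp and
join are hypothetical.

```python
# Hypothetical data for the group pattern quoted above.
DATA = {
    (":book1", "ns:price", 10), (":book1", "ns:discount", 0.5),
    (":book1", "dc:title", "A"),
    (":book2", "ns:price", 40), (":book2", "ns:discount", 0.5),
    (":book2", "dc:title", "B"),
}

def eval_bgp(triples, patterns):
    """Naive BGP matching: extend each solution by one triple pattern at a time."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for sol in solutions:
            for triple in triples:
                candidate = dict(sol)
                for p_term, d_term in zip(pattern, triple):
                    if p_term.startswith("?"):
                        if candidate.setdefault(p_term, d_term) != d_term:
                            break
                    elif p_term != d_term:
                        break
                else:
                    next_solutions.append(candidate)
        solutions = next_solutions
    return solutions

def join(left, right):
    """Join two solution sequences on their shared variables."""
    return [
        {**a, **b}
        for a in left for b in right
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]

# Bgp(?x ns:price ?p . ?x ns:discount ?discount)
bgp1 = eval_bgp(DATA, [("?x", "ns:price", "?p"),
                       ("?x", "ns:discount", "?discount")])
# Filter((?p < 20), ...), applied only to the part before the BIND
filtered = [s for s in bgp1 if s["?p"] < 20]
# Extend with ?price := ?p * (1 - ?discount)
extended = [{**s, "?price": s["?p"] * (1 - s["?discount"])} for s in filtered]
# Join with Bgp(?x dc:title ?title)
result = join(extended, eval_bgp(DATA, [("?x", "dc:title", "?title")]))
print(result)
```

For this particular filter, applying it to the whole group instead
would give the same solutions, since ?p is already bound before the
BIND. The two readings differ only for filters that mention ?price or
?title.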

10.2 BINDINGS
...o send a more *constrained* query to a remote query service. (not
constrainded)

12 Subqueries, after the data for the first example:
"Return a name (the one with the lowest sort order) from all the
people that know Alice and have a name."
Isn't the query rather asking for people that Alice knows? How about:
"Return a name (the one with the lowest sort order) for all the
people that Alice knows and who have a name."

at the end of the example:
Subqueries require one additional algebra operator, ToMultiset, which
takes *l*ists and returns *m*ultisets.
I don't see a reason to put list and multiset in upper case since
these terms do not refer to any function in this context.

at the end of the section:
Only variables projected by the Project function are visible to
operations outside the ToMultiset call.
<code>ToMultiset</code> (for consistency)

16.1.2 SELECT *E*xpressions

16.2.4 CONSTRUCT WHERE
(no FILTERs and *no* complex graph patterns are allowed in the short
form)

16.4.2 Identifying Resources
The property foaf:mbox is defined as being an inverse function*al*
property in the FOAF vocabulary.

17 Expressions and Testing Values
still has: @@(editorial) Expressions, not just testing values

"SPARQL FILTERs restrict the solutions of a graph pattern match
according to a given expression. "
expression is linked to the expression grammar element, but in fact
filters are followed by the grammar element condition.

17.2 Filter Evaluation
First itemise:
"Apart from BOUND, all functions and operators operate on RDF Terms
and will produce a type error if any arguments are unbound."
This does not seem to be true if FILTER EXISTS and FILTER NOT EXISTS are
indeed realised as filters, because they require BGP matching and so
do not just operate on RDF terms.

17.2.2 Effective Boolean Value (EBV)
"... The following rules reflect the rules for fn:boolean applied to
the argument types present in SPARQL Queries:"
I don't see why Queries in SPARQL Queries is upper case.

17.3 Operator Mapping
"This table is not up to date. IN, NOT IN, BNODE, IF, COLAESCE, IRI,
URI, STRDT, STRLANG, NOT EXISTS, EXISTS"
This should be updated for LC (also COALESCE, not COLAESCE).

17.4 Operator and Function Definitions
Still has:
@@URIs: sfn:bound etc.
@@Clean prototypes.
This should be updated for LC.

17.4.2.7 datatype
"Returns the datatype IRI of typedLit; returns xsd:string if the
parameter is a simple literal."
Here in particular I am confused about what simple literals are. In
OWL 2, "abc" has datatype rdf:PlainLiteral and not xsd:string as I
understand it. Would "abc" in SPARQL be a simple literal and have
datatype xsd:string? What is then a plain literal in SPARQL? Is there
any syntactic difference?

17.4.2.8 IRI
still has: @@ Do we also need IRI(relStr, baseStrOrIRI)?
Should be fixed for LC.

17.4.2.11 STRDT
What happens if a given lexical form is invalid, e.g., STRDT("abc",
xsd:integer)? Does that result in an error or in "abc"^^xsd:integer?
For other functions there is an explicit note about errors and it would
be good to have such a note also for STRDT.

I know Section 18 is not ready for review yet, but here are just some
typos:
18.2.4 Converting Groups, Aggregates and SELECT *E*xpressions
18.2.2.3 Translate Basic Graph Patterns and Filters
After translating property paths, any adjacent triple patterns are
*collected* (not colelctied) together to form a basic graph pattern
*BGP(triples)* (not BGP(triples>).

and one other comment:
18.6 Extending SPARQL Basic Graph Matching
I am not quite happy with the text (in particular the formulation of
the conditions) since it is not at all well-aligned with the notation
used in the rest of the document, e.g., "answer set" is everywhere
else "solution sequence" and in this case answer set is even a set of
pattern instance mappings, which is not the case anywhere else, where
a BGP evaluates into a multiset of solution mappings and the RDF
instance mappings just determine the multiplicity.
We (Markus Krötzsch and I) discuss what is wrong with the conditions in
an ISWC paper and I am happy to suggest a more aligned version of the
conditions, if the WG is interested in this.


-- 
Dr. Birte Glimm, Room 309
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283520
Received on Friday, 11 February 2011 12:36:37 UTC