Re: SPARQL Query 1.1 review from Andy Seaborne on 2011-02-11 (public-rdf-dawg@w3.org from January to March 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 11 Feb 2011 21:41:05 +0000
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>, Steve Harris <steve.harris@garlik.com>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4D55ACF1.1080203@epimorphics.com>
(Steve - left the "subqueries" to you)

Birte,

Thank you for the time and effort you have put into this review.

On the specific point about "18.6 Extending SPARQL Basic Graph 
Matching", I suggest taking this to another email thread.

 Andy


On 11/02/11 12:36, Birte Glimm wrote:
> Hi Andy, Steve, others,
> here is my review for SPARQL Query 1.1 (apart from Sections 11 and
> 18), although I mention some typos that I noticed while skimming over
> Section 18 and I have a general comment for one subsection too.
>
> First the things that are more substantial in my opinion:
>
> In general, I was a bit confused about what simple literals are. Is
> that the same as plain literals? The spec uses "RDF literals", which
> as I understand it can be any type of literal, and then "plain
> literal" and "simple literal". Is plain the same as simple?  Can the
> notation either be unified or if these are different, can there be a
> definition of what is what?


Sec 17.1 says:

simple literal denotes a plain literal with no language tag.

but this comes after it's been used a few times.

This is a piece of SPARQL terminology - the term isn't defined in the 
RDF specs, only plain literal which covers plain strings with or without 
lang tag.
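
To make the terminology concrete, here is a minimal sketch of the distinction. The `Literal` class and the two helper functions are hypothetical names used only for illustration; they are not taken from any spec or library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Illustrative RDF literal: lexical form plus optional lang tag or datatype IRI."""
    lexical: str
    lang: Optional[str] = None       # language tag, e.g. "en"
    datatype: Optional[str] = None   # datatype IRI, for typed literals only


def is_plain(lit: Literal) -> bool:
    # RDF term: a plain literal has no datatype; a language tag is optional.
    return lit.datatype is None


def is_simple(lit: Literal) -> bool:
    # SPARQL term: a simple literal is a plain literal with no language tag.
    return lit.datatype is None and lit.lang is None


abc = Literal("abc")
chat = Literal("chat", lang="fr")
one = Literal("1", datatype="http://www.w3.org/2001/XMLSchema#integer")
```

Under this sketch `abc` is both plain and simple, `chat` is plain but not simple, and `one` is neither.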

I've added text to 1.2.4 terminology, along with RDF term, linked to 
definitions in sec 18.

> Another big concern is whether the implementation of property paths are
> optional or not.

I don't believe they are optional.  The only optional feature is 
federated query.

> So far only the evaluation of BGPs (Bgp(...) in the
> algebra) is used to actually compute bindings. All other operations
> then work on solution sequences, which is really nice.

Not quite true - joins etc create new bindings from old.  I think you 
mean that BGPs are the bottom of the binding creation tree.  That was 
true in SPARQL 1.0, but with BINDINGS we now have another way to 
create bindings at the bottom level.

By defining property paths as SPARQL expressions where possible, we have 
reduced it to a few special operators.

> Some property
> path features require, however, an algebra extension that introduces
> other forms of computing solutions, e.g., the evaluation of
> :s :p+ ?o
> yields a solution sequence by an extended form of BGP matching, but
> this is not defined for entailment regimes and cannot be defined by
> means of entailment. Thus, queries with certain property paths cannot
> make use of the BGP matching extension point, which introduces in my
> opinion and unfortunate incompatibility, which is not mentioned or
> discussed in the document.
> I am not against property paths, but I think this incompatibility has
> to be mentioned and I would also like to see property path being an
> optional feature, i.e., a SPARQL 1.1 conformant system can but does
> not have to support property paths.

Good idea to have some text, but I think the entailment document is the 
place to put it.  SPARQL, from rq25.xml, is a single unit of features 
over simple entailment.

> Similarly, I am concerned about FILTER EXISTS and FILTER NOT
> EXISTS. MINUS is a proper algebra operator, which combines
> solutions sequences, but having EXISTS and NOT EXISTS defined as
> filters is not in line with how filters previously worked, i.e., by
> working on RDF terms that result from applying given solutions to the
> variables followed by evaluating the filter expression (see also
> description in the itemise of 17.2). FILTER EXISTS and FILTER NOT
> EXISTS require the evaluation of BGPs, so that is quite a different
> thing and would require an algebra translation that somewhere has a
> Bgp(...) element in it, which does not seem to be the case at the
> moment. This might just be the case because Section 18 is not yet
> ready, but even though FILTER [NOT] EXISTS uses the FILTER keyword, it
> might be better to not treat them in the same way as other filters or
> not even use the FILTER keyword at all. I would rather like to see
> them as first class operators like MINUS or OPTIONAL.

<personal opinion>

</personal opinion>


> Here are the mostly minor things:
>
> Status , The new features are: ...
> The link for "Expressions in the SELECT clause" is not working

Done

>
> 1.1 Document Outline
> ...
> Sections 11 incorproated<- incorp*or*ated

Done

>
> 1.2.1 Namespaces
> Entry for snf: http://www.w3.org/ns/sparql#
> has @@(process) Ensure page populated
> The page is actually there and has an entry for #bound. Is more
> needed? If not remove @@...

All the functions in

http://www.w3.org/2009/sparql/wiki/SPARQL_Namespaces

> 2.5 Creating Values with Expressions
> Why is the SELECT clause in the query example indented?
>

Formatting error - fixed.

> 4.1.1.1 Prefixed *N*ames (for consistency)

Done

> 4.1.4 Syntax for Blank Nodes
> "Blank nodes in graph patterns act as non-distinguished variables, not
> as references to specific blank nodes in the data being queried."
> I suggest to just remove non-distinguished as it is confusing and not
> really what SPARQL does.

Done.

> 5.1.2 Extending Basic Graph Pattern Matching
> "SPARQL is defined for matching RDF graphs with simple
> entailment. SPARQL can be extended to other forms of entailment given
> certain conditions as described below."
> This is the only place where simple entailment is mentioned and it
> might come out of context here. I also think it would be good to
> reference the entailment regimes document here.
> How about:
> "SPARQL evaluates basic graph patterns using subgraph matching, which
> can be defined using simple entailment. SPARQL can be extended to
> other forms of entailment given certain conditions as described below
> and<a href="http://www.w3.org/TR/sparql11-entailment/">SPARQL 1.1
> Entailment Regimes</a> do this for several entailment relations."

I think the text still needs to refer to the extension framework in rq25.

How about:

"""
SPARQL evaluates basic graph patterns using subgraph matching, which
is defined for simple entailment. SPARQL can be extended to
other forms of entailment given certain conditions [link] as described 
below.  The document SPARQL 1.1 Entailment Regimes [link]
describes several specific entailment regimes.
"""

>
> 9.1 Property Path Expressions
> In this section URI is used, but shouldn't it be IRI?

Done.

> In the forth row (negated property set) I find !^uri not explained by
> the Matches column. Should it just be !uri? I can figure out what it
> is supposed to match, but the Matches text only explains the second
> pattern.

s/and/or/ in that line.

!^iri is a single reverse IRI not to be matched.  Short for !(^iri)

Reworded (and the line above as well).

file:///home/afs/W3C/SPARQL-docs/query-1.1/rq25.xml#pp-language

> Text below the table:
> in a negated property sets<- in a negated property *set*
> Binary operators / and ^: There is no example of ^ as a binary
> operator in the table as far as I can see. Is the binary usage meant
> to be as in Example: Inverse Path Sequence below?

The example is wrong. ^ is unary:  needs to be /^

> First example (Example: Alternatives), I first couldn't figure out
> what is subject, what is predicate and what is object. I suggest no
> spaces around | and/or brackets.

No spaces.

>
> 9.3 Cycles and Duplicates
> First example data should be:
> :x  :p :z1 .
> :x  :p :z2 .
> :z1 *:q* :y .
> :z2 *:q* :y .

Done

> Second example data should be:
> :x  :p *:y* .
> *:y*  :p :x .

Done

>
> 10 Assignment
> Second sentence: "The new variable must not have been used in the
> query up to that point. "
> Is that true?

Yes - but it's written from an execution point of view and not syntax so 
it's opaque or misleading.

Changed to

"""
The value of an expression can be added to a solution mapping by binding
a new variable to the value of the expression, which is an RDF term. In 
SPARQL, this binding within a query solution is never changed and this 
is checked by the variable scoping rules [link]. The new variable must 
not already be in-scope in the query at the point it is used. The 
variable can then be used in the query and also can be returned
in results.
"""

> Later in
> 19.8 Grammar, Notes:
> 11. The variable assigned in a BIND clause must not be already
> in-scope.

@@

>
> 10.1 BIND: Assigning to *V*ariables (for consistency)

Done

> The BIND form allows *a* value (not a*n* value)

Done

> "Use of BIND is a separate element of a group graph pattern and it ends
> any basic graph pattern, including ending the scope
> of any filters."
> I find "including ending the scope of any filters." confusing.

It's wrong as well :-)

FILTERs extend to the group (from SPARQL 1.0).

> In the
> example below, the filter applies to the whole group (as usual), but
> does this note mean that if the filter were positioned before the
> BIND, then it would just apply to the elements before? For example,
> would:
> {  ?x ns:price ?p .
>     ?x ns:discount ?discount
>     FILTER(?p<  20)
>     BIND (?p*(1-?discount) AS ?price)
>     ?x dc:title ?title .
> }
> be translated somehow such that
> Filter((?p<  20), Bgp(?x ns:price ?p . ?x ns:discount ?discount))
> is then extended according to the BIND part and then joined with the
> last triple pattern?

There are two BGPs, before and after the BIND:

       (filter (< ?price 20)
         (join
           (extend ((?price (* ?p (- 1 ?discount))))
             (bgp
               (triple ?x ns:price ?p)
               (triple ?x ns:discount ?discount)
             ))
           (bgp (triple ?x dc:title ?title))))

Try it at:
http://www.sparql.org/query-validator.html
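
The shape of that algebra can be reproduced with a small sketch of the translation: adjacent triple patterns are collected into a BGP, BIND ends the current BGP and wraps what has been built so far in an extend, and filters are set aside and applied to the whole group at the end. The tuple encoding and the name `translate_group` are assumptions made for illustration; this is not the spec's translation algorithm, and it handles only flat groups.

```python
def translate_group(elements):
    """Translate a flat group into a nested-tuple algebra expression.

    elements: ("triple", s, p, o), ("filter", expr) or ("bind", expr, var).
    """
    pattern = None   # algebra built so far
    triples = []     # adjacent triple patterns for the current BGP
    filters = []     # filters apply to the whole group, wherever they appear

    def flush():
        nonlocal pattern, triples
        if triples:
            bgp = ("bgp", *triples)
            pattern = bgp if pattern is None else ("join", pattern, bgp)
            triples = []

    for el in elements:
        kind = el[0]
        if kind == "triple":
            triples.append(el)
        elif kind == "filter":
            filters.append(el[1])       # does not end the BGP
        elif kind == "bind":
            flush()                     # BIND ends the current BGP
            inner = pattern if pattern is not None else ("bgp",)
            pattern = ("extend", el[2], el[1], inner)
        else:
            raise ValueError(f"unknown element kind: {kind}")
    flush()
    if pattern is None:
        pattern = ("bgp",)
    for f in filters:                   # nested here rather than combined with &&
        pattern = ("filter", f, pattern)
    return pattern


algebra = translate_group([
    ("triple", "?x", "ns:price", "?p"),
    ("triple", "?x", "ns:discount", "?discount"),
    ("filter", "?p < 20"),
    ("bind", "?p*(1-?discount)", "?price"),
    ("triple", "?x", "dc:title", "?title"),
])
```

The result is a filter around a join of an extend (over the first BGP) and the second BGP, so the filter sits outside both BGPs even though it was written before the BIND.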


> 10.2 BINDINGS
> ...o send a more *constrained* query to a remote query service. (not
> constrainded)

Done.


<@@Steve>

> 12 Subqueries, after the data for the first example:
> "Return a name (the one with the lowest sort order) from all the
> people that know Alice and have a name."
> Isn't the query rather asking for people that Alice knows? How about:
> "Return a name (the one with the lowest sort order) for all the
> people that Alice knows and who have a name."
>
> at the end of the example:
> Subqueries require one additional algebra operator, ToMultiset, which
> takes *l*ists and returns *m*ultisets.
> I don't see a reason to put list and multiset in upper case since
> these terms do not refer to any function in this context.
>
> at the end of the section:
> Only variables projected by the Project function are visible to
> operations outside the ToMultiset call.
> <code>ToMultiset</code>  (for consistency)

</@@Steve>


>
> 16.1.2 SELECT *E*xpressions

Done

>
> 16.2.4 CONSTRUCT WHERE
> (no FILTERs and *no* complex graph patterns are allowed in the short
> form)

Done

> 16.4.2 Identifying Resources
> The property foaf:mbox is defined as being an inverse function*al*
> property in the FOAF vocabulary.

Done

>
> 17 Expressions and Testing Values
> still has: @@(editorial) Expressions, not just testing values

Yes - reminder to me to check.  In SPARQL 1.0 expressions only appeared 
in FILTERS for testing.

> "SPARQL FILTERs restrict the solutions of a graph pattern match
> according to a given expression. "
> expression is linked to the expression grammar element, but in fact
> filters are followed by the grammar element condition.

Done

> 17.2 Filter Evaluation
> First itemise:
> "Apart from BOUND, all functions and operators operate on RDF Terms
> and will produce a type error if any arguments are unbound."
> This seems not true if FILTER EXISTS and FILTER NOT EXISTS are indeed
> realised as filters because they require BGP matching, so don't just
> operate on RDF terms.

Done - added NOT EXISTS and EXISTS

> 17.2.2 Effective Boolean Value (EBV)
> "... The following rules reflect the rules for fn:boolean applied to
> the argument types present in SPARQL Queries:"
> I don't see why Queries in SPARQL Queries is upper case.

Changed.

> 17.3 Operator Mapping
> "This table is not up to date. IN, NOT IN, BNODE, IF, COLAESCE, IRI,
> URI, STRDT, STRLANG, NOT EXISTS, EXISTS"
> This should be updated for LC (also COALESCE, not COLAESCE).

Removed - the table does not have the functions any more, just the 
operators (too many functions).

> 17.4 Operator and Function Definitions
> Still has:
> @@URIs: sfn:bound etc.
> @@Clean prototypes.
> This should be updated for LC.

Yes.

>
> 17.4.2.7 datatype
> "Returns the datatype IRI of typedLit; returns xsd:string if the
> parameter is a simple literal."
> Here in particular I am confused about what simple literals are. In
> OWL 2, "abc" has datatype rdf:PlainLiteral and not xsd:string as I
> understand it. Would "abc" in SPARQL be a simple literal and have
> datatype xsd:string? What is then a plain literal in SPARQL? Is there
> any syntactic difference?

"abc" is a simple literal. It does not have a dartatype IRI.  It is (by 
RDF-MT xsd 1a, 1b) the same value as "abc"^^xsd:string.

rdf:PlainLiteral must not appear in RDF as a datatype IRI
http://www.w3.org/TR/rdf-plain-literal/#Syntax_for_rdf:PlainLiteral_Literals

There is a section on extended SPARQL basic graph matching at that 
reference you might want to check.

If instead of rdf:PlainLiteral, there were defined datatypes for each 
language tag, we'd be in a better position.

> 17.4.2.8 IRI
> still has: @@ Do we also need IRI(relStr, baseStrOrIRI)?
> Should be fixed for LC.

Yes.  Added to To_Last_Call

(Use case: generating IRIs based on some id value).

> 17.4.2.11 STRDT
> What happens if a given lexical form is invalid, e.g., STRDT("abc",
> xsd:integer)? Does that result in an error or in "abc"^^xsd:integer?
> For other functions there is an explicit note about errors and it would
> be good to have such a note also for STRDT.


STRDT simply takes a lexical form and a datatype IRI and returns the RDF 
literal so formed.  In RDF, "abc"^^xsd:integer is a legal literal, so no 
error.
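
A minimal sketch of that behaviour, assuming a literal is modelled as a plain (lexical form, datatype IRI) pair: construction never inspects the lexical form, and well-formedness is a separate question. The names `strdt` and `is_well_formed_integer` are hypothetical, and the check covers xsd:integer only.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"


def strdt(lexical, datatype_iri):
    """Build a typed literal; the lexical form is not validated."""
    return (lexical, datatype_iri)


def is_well_formed_integer(literal):
    """Separate check: is the lexical form in the xsd:integer lexical space?"""
    lexical, datatype_iri = literal
    if datatype_iri != XSD + "integer":
        raise ValueError("this sketch only checks xsd:integer literals")
    # xsd:integer lexical form: optional sign followed by one or more digits.
    return re.fullmatch(r"[+-]?[0-9]+", lexical) is not None


odd = strdt("abc", XSD + "integer")    # legal RDF literal, ill-formed value
fine = strdt("42", XSD + "integer")
```

Building `odd` raises nothing; only the separate well-formedness check distinguishes it from `fine`.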

> I know Section 18 is not ready for review yet, but here are just some
> typos:
> 18.2.4 Converting Groups, Aggregates and SELECT *E*xpressions

Done

> 18.2.2.3 Translate Basic Graph Patterns and Filters
> After translating property paths, any adjacent triple patterns are
> *collected* (not colelctied) together to form a basic graph pattern
> *BGP(triples)* (not BGP(triples>).

Done

> and one other comment:
> 18.6 Extending SPARQL Basic Graph Matching
> I am not quite happy with the text (in particular the formulation of
> the conditions) since it is not at all well-aligned with the notation
> used in the rest of the document, e.g., "answer set" is everywhere
> else "solution sequence" and in this case answer set is even a set of
> pattern instance mappings, which is not the case anywhere else, where
> a BGP evaluates into a multiset of solution mappings and the RDF
> instance mappings just determine the multiplicity.
> We (Markus Krötzsch and I) discuss what is wrong with the conditions in
> an ISWC paper and I am happy to suggest a more aligned version of the
> conditions, if he WG is interested in this.

Please do - and I'd like to take this as a separate thread as it's an area 
several people have an interest in who may not have read down to here.


 Andy
Received on Friday, 11 February 2011 21:41:47 UTC