Review of Query document from Gregory Williams on 2010-09-20 (public-rdf-dawg@w3.org from July to September 2010)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Mon, 20 Sep 2010 18:18:01 -0400
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Cc: Steve Harris <steve.harris@garlik.com>, Andy Seaborne <andy.seaborne@epimorphics.com>
Message-Id: <BD4BDA84-0C42-46C9-9E62-1CC8747B5708@evilfunhouse.com>

I've based the review on the sections Andy highlighted in a previous email as having been changed. His sections are quoted below (with some additional section headers added for the federation document).

.greg



> > Material different from SPARQL 1.0:
> > 
> > 8 Negation
> >    8.1 Filtering Using Graph Patterns
> >        8.1.1 Testing For the Absence of a Pattern
> >        8.1.2 Testing For the Presence of a Pattern
> >    8.2 Removing bindings
> >    8.3 Relationship and difference between NOT EXISTS and MINUS

I find the naming of 8.1 and 8.2 to be asymmetrical. 8.2 seems like it's more about removing query solutions, not variable bindings. Section 8.2 also says, "calculates solutions in one side that are not compatible with the other side" but this isn't exactly true. The text needs to explicitly call out that it's not just compatibility but also a non-empty domain intersection that is required for the result removal. The specifics of this are in 17.4. However, I think the wording in 17.4 is misleading:

"""
The additional restriction on dom(μ) and dom(μ') is added so that if any solution mapping has no variables in common with solution mappings of Ω1 then Minus(Ω1, Ω2) is empty, regardless of the rest of Ω2.
"""

But "then Minus(Ω1, Ω2) is empty" should really be "then Minus(Ω1, Ω2) is NOT empty," distinguishing it from the case that would result in Minus just determining compatibility without concern for the domains of Ω1 and Ω2. Also, the cardinality definition for Minus seems to say that it's the same as the cardinality of the Minus operation's left-hand side, but that doesn't seem right.
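
To make the distinction concrete, here is a minimal Python sketch of Minus as described above: a left-hand solution is removed only when some right-hand solution is both compatible with it and shares at least one variable with it. Solution mappings are modelled as plain dicts, and the names compatible and minus are illustrative assumptions, not names taken from the document.

```python
# Solution mappings are modelled as dicts (variable name -> term).
# All names here are hypothetical and only illustrate the reading above.

def compatible(mu1, mu2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())

def minus(omega1, omega2):
    """Keep mu from omega1 unless some mu2 in omega2 is compatible with it
    AND has a non-empty domain intersection with it."""
    return [
        mu for mu in omega1
        if not any(compatible(mu, mu2) and (mu.keys() & mu2.keys())
                   for mu2 in omega2)
    ]

omega1 = [{"s": "a", "p": "b"}, {"s": "c", "p": "d"}]
disjoint = [{"x": "1"}]      # no variables in common with omega1
overlapping = [{"s": "a"}]   # shares ?s with omega1

print(minus(omega1, disjoint))     # nothing is removed
print(minus(omega1, overlapping))  # only the solution with s = "c" remains
```

Without the domain check, the disjoint case would remove everything, because mappings with no shared variables are trivially compatible.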

> > 9 Property Paths

The first sentence of section 9 seems to be missing its ending period.

Outside of the grammar rules interspersed throughout the document, the word "property" is never used in relation to the predicate of a triple (pattern) before the property paths section. This may make the text confusing in discussing property paths as "similar to a string regular expression but over properties, not characters". Maybe the relation between properties and predicates is obvious enough to not need clarification, but I thought it worth noting.

There's a free-floating sentence fragment just before section 9.1: "any given path expression."


> >    9.1 Property Path Expressions

"uri is either a URI or a prefixed name" -- is the 'a' shorthand for rdf:type allowed?


I think "nor uri_{j+1}...^uri_n as reverse paths" is missing a ^ on uri_{j+1}.

> >    9.2 Examples

"or, with explicit variables" might be more clear with text indicating that the extra variables make this not exactly the same query as the results from this variant will include bindings for the new named variables.

"rdf;type" should be "rdf:type"


> > These next sections contain material that will be moved into the formal definitions:
> > 
> >    9.3 Algebra for Property Paths
> >        9.3.1 Translation of Property Paths
> >        9.3.2 Material for the "Initial Definitions" section
> >            9.3.2.1 Property Paths Patterns

"A Property Path is a sequence of triples, ti in ST" -- what is "ST"?

"We call the object of t_n the end of the path." -- should n here instead be length(ST)-1?

> >        9.3.3 Property Path Expression Matching
> >            9.3.3.1 SPARQL Property Path Pattern Matching
> > 
> >        9.3.4 Material for the SPARQL syntax to SPARQL algebra section
> >        9.3.5 Material for the SPARQL algebra section

"A zero length path matches subjects and all objects ..." Does this mean "all subjects and objects"? Not sure why the choice of "subjects" vs. "*all* objects".

"such that a intermediate nodes in the graph are trarversed once only" -- 'a' should be 'all'?


> >        9.3.6 Material for the SPARQL Evaluation Semantics section
> >            9.3.6.1 Zero-length paths

Is there a reason to include 'path' in ZeroLengthPath(X, path, Y) ?

"{ { (y, x) } }" -- should be "{ { (vy, x) } }"


"""
eval(D(G), ZeroLengthPath(x:term, path, y:term)) = 
  { {} }
  card[] = 1
"""
-- should this return the empty set {} instead of the set of one empty result { {} } when x != y?
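
The difference between {} and { {} } matters as soon as the result is joined with the rest of the pattern. A small Python sketch, assuming solution mappings are dicts and multisets are lists (join and zero_length_path_terms are hypothetical helper names, and the second function encodes the reading suggested in the question above, not the document's definition):

```python
def compatible(mu1, mu2):
    """Mappings are compatible if they agree on every shared variable."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())

def join(omega1, omega2):
    """Merge every compatible pair of solutions."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2
            if compatible(m1, m2)]

def zero_length_path_terms(x, y):
    # Assumed reading: one empty solution when the two terms are equal,
    # and no solutions at all when they differ.
    return [{}] if x == y else []

rest = [{"s": "a"}, {"s": "b"}]
print(join(rest, zero_length_path_terms("n1", "n1")))  # rest is unchanged
print(join(rest, zero_length_path_terms("n1", "n2")))  # no solutions survive
```

Returning { {} } unconditionally would make the pattern a no-op in a join even when the two terms differ.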


> >            9.3.6.2 Arbitrary length paths

"deteching" -- sp. "detecting".

"ALP(x:term, path) = ALP(x:term, path), {})" -- has one too many right-parens in it. Presumably should be "ALP(x:term, path) = ALP(x:term, path, {})"

"R = R + t + ALP(t, Path, Visted)" -- should 't' be '{t}'? (assuming + is understood as set-union?)


"multiset-union({(vx,t} | | t in nodes(G) and y in ArbitraryLengthPath(t, path, y) })" -- syntax here is messed up, and I'm not sure about the second (recursive?) condition 'y in ArbitraryLengthPath(t,path,y). Maybe meant to be "t in nodes(G) and y in ALP(t,path)" ?


> > 10 Aggregates

This section should mention somewhere that aggregates must be aliased in order to project them. A brief mention with a link to 15.1.2 SELECT expressions would be sufficient.

> >    10.1 Aggregate Example

The introductory text seems a bit thin for readers who may not already be familiar with aggregates. Similarly, the example in 10.1 would benefit from some explanatory text detailing how the final result is arrived at.


> >    10.2 Algebra Operators

"Aggregation, a function which calculates a scalar value as an output of the aggregate expression in the SELECT clause." -- aggregate expressions can also be in the HAVING clause, right (not just SELECT)?

The 'scalar' argument to Aggregation() is said to be a set, but in the example Aggregation() is called with a value of 0.

The second argument to the call to the aggregate set function 'func' is defined as card[range(g)] - card[M], but the use of this value isn't discussed until later in 10.2.2, and then only vaguely. It is never used in the definitions of the set functions.


> >        10.2.1 HAVING
> >        10.2.2 Set Functions

Just before the definition of Sum, the example says the result should be "6.0 (decimal)", but it should be "6.0 (float)".

In the definition of GroupConcat, "unicode codepoint 32" is probably better described as "unicode codepoint U+0020".

GroupConcat(S, scalar) is defined in terms of fn:string-join, but that function is never defined or referenced again. The fn prefix is defined in section 1.2.1, but since this function is never discussed in the document, it should probably be hyperlinked to the XPath definition at <http://www.w3.org/2005/xpath-functions/#string-join>.


> > Material to move to formal definitions:
> >        10.2.3 Mapping from Abstract Syntax to Algebra

The example says the SUM expression "becomes Aggregation((?a), (?val), Sum, (), BGP(?a rdf:value ?val))." This form of Aggregation() uses one more argument than the definition in 10.2 (the GROUP BY variable). I would have expected the SUM expression to become

Aggregation((?val), Sum, (), Group((?a), BGP(?a rdf:value ?val)))


In the "Joining Aggregate Values" section, I have no idea what the introductory sentence is meant to convey. I may have misunderstood the definition of AggregateJoin, but it seems like it will produce a multiset of single-mapping sets, {agg_i -> range(A_i)}. I would have expected something like:

AggregateJoin(A) = { { (agg_i -> range(A_i)) | dom(A_i) = k } | k in set-union(dom(A)) }

The algorithmic sketch of using AggregateJoin in 17.2.3 might be more intuitive if there were more than one 'Let A_i' line (more than one aggregate operation in the query).
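
A Python sketch of the AggregateJoin behaviour I would have expected, assuming each A_i can be treated as a map from group key to aggregate value (aggregate_join and the sample data are hypothetical; the result is keyed by group only to make it easy to read):

```python
def aggregate_join(aggregates):
    """aggregates: list of dicts, one per aggregate, each mapping a group
    key to that aggregate's value. Returns one solution mapping per group
    key, binding agg_i to the value of the i-th aggregate for that group."""
    keys = set().union(*(a.keys() for a in aggregates))
    return {
        k: {f"agg_{i}": a[k]
            for i, a in enumerate(aggregates, start=1) if k in a}
        for k in sorted(keys)
    }

# Two aggregates over the same grouping, e.g. a SUM and a COUNT.
sums = {"g1": 6, "g2": 4}
counts = {"g1": 3, "g2": 1}

print(aggregate_join([sums, counts]))
# {'g1': {'agg_1': 6, 'agg_2': 3}, 'g2': {'agg_1': 4, 'agg_2': 1}}
```

The point is that each group yields a single solution carrying all of its aggregate values, rather than one single-binding solution per aggregate.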


> > 11 Subqueries

There needs to be introductory text for subqueries.

"It is an error to reuse variable names both inside and outside a subquery when the variable is not projected from the subquery." -- I checked with Andy, and he said of this: "It's wrong and (for composition reasons) we decided otherwise."

> > 13 Basic Federated Query
> > See basic federated query doc

Much of the Federation document is written in a very casual and narrative fashion (very different than the Query doc; I suspect this will be very obvious if the federation text is just merged with the query document).

The document never discusses the "UNDEF" token that is introduced in the grammar.

"Solution Mapping (corresponds to the Concepts and Abstract Syntax term "RDF URI reference")" -- seems like a copy-paste typo.

"For instance, an edpoint" -- sp. "endpoint".

The examples are hard to follow because they are so domain-specific.

"The mechanics of executing a query over a graph" -- is this meant to be referring to "executing a query over a *named* graph"?

"Typically, a GRAPH constraint is matched against an RDF graph which is in the querying system, perhaps as the result of parsing the response to an HTTP GET on the named graph." -- This is needless detail. A GRAPH pattern is matched against named RDF graphs contained within the dataset being used for the query.

"GRAPH-constrained pattern" -- I don't know what this means.

"Note that WSDL defines the behavior with respect to constructing HTTP URLs from an endpoint and a set of query parameters, in particular appending '?' or '&' to an endpoint URL which may already have them." -- I'm not totally sure what this means, but I'd like to suggest that there should be a way to query over a custom dataset at the remote endpoint using the standard SPARQL Protocol conventions (SERVICE <http://example/endpoint?default-graph-uri=foo> {...}).

"application/sparql-results" -- should be "application/sparql-results+xml"

"For any other response, the query fails." -- Should this fail or just return an empty result set? I can think of arguments for both, but SERVICE blocks within OPTIONALS and UNIONS would be more useful if they didn't cause the entire query to fail.

"queryier" ??

In the example for section 3 BINDINGS, the ?id variable is bound to plain literals, but the example data from earlier in the document uses xsd:integer typed literals.

> > [FED]4.2 Definition of BINDINGS

"If a WhereClause has a BindingsClause" -- WhereClause doesn't 'have' a BindingsClause. The grammar associates these two through SelectQuery (with an intervening SolutionModifier).

Section 4.2 doesn't seem to follow the same conventions as the query doc. For example, "eval(BindingsSolutionSequence(P,V,St)) = Join(Rbc, P)" -- isn't P (a GGP) an AST concept, not an algebra concept?

> > [FED]5 SPARQL Federation Extensions Grammar

"""
It is a syntax error if to use a variable as the first argument to a ServiceGraphPattern if that variable is not bound (at least optionally) in the left hand side of a join with the ServiceGraphPattern on the right.
"""

"if to use" -- should be "to use"
This text should align with Axel's(?) proposed "potentially bound" concept, but in general it seems like it's trying to talk about a syntax error defined in terms of the algebra which is going to be confusing for people who otherwise don't need to ever think about the algebra. Also, join ordering doesn't have to use the lexical ordering, so "left hand side" here isn't particularly useful.

> >        15.1.2 SELECT expressions

"Variables can be also be used in expressions if they are introduced as to the earlier, syntactically, in the same SELECT clause" -- the wording here needs to be fixed. Talking about "New variables" being available might be more clear than just "Variables".


> >    16.6 Extensible Value Testing
> > 
> > New functions where there is a keyword:
> >        16.4.14 COALESCE
> >        16.4.15 IF

"rdfTerm coalesce(expression*)" -- this syntax doesn't seem to include the (required?) commas to separate expressions (c.f. 16.4.16 IN).

"The COALESCE function form" -- why not just "COALESCE function" without "form"? Similarly for IF.

"COALESCE(5, ?x) returns 2" -- shouldn't it return 5?

"the returns the value of expr2" -- should be "*then* returns the value of expr2".


> >        16.4.16 IN

"A list of zero terms on the right-hand side is legal." -- this is repeated twice in this section.

> >        16.4.17 NOT IN

"A list of zero terms on the right-hand side is legal." -- this is repeated twice in this section.

> >        16.4.18 IRI

"Passing any other RDF term is an error." -- this is misleading as both simple literals and iris are allowable, but this directly follows a sentence describing just the IRI(iri) case.

> >        16.4.19 URI
> >        16.4.20 BNODE

"constructs a blank node that is distinct for all blank nodes" -- should that be "distinct *from* all blank nodes"?

"distinct from all blank nodes created by calls to this constructor for other query solutions." -- "other query solutions" seems strange here since at the time BNODE() is evaluated it might not be operating with an actual query solution, but just a set of variable bindings (if invoked in a FILTER in the middle of a query). Is "query solution" still a valid term for these intermediate 'solutions'?

"and the same blank node for calls with the same simple literal within a FILTER for one solution mapping" -- is there a way to get the same bnode for the same simple literal *across* different solution mappings (across the entire query execution)?

> >        16.4.21 STRDT
> >        16.4.22 STRLANG

"""
STRLANG("chat", "en") "123"@en
"""

-- should be "chat"@en, not "123"@en.

> >        16.4.23 NOT EXISTS and EXISTS

"returns true/false depending on whether the pattern matches" -- Matches what? More detail would help explaining that it returns true if the evaluating the pattern (with variable substitution) results in any solutions.

"The NOT EXISTS form translates into fn:not(EXISTS{...})." -- is there a reason to prefer this to "!(EXISTS{...})"?

"Returns false if pattern pat matches the dataset."
"Returns true if pattern pat matches the dataset."
"Variables in the pattern pat"
-- all of these refer to 'pat', but the texts they are referring to use 'pattern'.

"Let μ a solution mapping" -- should be "Let μ *be* a solution mapping"
Received on Monday, 20 September 2010 22:18:31 UTC