Kendall's review of rq25

[Kendall appears to be having difficulty posting to the list, so he's 
asked me to forward his review. --Lee ]

-- Kendall's review follows:

Folks,

I don't think rq25 is ready to go to LC until the following issues 
are addressed satisfactorily:

1. all of the @@-marked bits are fixed; or, at least, all of the 
@@-marked bits related to the substantive material, i.e., the grammar 
and algebra.

2. section 12 is completed -- it's not even close as-is (sections 
12.2 and 12.3 are simply missing from the doc, even though they're 
listed in the TOC) -- even more worrisome, there is no connection 
between 12 and the rest of the doc. Fred Zemke said way back last 
summer that he didn't see a systematic connection between the grammar 
and the semantics, and, now, having read this carefully, I completely 
agree;

3. the status question about what is normative and what is 
informative is answered.

My primary concern is that the grammar and the algebra are normative, 
plus a subset of the functions
and operators stuff. All the rest should be informative. It's 
difficult to recommend making normative
material that describes itself as "informal". Specs are not informal 
documents, generally.

Ideally the doc would be rearranged such that a completed section 12 
and the grammar are at the front,
where normative material generally goes, and the rest of the informal 
tutorial material would follow
the normative sections. Even better: split that stuff into a separate 
doc, since there are very few,
if any, pointers or links from those sections to the grammar or 
algebra. With some work, the tutorial
material --sections 2 through 10, plus some bits of 11, I think -- 
could actually become useful as a
guide to the language.

What follows are detailed comments on 1.18, except for 11, 12, and 
Appendix A, which I leave to
others. Some of these comments assume that the material in question 
is intended to be a formal
specification, so they may not be applicable to a tutorial or guide.

Cheers,
Kendall

Abstract

2nd paragraph: The first sentence is very awkward ("the query 
language part..."); way too colloquial and chatty for a spec, IMO. 
I'd strike "for easy access to data". I'd strike the entire next 
sentence starting "The SPARQL query language consists of..." in favor 
of an actual *definition* -- ah, which there is in 1
Introduction, making this redundant and unnecessary. Strike it.

In fact, I'd strike the entire paragraph. But "report forms" in the 
last sentence should be "result
forms", surely.

s/Status of this Document/Status of This Document/

1 Introduction

I would strike the first 3 paragraphs. This section should begin with 
"SPARQL consists of three
documents". Though, actually, that's weird, right? Why does the query 
language spec define SPARQL in
toto? The protocol document offers a definition of *the protocol* 
(see http://www.w3.org/TR/rdf-sparql-protocol/#ap). Shouldn't the 
query language spec define *the query language*?
Is there a definition of the query language at all? (There is a 
definition of a "SPARQL Query String",
but that's different.)

Strike the odd adjective "companion" that's used in front of 
"protocol". Makes no sense.

1.1 Document Outline

I would strike this entirely, especially as the distinction 
"informal" v. "formal" is very
problematic. Are they synonyms of the standard spec terms 
"informative" and "normative"? If not, why
not?

What *is* normative in this document? I can't tell. That's a serious 
problem IMO. Given the
"informality" of nearly all of it -- a tone which I continue to 
object to -- how are we to resolve
conflicts between the "informal" and "formal" parts?

1.2.2 Data Descriptions

Strike "used to show each triple explicitly". A spec is *not* a 
meta-commentary upon itself. There
is, IMO, far too much of this kind of self-referential commentary. A 
specification *specifies*; it
does not discuss, converse, comment, or muse.

1.2.3 Results Descriptions

"used as a descriptive term" -- huh? Is the idea here to define 
"binding"? If so, I'd think the text
might read something like "A 'binding' is a pair (variable, RDF term)".
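
To make the suggested wording concrete, here is a minimal sketch in Python. The names `Binding` and `solution` are hypothetical and do not come from rq25; RDF terms are modeled as plain strings, and treating a solution as at most one binding per variable is an assumption made only for illustration.

```python
from typing import Dict, NamedTuple


class Binding(NamedTuple):
    """A binding is a pair (variable, RDF term)."""
    variable: str  # variable name without the leading "?", e.g. "name"
    term: str      # RDF term, modeled here as a plain string (an assumption)


def solution(*bindings: Binding) -> Dict[str, str]:
    """Collect bindings into one solution, rejecting conflicting bindings."""
    result: Dict[str, str] = {}
    for b in bindings:
        # The same variable may not be bound to two different terms.
        if b.variable in result and result[b.variable] != b.term:
            raise ValueError(f"variable ?{b.variable} bound to two different terms")
        result[b.variable] = b.term
    return result


s = solution(
    Binding("name", '"Alice"'),
    Binding("mbox", "<mailto:alice@example.org>"),
)
```

With a definition this small, the rest of the text could link to it instead of describing "binding" as a "descriptive term".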

The last sentence of this section is grammatically incorrect (it's a 
run-on sentence), and I would
simply strike it. (All things equal, a shorter spec is a better spec.)

1.2.4 Terminology

What are "RDF URI References"? Is that a special term that we import 
from somewhere else? If so, can't
it be hyperlinked or defined? Generally it's accepted best practice 
in writing specs to define terms
either all at once in a glossary or at their first occurrence or both.

"The following terms are used from RDF Concepts..." -- this is an 
awkward wording. How about, instead,
"The following terms are defined in RDF Concepts..."?

That sentence is also a run-on by virtue of having no colon at the 
end of it...

IMO, we should not define terms used from another spec and then 
*rename* them in this spec. This is
just confusing. "IRI" -> "RDF URI reference"; and "datatype IRI" -> 
"datatype URI". If we're going to
do this -- and I'd prefer we didn't -- it should be more explicitly 
marked as such. Putting this into
two parenthetical phrases -- which suggests that that content is 
secondarily important -- is likely to
cause confusion.

"SPARQL implementations may issue warnings..." -- how? Which ones? 
There's a lot of talk about
warnings and errors, but no warnings or errors defined. Why not?

2 Making Simple Queries

Last sentence: what does "fulfill a pattern" mean? Is that different 
than or the same as "match" a
pattern?

2.2 Multiple Matches

"The results of a query are a sequence of solutions"; better: "The 
result of a query is a sequence of
solutions" or even just "a solution sequence" -- which gives you a 
nice, crisp term that could be
*defined* or linked to its formal definition in the semantics section.

Last sentence: "This is a basic graph pattern match..." is a run-on.

2.3 Matching RDF Literals

Last sentence: "This RDF data..." contains a hyphen to separate a 
range; but the standard orthography
for ranges is an en dash, available as "&ndash;" in HTML. This 
problem occurs throughout the doc:
http://en.wikipedia.org/wiki/Dash seems trustworthy on point.

2.3.1 Matching Language Tags

"Language tags in SPARQL are expressed the same way as in Turtle." -- 
huh? Does that mean the same
grammar production is used in each language? Why is this even 
relevant here?

2.3.2 Matching Numeric Types

This first sentence should be struck. Specify the integer datatype 
first and then give an example; this sentence does both 
simultaneously, and confuses me on both counts. Also, it's "e.g.", 
not "eg".

2.3.3 Matching Arbitrary Datatypes

Last sentence is a run-on. And I don't understand it: "the literal is 
known to match" -- known to
whom? Huh?

2.4 Blank Node Labels in Query Results

This entire section should be redrafted. It's confusing, disjointed, 
and vague. What does "local to a
result set" *mean*? I have no idea. And "...should not expect blank 
node labels in a query to refer to
a particular blank node" -- What is a 'particular blank node' here? 
Are we entirely comfortable
talking about what computer processes should not "expect"? Surely 
this should just talk about
*matches* instead of all this "refer" and "reference" talk. Is that 
defined explicitly anywhere? If
so, can we get a link there? What about "co-occurrences of blank 
nodes" -- what does that mean?

Last sentence: "There need not be any relation..." -- I don't know 
what this means. There "need not
be", but there is anyway? There is, but only contingently? And what 
kind of "relation" is being ruled
out here? Not a lexical identity relation, surely.

3 RDF Term Constraints

"A constraint may lead to an error condition..." -- two issues here: 
first, this is another error
thingie that could happen but it's not specified, so it's not clear 
how to distinguish it from
something else. Second, is this the 'may' of specification or 
colloquial speech? Why doesn't rq25 use
terms like "may", "must", "must not", etc in their standard 
specification sense?

At the very least, if it's not going to use them in that way, it 
should *say* that it's not going to
use them in that way so that readers don't interpret them in that way 
by mistake.

But, really, shouldn't there be some really solid, domain-specific 
reason why we aren't *specifying*
using "may", "must", etc?

I'm all for flouting convention and throwing over best practices, but 
surely you need *good* reasons
to do so? What are our good reasons?

Last sentence: drop the parens.

3.2 Restricting Numeric Values

The second sentence is a total non sequitur as written.

3.3 Other Term Constraints

There is an alarmingly high number of "@@" in this doc; this section 
is but one example. Lots of @@
in grammar rules, it seems. This does not seem, to me, a sign of 
stability...

4.1.1 Syntax for IRIs

Most of the first paragraph is redundant. Why is this being repeated? 
Repetition like this is
analogous to cut-and-paste chunks of code; it's brittle and 
introduces errors.

"It is mapped to an IRI by concatenating *the* IRI..." -- add "the" 
to 2nd-to-last sentence.

Last sentence: "may be the empty string" -- huh?

4.1.2 Syntax for Literals

What's a "general syntax"? Is it different than a "syntax"?

4.1.3 Syntax for Query Variables

"...does not form part of the variable name" -- better: "...is not 
part of the variable name..."

4.1.4 Syntax for Blank Nodes

"The same blank node labels may not be used in two separate basic 
graph patterns." -- Surely, even in
informal, commentary-style pseudo-spec'ese, this should be "must", 
not "may". And shouldn't it read
"two or more"? *May* one use the same blank node label in *three* 
separate basic graph patterns? In 5?
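
The "two or more" reading can be stated as a check. The following Python sketch is illustrative only: the function name and the pre-parsed input form (one set of blank node labels per basic graph pattern) are assumptions, not anything defined in rq25.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_blank_node_labels(patterns: List[Set[str]]) -> Dict[str, List[int]]:
    """Return each label that occurs in two or more basic graph patterns.

    `patterns` holds, for each basic graph pattern, the set of blank node
    labels used in it (a hypothetical pre-parsed form).
    """
    seen: Dict[str, List[int]] = defaultdict(list)
    for index, labels in enumerate(patterns):
        for label in labels:
            seen[label].append(index)
    # A label is a violation as soon as it appears in a second pattern,
    # whether the total count is two, three, or five.
    return {label: where for label, where in seen.items() if len(where) >= 2}


bgps = [{"_:a", "_:b"}, {"_:c"}, {"_:a"}, {"_:a"}]
violations = shared_blank_node_labels(bgps)
```

Here `_:a` is reported for patterns 0, 2, and 3, which is the case the "two separate" wording leaves open.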

4.2.3 RDF Collections

I find this entire section very confusing. I can't tell what's 
"allocated": blank nodes or triple
patterns or both.

"These allocated blank nodes allocated do not occur elsewhere in the 
query." -- "allocated" should be
dropped, but I'm not sure which one... And "These allocated blank 
nodes..." is vague. Which ones?

"...is short for:" -- does that mean "is equivalent to" or something 
else?

4.3 Syntax for Constraints

Is this ready for LC?

5 Graph Patterns

"SPARQL is based around graph pattern matching." -- this is the 3rd 
or 4th similar sentence, spread
across the doc, and each one is *slightly* different. Is there some 
significance to the differences?
Is it really necessary to keep repeating the point? I think that 
confuses spec readers. It confuses
me, anyway.

5.1 Basic Graph Patterns

"SPARQL pattern matching is defined in terms matching basic graph 
patterns..." -- "of" missing? Also,
what kind of "SPARQL pattern matching"? Triple? Graph? And where is 
this defined precisely? Can we get
a link? A pointer or reference?

"Filters can be mixed into...but do not cause the end of a basic 
graph pattern." -- what is the "end"
of a basic graph pattern?

5.1.1 Blank Node Labels

s/"Labels"/"labels"/

This section refers to a "syntax error" -- which one? How's it 
spelled? Is this a generic syntax error
or a specific one? Confusing.

5.1.2 Extending Basic Graph Pattern Matching

"SPARQL is defined for matching RDF graphs with simple entailment." 
-- this is ambiguous: "simple
entailment" is part of the mechanism of matching? Or matching works 
in the presence of simple
entailment? Something else?

If this is all that's intended to be said in this section, strike it!

5.2 Group Graph Patterns

"In a SPARQL query string..." -- what's this? I think it's the first 
use of this wording. It's
different than other wordings, so I'm left to do the boring, tedious 
interpretive work of trying to
decide if it's a new construct or "informal" language. Can't we just 
stick to the same terms?

I see in the Grammar section there is a "SPARQL query string" and a 
"SPARQL Query String" -- are these
the same? If so, are they the same as the "SPARQL Query String" in 5.2?

Can't we get some hyperlinks or pointers?

Third sentence: run-on.

5.4 Examples

Another hyphen/dash problem.

"...the filter does not break the basic graph pattern into two 
pieces" -- huh? Pieces of what?

5.5 Scope of Filters

"A constraint, expressed by the syntax keyword FILTER, is a 
restriction on solution over the while
group in which the filter appears" -- I assume this is supposed to be 
"...a restriction on a solution
over the whole group...". "syntax keyword FILTER" is awkward. I don't 
remember "keyword" being defined
in this doc. I can't find a list of reserved words.

6 Including Optional Values

Strike the first sentence; we don't need commentary in the spec on 
why some feature of the language is
useful. That's the point of referring to UC&R at the outset.

Much of this paragraph is commentary and should be struck.

6.1 Optional Pattern Matching

Semicolon in sentence that starts "In an optional match,..." should 
be a comma.

"It is unbound" -- what is? Pronouns starting sentences in specs are 
almost always a "code smell".
They create the possibility of ambiguity and are best avoided. Which 
is easy to do if you've defined a
bunch of terms formally.

6.4 Nested Optional Graph Patterns

"The outer optional graph pattern must match for any nested optional 
pattern to be matched." -- which
one? The outer*most* must match? The nearest outer must match? All 
outers must match?

7 Matching Alternatives

"The UNION keyword is the syntax for pattern alternatives." -- I find 
all such constructions (and 
there are several of them) to be awkward. Better: "Pattern 
alternatives are created with UNION" or some 
such. Saying "keyword" after every keyword is not as good as having a 
list of keywords. And the 
typographic change indicates that it's a keyword anyway, or would if 
there were a guide to typographic
conventions employed in the spec -- another best practice we seem to 
have jettisoned.

"If the application wishes to know how exactly the information was 
recorded..." -- this is awkward.
Better: "To determine exactly how the information was recorded..."

"The UNION operator..." -- it's an operator and a keyword? Typography 
suggests yes. I don't see UNION
in the list of operators... Confusing, especially since it's also 
called a "pattern" nearby. So that's
UNION keyword, pattern, and operator. None of which is defined 
precisely in context, nor linked to a
precise definition.

"Query results of GP1 UNION GP2..." -- what are GP1 and GP2?

8 RDF Dataset

I find much of this to be unnecessary (unwanted, really) commentary 
that could be struck, resulting in
a shorter, better spec. The first sentence, at least, is also redundant.

"Many RDF data stores hold multiple..." -- so what?

What status does the talk of arranging provenance information, say, 
in the default graph have? That's
*one* design pattern, but there are others. It sounds like it should 
have some normative weight, and
it certainly does if anything else in the document has any.

In the 2nd paragraph, "...each identified by IRI." -- The same IRI or 
different ones? This entire
sentence is confusing.

And these seem contradictory: First, "There may be no named graphs; 
there is always a default graph";
and, second, "A query does not need to involve the default graph..."

This is all confusing and needs to be reworked, IMO.

8.1 Examples of RDF Datasets

This section should be struck entirely.

8.2 Specifying RDF Datasets

"A query processor may use these IRIs in any way..." -- Which IRIs?

8.2.1 Specifying the Default Graph

"This does not put the graph in as a named graph; a query can do this 
by also specifying..." --
Multiple ambiguities: What does not put the graph in? What does "put 
the graph in" mean? Put it into
the dataset? A query can do *what* by also specifying?

"If a query provides more than one FROM clause..." -- sentence is 
awkward.

8.2.2 Specifying Named Graphs

"Each IRI is used to provide one..." -- provide? What does that mean 
here?

Oh, and we get another language-sounding construct hereabouts... the 
"clause"... How does that relate 
to an operator, keyword, or pattern? Does it not relate? I'm confused.

8.3 Querying the Dataset

"This is by either using an IRI..." -- huh?

8.3.1 Accessing Graph Names

"The query below matches the graph pattern on each of the named 
graphs in the dataset..." -- I don't
know what "the graph pattern *on each of the named graphs*" means. 
Sentence is confusing.

8.3.3 Restricting possible Graph IRIs -- s/Possible/possible/

"A variable used in the GRAPH clause may also be used in another 
GRAPH clause..." -- and? Does this
mean they're the same variable?

"This can be used to find information..." -- Antecedent of this is...?

8.3.4 Named and Default Graphs

"The default graph is being used to record the provenance 
information..." -- Is this normative?
Informative? Formal? Informal? Since it's the only example of graph 
relations used, it will seem
endorsed or a best practice. I don't think that's appropriate here.

9 Solution Sequence and Modifiers

"Modifiers are applied in the order given by the list." -- What list? 
Does this mean modifiers *must*
be applied in some order? As written, I don't think it says that.

9.1 ORDER BY

The way ORDER BY is described, it sounds like some kind of function. 
The syntax of ASC and DESC
*looks* like function syntax. Why? Isn't the relationship between 
ORDER BY and some variable (?name)
the same as the relation between DESC or ASC and some variable? I 
expected ORDER_BY(?name) DESC(?emp);
or ORDER BY ?name DESC ?emp; but not ORDER BY ?name DESC(?emp).

9.3 DISTINCT

"The solution sequence can be modified...in the sequence is unique" 
-- by which standard of identity
is this to be determined?
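
Why the standard of identity matters can be shown with a small Python sketch. It is a simplified model under stated assumptions: literals are (lexical form, datatype) tuples, and the two comparison keys are hypothetical candidates, not rules taken from rq25.

```python
from decimal import Decimal
from typing import Callable, Hashable, List, Tuple

# (lexical form, datatype): a simplified stand-in for an RDF literal.
Literal = Tuple[str, str]


def distinct(rows: List[Literal], key: Callable[[Literal], Hashable]) -> List[Literal]:
    """Keep the first row for each key, preserving the original order."""
    seen = set()
    kept = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


rows = [("1", "xsd:integer"), ("01", "xsd:integer"), ("1.0", "xsd:decimal")]

# Candidate 1: identical lexical form and datatype.
by_term = distinct(rows, key=lambda lit: lit)
# Candidate 2: equal numeric value.
by_value = distinct(rows, key=lambda lit: Decimal(lit[0]))
```

Under the first key all three rows survive; under the second only one does. The spec needs to say which notion DISTINCT uses.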

9.4 OFFSET

"OFFSET causes the solutions generated to start after the specified 
number of solutions" -- this is
really awkwardly worded.

In the next sentence, I don't understand what work "initially" is 
supposed to do.

I think there is a better word than "predictable"; perhaps "stable" 
or "deterministic" or some such.
Better to just say that LIMIT/OFFSET should be used with ORDER BY to 
be useful.
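
A small Python sketch of that point, with solution sequences modeled as lists. The `page` helper and the sample data are hypothetical, and `sorted` stands in for ORDER BY.

```python
from typing import List, Optional


def page(solutions: List[str], offset: int = 0, limit: Optional[int] = None) -> List[str]:
    """Apply OFFSET, then LIMIT, to a solution sequence modeled as a list."""
    end = None if limit is None else offset + limit
    return solutions[offset:end]


# Two runs of the same query returning the same solutions in different
# orders (hypothetical data; nothing fixes the order without ORDER BY).
run_1 = ["carol", "alice", "bob", "dave"]
run_2 = ["bob", "dave", "alice", "carol"]

unordered_pages = (page(run_1, offset=1, limit=2), page(run_2, offset=1, limit=2))
ordered_pages = (
    page(sorted(run_1), offset=1, limit=2),
    page(sorted(run_2), offset=1, limit=2),
)
```

Without ordering, the two runs yield different "second pages"; with ordering, both yield the same one.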

9.5 LIMIT

Another variant for describing syntax: "The LIMIT form..." Is "form" 
a special term here?

10 Query Result Forms

This is confusing. We originally called these "query forms" and we 
have "result forms". "Query result
forms" is just confusing. I suggest we revert to "Query Forms".

Also, for my $$, this is just in the wrong place and contributes to 
the sense that there really *is*
no real organizational scheme to the "informal" presentation of the 
language.

The query forms section should be at or near the *beginning* of the 
document's "informal" section.

FWIW, I've read several blog comments recently to the effect of "I 
didn't know CONSTRUCT was in 
SPARQL" -- that it's tacked on at the end can't help but contribute 
to that fact.

10.1 Selecting Variables

"Results can be thought of as a table or result set..." -- First, 
this is just really awkward. They
can be thought of as lumps of blue cheese floating in the ether. What 
matters is what they *are*.
Second, this is redundant; we've already had a description of the 
tabular presentation style of result
sets in this document.

10.2 Constructing an Output Graph

"...substituting for the variables into the graph template" -- s/in/ 
into/

Next paragraph: drop the parens; replace the "(" with a comma.

Last sentence in that paragraph is a run-on.

10.2.2 Accessing Graphs in the RDF Dataset

"Using CONSTRUCT it is possible" -- add a comma after CONSTRUCT.

And drop the parens, replace w/ commas.

10.2.3 Solution Modifiers and CONSTRUCT

"2" should be spelled out, "two".

10.3 Descriptions of Resources (Non-normative)

What's a "Non-normative Resource"? Or, rather, what are 
"Non-normative Descriptions of Resources"?

This is ambiguous in at least two dimensions: is this supposed to 
indicate that this *section* is not
normative? If not, which of the two aforementioned readings is intended?

If this section is "non-normative", that means that the entire 
remainder of the document *is*
normative, including all the commentary and design discussion. Or 
something...

Oh, and which part is meant as "non-normative"? 10.3? 10.3.*?

This really needs to be sorted out *before* LC.

"Current conventions for DESCRIBE return an RDF graph without any 
specified constraints" -- what does
that mean? It's completely opaque IMO.

"As with any query, a service may refuse to serve a DESCRIBE 
query"... What's a service? If this is
meant to allude to some protocol thing, why not have a link or 
pointer to that thing? I guess the
protocol doc is a "companion", but one that this doc can't talk 
about? :>

What's a "knowledge base"? What's a "target knowledge base"?

What's a "SPARQL query processor"? Is that different than the "service"?

10.3.3 Descriptions of Resources

The commentary and design discussions should be dropped.

How about we just say "DESCRIBE is intentionally unspecified" and 
leave it at that?

Also, I object to CBD being referenced under the rubric of "other 
possible mechanisms"... Either list
others or drop this one. CBD has no special status or interest that 
I'm aware of. And it's been
criticized, so it's not "the thing everyone does".

10.4 Asking "yes or no" questions

This section title is awkward. It's not capitalized like any other 
section head, and it's not clear
what a "yes or no" question is...

10.1, 10.2, 10.3, and 10.4 should be titled SELECT, CONSTRUCT, 
DESCRIBE, and ASK respectively.

[I'm skipping from 11 to...]

B. Conformance

I don't think this section is sufficient. There's a lot of talk in 
the doc about error conditions,
warnings, and lots of mays and musts -- none of that is covered by 
the grammar or result forms
conformance stuff, nor is it covered in the protocol spec. I think 
this is a problem and will hurt
interoperability.

"See those specifications for their conformance criteria" -- how 
about a link?
http://www.w3.org/TR/rdf-sparql-protocol/#conformance

Finally, the sentence starting "Note that the SPARQL protocol 
describes" should be struck. Any such
commentary or note doesn't belong in the query language spec at all, 
IMO, and certainly not in the
section on conformance. It sticks out like a sore thumb.

If there is interest in a statement like this in the protocol spec, 
that should be handled in the
normal process for the WG. In fact, #4 in the protocol conformance 
section already says that, so this
statement is also redundant and further muddies the normative status 
of the query spec...

D. Collected Formal Definitions

"The collected formal definitions are collected..."

E. Internet Media Type, File Extension and Macintosh File Type (Normative)

So *this* is the only normative part of the spec? Oh, except for the 
Normative References part of F.
References. That's...odd.

Received on Monday, 26 February 2007 16:35:06 UTC