rq25 (1.18) review (resend)

(The formatting got badly broken on the one Lee forwarded. This  
should be easier to read.)

Folks,

I don't think rq25 is ready to go to LC until the following issues  
are addressed satisfactorily:

1. all of the @@-marked bits are fixed; or, at least, all of the
@@-marked bits related to the substantive material, i.e., the grammar
and algebra.

2. section 12 is completed -- it's not even close as-is (sections  
12.2 and 12.3 are simply missing from the doc, even though they're  
listed in the TOC) -- even more worrisome, there is no connection  
between 12 and the rest of the doc. Fred Zemke said way back last  
summer that he didn't see a systematic connection between the grammar  
and the semantics, and, now, having read this carefully, I completely  
agree;

3. the status question about what is normative and what is  
informative is answered.

My primary concern is that the grammar and the algebra are normative,  
plus a subset of the functions
and operators stuff. All the rest should be informative. It's  
difficult to recommend making normative
material that describes itself as "informal". Specs are not informal  
documents, generally.

Ideally the doc would be rearranged such that a completed section 12  
and the grammar are at the front,
where normative material generally goes, and the rest of the informal  
tutorial material would follow
the normative sections. Even better: split that stuff into a separate  
doc, since there are very few,
if any, pointers or links from those sections to the grammar or  
algebra. With some work, the tutorial
material --sections 2 through 10, plus some bits of 11, I think --  
could actually become useful as a
guide to the language.

What follows are detailed comments on 1.18, except for 11, 12, and  
Appendix A, which I leave to
others. Some of these comments assume that the material in question  
is intended to be a formal
specification, so they may not be applicable to a tutorial or guide.

Cheers,
Kendall

Abstract

2nd paragraph: The first sentence is very awkward ("the query  
language part..."); way too colloquial and chatty for a spec, IMO.  
I'd strike "for easy access to data". I'd strike the entire next  
sentence starting "The SPARQL query language consists of..." in favor  
of an actual *definition* -- ah, which there is in 1
Introduction, making this redundant and unnecessary. Strike it.

In fact, I'd strike the entire paragraph. But "report forms" in the  
last sentence should be "result
forms", surely.

s/Status of this Document/Status of This Document/

1 Introduction

I would strike the first 3 paragraphs. This section should begin with  
"SPARQL consists of three
documents". Though, actually, that's weird, right? Why does the query  
language spec define SPARQL in
toto? The protocol document offers a definition of *the protocol*
(see http://www.w3.org/TR/rdf-sparql-protocol/#ap). Shouldn't the query
language spec define *the query language*?
Is there a definition of the query language at all? (There is a  
definition of a "SPARQL Query String",
but that's different.)

Strike the odd adjective "companion" that's used in front of  
"protocol". Makes no sense.

1.1 Document Outline

I would strike this entirely, especially as the distinction  
"informal" v. "formal" is very
problematic. Are they synonyms of the standard spec terms  
"informative" and "normative"? If not, why
not?

What *is* normative in this document? I can't tell. That's a serious  
problem IMO. Given the
"informality" of nearly all of it -- a tone which I continue to  
object to -- how are we to resolve
conflicts between the "informal" and "formal" parts?

1.2.2 Data Descriptions

Strike "used to show each triple explicitly". A spec is *not* a
meta-commentary upon itself. There
is, IMO, far too much of this kind of self-referential commentary. A  
specification *specifies*; it
does not discuss, converse, comment, or muse.

1.2.3 Results Descriptions

"used as a descriptive term" -- huh? Is the idea here to define  
"binding"? If so, I'd think the text
might read something like "A 'binding' is a pair (variable, RDF term)".
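
To make that suggestion concrete, here is a minimal Python sketch of such a
definition. The names Binding and value_of are hypothetical, not taken from
rq25, and treating a solution as a set of bindings is an assumption about
what the text intends.

```python
from typing import NamedTuple, Optional

class Binding(NamedTuple):
    """A binding is a pair (variable, RDF term). Hypothetical name."""
    variable: str   # variable name without the leading "?"
    term: str       # an RDF term, written here in its lexical form

def value_of(solution, variable: str) -> Optional[str]:
    """Return the term bound to `variable`, or None if it is unbound."""
    for binding in solution:
        if binding.variable == variable:
            return binding.term
    return None

# Assumption: a solution is a set of bindings, at most one per variable.
solution = frozenset({
    Binding("name", '"Alice"'),
    Binding("mbox", "<mailto:alice@example.org>"),
})
```

With a definition this small, "unbound" needs no prose: it is simply a
variable for which the solution contains no pair.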

The last sentence of this section is grammatically incorrect (it's a  
run-on sentence), and I would
simply strike it. (All things equal, a shorter spec is a better spec.)

1.2.4 Terminology

What are "RDF URI References"? Is that a special term that we import  
from somewhere else? If so, can't
it be hyperlinked or defined? Generally it's accepted best practice  
in writing specs to define terms
either all at once in a glossary or at their first occurrence or both.

"The following terms are used from RDF Concepts..." -- this is an  
awkward wording. How about, instead,
"The following terms are defined in RDF Concepts..."?

That sentence is also a run-on by virtue of having no colon at the  
end of it...

IMO, we should not define terms used from another spec and then  
*rename* them in this spec. This is
just confusing. "IRI" -> "RDF URI reference"; and "datatype IRI" ->  
"datatype URI". If we're going to
do this -- and I'd prefer we didn't -- it should be more explicitly  
marked as such. Putting this into
two parenthetical phrases -- which suggests that that content is  
secondarily important -- is likely to
cause confusion.

"SPARQL implementations may issue warnings..." -- how? Which ones?  
There's a lot of talk about
warnings and errors, but no warnings or errors defined. Why not?

2 Making Simple Queries

Last sentence: what does "fulfill a pattern" mean? Is that different  
than or the same as "match" a
pattern?

2.2 Multiple Matches

"The results of a query are a sequence of solutions"; better: "The  
result of a query is a sequence of
solutions" or even just "a solution sequence" -- which gives you a  
nice, crisp term that could be
*defined* or linked to its formal definition in the semantics section.

Last sentence: "This is a basic graph pattern match..." is a run-on.

2.3 Matching RDF Literals

Last sentence: "This RDF data..." contains a hyphen to separate a  
range; but the standard orthography
for ranges is an en dash, available as "&ndash;" in HTML. This
problem occurs throughout the doc:
http://en.wikipedia.org/wiki/Dash seems trustworthy on point.

2.3.1 Matching Language Tags

"Language tags in SPARQL are expressed the same way as in Turtle." --  
huh? Does that mean the same
grammar production is used in each language? Why is this even  
relevant here?

2.3.2 Matching Numeric Types

This first sentence should be struck or split: first specify the integer
datatype, and then give an example;
this sentence does both simultaneously, and confuses me on both
counts. Also, it's "e.g.", not "eg".

2.3.3 Matching Arbitrary Datatypes

Last sentence is a run-on. And I don't understand it: "the literal is  
known to match" -- known to
whom? Huh?

2.4 Blank Node Labels in Query Results

This entire section should be redrafted. It's confusing, disjointed,  
and vague. What does "local to a
result set" *mean*? I have no idea. And "...should not expect blank  
node labels in a query to refer to
a particular blank node" -- What is a 'particular blank node' here?  
Are we entirely comfortable
talking about what computer processes should not "expect"? Surely  
this should just talk about
*matches* instead of all this "refer" and "reference" talk. Is that  
defined explicitly anywhere? If
so, can we get a link there? What about "co-occurrences of blank  
nodes" -- what does that mean?

Last sentence: "There need not be any relation..." -- I don't know  
what this means. There "need not
be", but there is anyway? There is, but only contingently? And what  
kind of "relation" is being ruled
out here? Not a lexical identity relation, surely.

3 RDF Term Constraints

"A constraint may lead to an error condition..." -- two issues here:  
first, this is another error
thingie that could happen but it's not specified, so it's not clear  
how to distinguish it from
something else. Second, is this the 'may' of specification or  
colloquial speech? Why doesn't rq25 use
terms like "may", "must", "must not", etc in their standard  
specification sense?

At the very least, if it's not going to use them in that way, it  
should *say* that it's not going to
use them in that way so that readers don't interpret them in that way  
by mistake.

But, really, shouldn't there be some really solid, domain-specific  
reason why we aren't *specifying*
using "may", "must", etc?

I'm all for flouting convention and throwing over best practices, but  
surely you need *good* reasons
to do so? What are our good reasons?

Last sentence: drop the parens.

3.2 Restricting Numeric Values

The second sentence is a total non sequitur as written.

3.3 Other Term Constraints

There are an alarmingly high number of "@@" in this doc; this section  
is but one example. Lots of @@
in grammar rules, it seems. This does not seem, to me, a sign of  
stability...

4.1.1 Syntax for IRIs

Most of the first paragraph is redundant. Why is this being repeated?  
Repetition like this is
analogous to cut-and-paste chunks of code; it's brittle and  
introduces errors.

"It is mapped to an IRI by concatenating *the* IRI..." -- add "the"  
to 2nd-to-last sentence.

Last sentence: "may be the empty string" -- huh?

4.1.2 Syntax for Literals

What's a "general syntax"? Is it different than a "syntax"?

4.1.3 Syntax for Query Variables

"...does not form part of the variable name" -- better: "...is not  
part of the variable name..."

4.1.4 Syntax for Blank Nodes

"The same blank node labels may not be used in two separate basic  
graph patterns." -- Surely, even in
informal, commentary-style pseudo-spec'ese, this should be "must",
not "may". And shouldn't it read
"two or more"? *May* one use the same blank node label in *three*  
separate basic graph patterns? In 5?

4.2.3 RDF Collections

I find this entire section very confusing. I can't tell what's  
"allocated": blank nodes or triple
patterns or both.

"These allocated blank nodes allocated do not occur elsewhere in the  
query." -- "allocated" should be
dropped, but I'm not sure which one... And "These allocated blank  
nodes..." is vague. Which ones?

"...is short for:" -- does that mean "is equivalent to" or something  
else?

4.3 Syntax for Constraints

Is this ready for LC?

5 Graph Patterns

"SPARQL is based around graph pattern matching." -- this is the 3rd  
or 4th similar sentence, spread
across the doc, and each one is *slightly* different. Is there some  
significance to the differences?
Is it really necessary to keep repeating the point? I think that  
confuses spec readers. It confuses
me, anyway.

5.1 Basic Graph Patterns

"SPARQL pattern matching is defined in terms matching basic graph  
patterns..." -- "of" missing? Also,
what kind of "SPARQL pattern matching"? Triple? Graph? And where is  
this defined precisely? Can we get
a link? A pointer or reference?

"Filters can be mixed into...but do not cause the end of a basic  
graph pattern." -- what is the "end"
of a basic graph pattern?

5.1.1 Blank Node Labels

s/"Labels"/"labels"/

This section refers to a "syntax error" -- which one? How's it  
spelled? Is this a generic syntax error
or a specific one? Confusing.

5.1.2 Extending Basic Graph Pattern Matching

"SPARQL is defined for matching RDF graphs with simple entailment."  
-- this is ambiguous: "simple
entailment" is part of the mechanism of matching? Or matching works  
in the presence of simple
entailment? Something else?

If this is all that's intended to be said in this section, strike it!

5.2 Group Graph Patterns

"In a SPARQL query string..." -- what's this? I think it's the first  
use of this wording. It's
different than other wordings, so I'm left to do the boring, tedious  
interpretive work of trying to
decide if it's a new construct or "informal" language. Can't we just  
stick to the same terms?

I see in the Grammar section there is a "SPARQL query string" and a  
"SPARQL Query String" -- are these
the same? If so, are they the same as the "SPARQL Query String" in 5.2?

Can't we get some hyperlinks or pointers?

Third sentence: run-on.

5.4 Examples

Another hyphen/dash problem.

"...the filter does not break the basic graph pattern into two  
pieces" -- huh? Pieces of what?

5.5 Scope of Filters

"A constraint, expressed by the syntax keyword FILTER, is a  
restriction on solution over the while
group in which the filter appears" -- I assume this is supposed to be  
"...a restriction on a solution
over the whole group...". "syntax keyword FILTER" is awkward. I don't  
remember "keyword" being defined
in this doc. I can't find a list of reserved words.

6 Including Optional Values

Strike the first sentence; we don't need commentary in the spec on  
why some feature of the language is
useful. That's the point of referring to UC&R at the outset.

Much of this paragraph is commentary and should be struck.

6.1 Optional Pattern Matching

Semicolon in sentence that starts "In an optional match,..." should  
be a comma.

"It is unbound" -- what is? Pronouns starting sentences in specs are  
almost always a "code smell".
They create the possibility of ambiguity and are best avoided. Which  
is easy to do if you've defined a
bunch of terms formally.

6.4 Nested Optional Graph Patterns

"The outer optional graph pattern must match for any nested optional  
pattern to be matched." -- which
one? The outer*most* must match? The nearest outer must match? All  
outers must match?

7 Matching Alternatives

"The UNION keyword is the syntax for pattern alternatives." -- I find  
all such constructions (and
there are several of them) to be awkward. Better: "Pattern alternatives
are created with UNION" or some
such. Saying "keyword" after every keyword is not as good as having a  
list of keywords. And the
typographic change indicates that it's a keyword anyway, or would if
there were a guide to typographic
conventions employed in the spec -- another best practice we seem to  
have jettisoned.

"If the application wishes to know how exactly the information was  
recorded..." -- this is awkward.
Better: "To determine exactly how the information was recorded..."

"The UNION operator..." -- it's an operator and a keyword? Typography  
suggests yes. I don't see UNION
in the list of operators... Confusing, especially since it's also  
called a "pattern" nearby. So that's
UNION keyword, pattern, and operator. None of which is defined  
precisely in context, nor linked to a
precise definition.

"Query results of GP1 UNION GP2..." -- what are GP1 and GP2?

8 RDF Dataset

I find much of this to be unnecessary (unwanted, really) commentary  
that could be struck, resulting in
a shorter, better spec. The first sentence, at least, is also redundant.

"Many RDF data stores hold multiple..." -- so what?

What status does the talk of arranging provenance information, say,  
in the default graph have? That's
*one* design pattern, but there are others. It sounds like it should  
have some normative weight, and
it certainly does if anything else in the document has any.

In the 2nd paragraph, "...each identified by IRI." -- The same IRI or  
different ones? This entire
sentence is confusing.

And these seem contradictory: First, "There may be no named graphs;  
there is always a default graph";
and, second, "A query does not need to involve the default graph..."

This is all confusing and needs to be reworked, IMO.

8.1 Examples of RDF Datasets

This section should be struck entirely.

8.2 Specifying RDF Datasets

"A query processor may use these IRIs in any way..." -- Which IRIs?

8.2.1 Specifying the Default Graph

"This does not put the graph in as a named graph; a query can do this  
by also specifying..." --
Multiple ambiguities: What does not put the graph in? What does "put  
the graph in" mean? Put it into
the dataset? A query can do *what* by also specifying?

"If a query provides more than one FROM clause..." -- sentence is  
awkward.

8.2.2 Specifying Named Graphs

"Each IRI is used to provide one..." -- provide? What does that mean  
here?

Oh, and we get another language-sounding construct hereabouts... the  
"clause"... How does that relate
to an operator, keyword, or pattern? Does it not relate? I'm confused.

8.3 Querying the Dataset

"This is by either using an IRI..." -- huh?

8.3.1 Accessing Graph Names

"The query below matches the graph pattern on each of the named  
graphs in the dataset..." -- I don't
know what "the graph pattern *on each of the named graphs*" means.  
Sentence is confusing.

8.3.3 Restricting possible Graph IRIs -- s/possible/Possible/

"A variable used in the GRAPH clause may also be used in another  
GRAPH clause..." -- and? Does this
mean they're the same variable?

"This can be used to find information..." -- Antecedent of this is...?

8.3.4 Named and Default Graphs

"The default graph is being used to record the provenance  
information..." -- Is this normative?
Informative? Formal? Informal? Since it's the only example of graph  
relations used, it will seem
endorsed or a best practice. I don't think that's appropriate here.

9 Solution Sequence and Modifiers

"Modifiers are applied in the order given by the list." -- What list?  
Does this mean modifiers *must*
be applied in some order? As written, I don't think it says that.

9.1 ORDER BY

The way ORDER BY is described, it sounds like some kind of function.  
The syntax of ASC and DESC
*looks* like function syntax. Why? Isn't the relationship between  
ORDER BY and some variable (?name)
the same as the relation between DESC or ASC and some variable? I  
expected ORDER_BY(?name) DESC(?emp);
or ORDER BY ?name DESC ?emp; but not ORDER BY ?name DESC(?emp).

9.3 DISTINCT

"The solution sequence can be modified...in the sequence is unique"  
-- by which standard of identity
is this to be determined?
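
The question matters in practice because different standards give different
answers. A minimal Python sketch, in which each solution is reduced to a
single literal for brevity; the names Literal, term_key, value_key, and
distinct are hypothetical, and neither key is claimed to be what rq25
intends:

```python
from typing import NamedTuple

class Literal(NamedTuple):
    lexical: str    # lexical form as written in the data
    datatype: str   # datatype IRI, abbreviated for readability

def term_key(lit):
    # Standard 1: identical lexical form and identical datatype.
    return (lit.lexical, lit.datatype)

def value_key(lit):
    # Standard 2: identical value; only xsd:integer is handled here.
    if lit.datatype == "xsd:integer":
        return (int(lit.lexical), lit.datatype)
    return (lit.lexical, lit.datatype)

def distinct(solutions, key):
    """Drop duplicates under `key`, keeping first occurrences in order."""
    seen, out = set(), []
    for s in solutions:
        k = key(s)
        if k not in seen:
            seen.add(k)
            out.append(s)
    return out

solutions = [
    Literal("1", "xsd:integer"),
    Literal("01", "xsd:integer"),
    Literal("1", "xsd:integer"),
]
by_term = distinct(solutions, term_key)     # "1" and "01" both survive
by_value = distinct(solutions, value_key)   # only the first "1" survives
```

Two implementations that each pick a different key would both claim to
implement DISTINCT and would return different numbers of solutions.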

9.4 OFFSET

"OFFSET causes the solutions generated to start after the specified  
number of solutions" -- this is
really awkwardly worded.

In the next sentence, I don't understand what work "initially" is  
supposed to do.

I think there is a better word than "predictable"; perhaps "stable"  
or "deterministic" or some such.
Better to just say that LIMIT/OFFSET should be used with ORDER BY to  
be useful.
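
A minimal Python sketch of why that is. It assumes, without any support from
the draft text, that ordering is applied before OFFSET and LIMIT and that an
unordered query may return its solutions in any order; the function page and
its parameters are hypothetical names.

```python
def page(solutions, offset, limit, order_key=None):
    """Apply an optional ordering, then OFFSET, then LIMIT."""
    seq = list(solutions)
    if order_key is not None:
        seq = sorted(seq, key=order_key)   # the ORDER BY step
    return seq[offset:offset + limit]      # OFFSET, then LIMIT

# The same four solutions, as two runs of a store might return them.
run_a = ["carol", "alice", "bob", "dave"]
run_b = ["bob", "dave", "alice", "carol"]

unordered_a = page(run_a, offset=1, limit=2)
unordered_b = page(run_b, offset=1, limit=2)
ordered_a = page(run_a, offset=1, limit=2, order_key=lambda s: s)
ordered_b = page(run_b, offset=1, limit=2, order_key=lambda s: s)
```

Without the ordering step the two runs yield different pages for the same
OFFSET and LIMIT; with it they yield the same page.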

9.5 LIMIT

Another variant for describing syntax: "The LIMIT form..." Is "form"  
a special term here?

10 Query Result Forms

This is confusing. We originally called these "query forms" and we  
have "result forms". "Query result
forms" is just confusing. I suggest we revert to "Query Forms".

Also, for my $$, this is just in the wrong place and contributes to  
the sense that there really *is*
no real organizational scheme to the "informal" presentation of the  
language.

The query forms section should be at or near the *beginning* of the  
document's "informal" section.

FWIW, I've read several blog comments recently to the effect of "I  
didn't know CONSTRUCT was in
SPARQL" -- that it's tacked on at the end can't help but contribute to
that.

10.1 Selecting Variables

"Results can be thought of as a table or result set..." -- First,  
this is just really awkward. They
can be thought of as lumps of blue cheese floating in the ether. What  
matters is what they *are*.
Second, this is redundant; we've already had a description of the  
tabular presentation style of result
sets in this document.

10.2 Constructing an Output Graph

"...substituting for the variables into the graph template" --
s/into/in/

Next paragraph: drop the parens; replace the "(" with a comma.

Last sentence in that paragraph is a run-on.

10.2.2 Accessing Graphs in the RDF Dataset

"Using CONSTRUCT it is possible" -- add a comma after CONSTRUCT.

And drop the parens, replace w/ commas.

10.2.3 Solution Modifiers and CONSTRUCT

"2" should be spelled out, "two".

10.3 Descriptions of Resources (Non-normative)

What's a "Non-normative Resource"? Or, rather, what are
"Non-normative Descriptions of Resources"?

This is ambiguous in at least two dimensions: is this supposed to  
indicate that this *section* is not
normative? If not, which of the two aforementioned readings is intended?

If this section is "non-normative", that means that the entire  
remainder of the document *is*
normative, including all the commentary and design discussion. Or  
something...

Oh, and which part is meant as "non-normative"? 10.3? 10.3.*?

This really needs to be sorted out *before* LC.

"Current conventions for DESCRIBE return an RDF graph without any  
specified constraints" -- what does
that mean? It's completely opaque IMO.

"As with any query, a service may refuse to serve a DESCRIBE  
query"... What's a service? If this is
meant to allude to some protocol thing, why not have a link or  
pointer to that thing? I guess the
protocol doc is a "companion", but one that this doc can't talk  
about? :>

What's a "knowledge base"? What's a "target knowledge base"?

What's a "SPARQL query processor"? Is that different than the "service"?

10.3.3 Descriptions of Resources

The commentary and design discussions should be dropped.

How about we just say "DESCRIBE is intentionally unspecified" and  
leave it at that?

Also, I object to CBD being referenced under the rubric of "other  
possible mechanisms"... Either list
others or drop this one. CBD has no special status or interest that  
I'm aware of. And it's been
criticized, so it's not "the thing everyone does".

10.4 Asking "yes or no" questions

This section title is awkward. It's not capitalized like any other  
section head, and it's not clear
what a "yes or no" question is...

10.1, 10.2, 10.3, and 10.4 should be titled SELECT, CONSTRUCT,  
DESCRIBE, and ASK respectively.

[I'm skipping from 11 to...]

B. Conformance

I don't think this section is sufficient. There's a lot of talk in  
the doc about error conditions,
warnings, and lots of mays and musts -- none of that is covered by  
the grammar or result forms
conformance stuff, nor is it covered in the protocol spec. I think  
this is a problem and will hurt
interoperability.

"See those specifications for their conformance criteria" -- how  
about a link?
http://www.w3.org/TR/rdf-sparql-protocol/#conformance

Finally, the sentence starting "Note that the SPARQL protocol  
describes" should be struck. Any such
commentary or note doesn't belong in the query language spec at all,  
IMO, and certainly not in the
section on conformance. It sticks out like a sore thumb.

If there is interest in a statement like this in the protocol spec,  
that should be handled in the
normal process for the WG. In fact, #4 in the protocol conformance  
section already says that, so this
statement is also redundant and further muddies the normative status  
of the query spec...

D. Collected Formal Definitions

"The collected formal definitions are collected..."

E. Internet Media Type, File Extension and Macintosh File Type
(Normative)

So *this* is the only normative part of the spec? Oh, except for the  
Normative References part of F.
References. That's...odd.

Cheers,
Kendall

Received on Monday, 26 February 2007 23:21:34 UTC