- From: Kendall Clark <kendall@monkeyfist.com>
- Date: Mon, 26 Feb 2007 18:21:19 -0500
- To: dawg mailing list <public-rdf-dawg@w3.org>
(The formatting got badly broken on the one Lee forwarded. This should be easier to read.)

Folks,

I don't think rq25 is ready to go to LC until the following issues are addressed satisfactorily:

1. all of the @@-marked bits are fixed; or, at least, all of the @@-marked bits related to the substantive material, i.e., the grammar and algebra.

2. section 12 is completed -- it's not even close as-is (sections 12.2 and 12.3 are simply missing from the doc, even though they're listed in the TOC) -- even more worrisome, there is no connection between 12 and the rest of the doc. Fred Zemke said way back last summer that he didn't see a systematic connection between the grammar and the semantics, and, now, having read this carefully, I completely agree;

3. the status question about what is normative and what is informative is answered.

My primary concern is that the grammar and the algebra are normative, plus a subset of the functions and operators stuff. All the rest should be informative. It's difficult to recommend making normative material that describes itself as "informal". Specs are not informal documents, generally.

Ideally the doc would be rearranged such that a completed section 12 and the grammar are at the front, where normative material generally goes, and the rest of the informal tutorial material would follow the normative sections. Even better: split that stuff into a separate doc, since there are very few, if any, pointers or links from those sections to the grammar or algebra. With some work, the tutorial material -- sections 2 through 10, plus some bits of 11, I think -- could actually become useful as a guide to the language.

What follows are detailed comments on 1.18, except for 11, 12, and Appendix A, which I leave to others. Some of these comments assume that the material in question is intended to be a formal specification, so they may not be applicable to a tutorial or guide.
Cheers,
Kendall

Abstract

2nd paragraph: The first sentence is very awkward ("the query language part..."); way too colloquial and chatty for a spec, IMO. I'd strike "for easy access to data". I'd strike the entire next sentence starting "The SPARQL query language consists of..." in favor of an actual *definition* -- ah, which there is in 1 Introduction, making this redundant and unnecessary. Strike it. In fact, I'd strike the entire paragraph. But "report forms" in the last sentence should be "result forms", surely.

s/Status of this Document/Status of This Document/

1 Introduction

I would strike the first 3 paragraphs. This section should begin with "SPARQL consists of three documents". Though, actually, that's weird, right? Why does the query language spec define SPARQL in toto? The protocol document offers a definition of *the protocol* (see http://www.w3.org/TR/rdf-sparql-protocol/#ap). Shouldn't the query language spec define *the query language*? Is there a definition of the query language at all? (There is a definition of a "SPARQL Query String", but that's different.)

Strike the odd adjective "companion" that's used in front of "protocol". Makes no sense.

1.1 Document Outline

I would strike this entirely, especially as the distinction "informal" v. "formal" is very problematic. Are they synonyms of the standard spec terms "informative" and "normative"? If not, why not? What *is* normative in this document? I can't tell. That's a serious problem IMO. Given the "informality" of nearly all of it -- a tone which I continue to object to -- how are we to resolve conflicts between the "informal" and "formal" parts?

1.2.2 Data Descriptions

Strike "used to show each triple explicitly". A spec is *not* a meta-commentary upon itself. There is, IMO, far too much of this kind of self-referential commentary. A specification *specifies*; it does not discuss, converse, comment, or muse.

1.2.3 Results Descriptions

"used as a descriptive term" -- huh?
Is the idea here to define "binding"? If so, I'd think the text might read something like "A 'binding' is a pair (variable, RDF term)".

The last sentence of this section is grammatically incorrect (it's a run-on sentence), and I would simply strike it. (All things equal, a shorter spec is a better spec.)

1.2.4 Terminology

What are "RDF URI References"? Is that a special term that we import from somewhere else? If so, can't it be hyperlinked or defined? Generally it's accepted best practice in writing specs to define terms either all at once in a glossary or at their first occurrence or both.

"The following terms are used from RDF Concepts..." -- this is an awkward wording. How about, instead, "The following terms are defined in RDF Concepts..."? That sentence is also a run-on by virtue of having no colon at the end of it...

IMO, we should not define terms used from another spec and then *rename* them in this spec. This is just confusing. "IRI" -> "RDF URI reference"; and "datatype IRI" -> "datatype URI". If we're going to do this -- and I'd prefer we didn't -- it should be more explicitly marked as such. Putting this into two parenthetical phrases -- which suggests that that content is secondarily important -- is likely to cause confusion.

"SPARQL implementations may issue warnings..." -- how? Which ones? There's a lot of talk about warnings and errors, but no warnings or errors defined. Why not?

2 Making Simple Queries

Last sentence: what does "fulfill a pattern" mean? Is that different than or the same as "match" a pattern?

2.2 Multiple Matches

"The results of a query are a sequence of solutions"; better: "The result of a query is a sequence of solutions" or even just "a solution sequence" -- which gives you a nice, crisp term that could be *defined* or linked to its formal definition in the semantics section.

Last sentence: "This is a basic graph pattern match..." is a run-on.

2.3 Matching RDF Literals

Last sentence: "This RDF data..."
contains a hyphen to separate a range; but the standard orthography for ranges is an en dash, available as "&ndash;" in HTML. This problem occurs throughout the doc: http://en.wikipedia.org/wiki/Dash seems trustworthy on point.

2.3.1 Matching Language Tags

"Language tags in SPARQL are expressed the same way as in Turtle." -- huh? Does that mean the same grammar production is used in each language? Why is this even relevant here?

2.3.2 Matching Numeric Types

This first sentence should be struck. First specify the integer datatype, then give an example; this sentence tries to do both simultaneously and confuses me on both counts. Also, it's "e.g.", not "eg".

2.3.3 Matching Arbitrary Datatypes

Last sentence is a run-on. And I don't understand it: "the literal is known to match" -- known to whom? Huh?

2.4 Blank Node Labels in Query Results

This entire section should be redrafted. It's confusing, disjointed, and vague. What does "local to a result set" *mean*? I have no idea. And "...should not expect blank node labels in a query to refer to a particular blank node" -- What is a 'particular blank node' here? Are we entirely comfortable talking about what computer processes should not "expect"? Surely this should just talk about *matches* instead of all this "refer" and "reference" talk. Is that defined explicitly anywhere? If so, can we get a link there? What about "co-occurrences of blank nodes" -- what does that mean?

Last sentence: "There need not be any relation..." -- I don't know what this means. There "need not be", but there is anyway? There is, but only contingently? And what kind of "relation" is being ruled out here? Not a lexical identity relation, surely.

3 RDF Term Constraints

"A constraint may lead to an error condition..." -- two issues here: first, this is another error thingie that could happen but it's not specified, so it's not clear how to distinguish it from something else. Second, is this the 'may' of specification or colloquial speech?
Why doesn't rq25 use terms like "may", "must", "must not", etc. in their standard specification sense? At the very least, if it's not going to use them in that way, it should *say* that it's not going to use them in that way so that readers don't interpret them in that way by mistake. But, really, shouldn't there be some really solid, domain-specific reason why we aren't *specifying* using "may", "must", etc.? I'm all for flouting convention and throwing over best practices, but surely you need *good* reasons to do so? What are our good reasons?

Last sentence: drop the parens.

3.2 Restricting Numeric Values

The second sentence is a total non sequitur as written.

3.3 Other Term Constraints

There are an alarmingly high number of "@@" in this doc; this section is but one example. Lots of @@ in grammar rules, it seems. This does not seem, to me, a sign of stability...

4.1.1 Syntax for IRIs

Most of the first paragraph is redundant. Why is this being repeated? Repetition like this is analogous to cut-and-paste chunks of code; it's brittle and introduces errors.

"It is mapped to an IRI by concatenating *the* IRI..." -- add "the" to 2nd-to-last sentence.

Last sentence: "may be the empty string" -- huh?

4.1.2 Syntax for Literals

What's a "general syntax"? Is it different than a "syntax"?

4.1.3 Syntax for Query Variables

"...does not form part of the variable name" -- better: "...is not part of the variable name..."

4.1.4 Syntax for Blank Nodes

"The same blank node labels may not be used in two separate basic graph patterns." -- Surely, even in informal, commentary style pseudo-spec'ese, this should be "must", not "may". And shouldn't it read "two or more"? *May* one use the same blank node label in *three* separate basic graph patterns? In 5?

4.2.3 RDF Collections

I find this entire section very confusing. I can't tell what's "allocated": blank nodes or triple patterns or both.

"These allocated blank nodes allocated do not occur elsewhere in the query."
-- "allocated" should be dropped, but I'm not sure which one... And "These allocated blank nodes..." is vague. Which ones?

"...is short for:" -- does that mean "is equivalent to" or something else?

4.3 Syntax for Constraints

Is this ready for LC?

5 Graph Patterns

"SPARQL is based around graph pattern matching." -- this is the 3rd or 4th similar sentence, spread across the doc, and each one is *slightly* different. Is there some significance to the differences? Is it really necessary to keep repeating the point? I think that confuses spec readers. It confuses me, anyway.

5.1 Basic Graph Patterns

"SPARQL pattern matching is defined in terms matching basic graph patterns..." -- "of" missing? Also, what kind of "SPARQL pattern matching"? Triple? Graph? And where is this defined precisely? Can we get a link? A pointer or reference?

"Filters can be mixed into...but do not cause the end of a basic graph pattern." -- what is the "end" of a basic graph pattern?

5.1.1 Blank Node Labels

s/"Labels"/"labels"/

This section refers to a "syntax error" -- which one? How's it spelled? Is this a generic syntax error or a specific one? Confusing.

5.1.2 Extending Basic Graph Pattern Matching

"SPARQL is defined for matching RDF graphs with simple entailment." -- this is ambiguous: "simple entailment" is part of the mechanism of matching? Or matching works in the presence of simple entailment? Something else? If this is all that's intended to be said in this section, strike it!

5.2 Group Graph Patterns

"In a SPARQL query string..." -- what's this? I think it's the first use of this wording. It's different than other wordings, so I'm left to do the boring, tedious interpretive work of trying to decide if it's a new construct or "informal" language. Can't we just stick to the same terms? I see in the Grammar section there is a "SPARQL query string" and a "SPARQL Query String" -- are these the same? If so, are they the same as the "SPARQL Query String" in 5.2?
Can't we get some hyperlinks or pointers?

Third sentence: run-on.

5.4 Examples

Another hyphen/dash problem.

"...the filter does not break the basic graph pattern into two pieces" -- huh? Pieces of what?

5.5 Scope of Filters

"A constraint, expressed by the syntax keyword FILTER, is a restriction on solution over the while group in which the filter appears" -- I assume this is supposed to be "...a restriction on a solution over the whole group...".

"syntax keyword FILTER" is awkward. I don't remember "keyword" being defined in this doc. I can't find a list of reserved words.

6 Including Optional Values

Strike the first sentence; we don't need commentary in the spec on why some feature of the language is useful. That's the point of referring to UC&R at the outset. Much of this paragraph is commentary and should be struck.

6.1 Optional Pattern Matching

Semicolon in sentence that starts "In an optional match,..." should be a comma.

"It is unbound" -- what is? Pronouns starting sentences in specs are almost always a "code smell". They create the possibility of ambiguity and are best avoided. Which is easy to do if you've defined a bunch of terms formally.

6.4 Nested Optional Graph Patterns

"The outer optional graph pattern must match for any nested optional pattern to be matched." -- which one? The outer*most* must match? The nearest outer must match? All outers must match?

7 Matching Alternatives

"The UNION keyword is the syntax for pattern alternatives." -- I find all such constructions (and there are several of them) to be awkward. Better: "Pattern alternatives are created with UNION" or some such. Saying "keyword" after every keyword is not as good as having a list of keywords. And the typographic change indicates that it's a keyword anyway, or would if there were a guide to typographic conventions employed in the spec -- another best practice we seem to have jettisoned.

"If the application wishes to know how exactly the information was recorded..." -- this is awkward.
Better: "To determine exactly how the information was recorded..."

"The UNION operator..." -- it's an operator and a keyword? Typography suggests yes. I don't see UNION in the list of operators... Confusing, especially since it's also called a "pattern" nearby. So that's UNION keyword, pattern, and operator. None of which is defined precisely in context, nor linked to a precise definition.

"Query results of GP1 UNION GP2..." -- what are GP1 and GP2?

8 RDF Dataset

I find much of this to be unnecessary (unwanted, really) commentary that could be struck, resulting in a shorter, better spec. The first sentence, at least, is also redundant.

"Many RDF data stores hold multiple..." -- so what? What status does the talk of arranging provenance information, say, in the default graph have? That's *one* design pattern, but there are others. It sounds like it should have some normative weight, and it certainly does if anything else in the document has any.

In the 2nd paragraph, "...each identified by IRI." -- The same IRI or different ones? This entire sentence is confusing.

And these seem contradictory: First, "There may be no named graphs; there is always a default graph"; and, second, "A query does not need to involve the default graph..." This is all confusing and needs to be reworked, IMO.

8.1 Examples of RDF Datasets

This section should be struck entirely.

8.2 Specifying RDF Datasets

"A query processor may use these IRIs in any way..." -- Which IRIs?

8.2.1 Specifying the Default Graph

"This does not put the graph in as a named graph; a query can do this by also specifying..." -- Multiple ambiguities: What does not put the graph in? What does "put the graph in" mean? Put it into the dataset? A query can do *what* by also specifying?

"If a query provides more than one FROM clause..." -- sentence is awkward.

8.2.2 Specifying Named Graphs

"Each IRI is used to provide one..." -- provide? What does that mean here? Oh, and we get another language-sounding construct hereabouts...
the "clause"... How does that relate to an operator, keyword, or pattern? Does it not relate? I'm confused.

8.3 Querying the Dataset

"This is by either using an IRI..." -- huh?

8.3.1 Accessing Graph Names

"The query below matches the graph pattern on each of the named graphs in the dataset..." -- I don't know what "the graph pattern *on each of the named graphs*" means. Sentence is confusing.

8.3.3 Restricting possible Graph IRIs -- s/Possible/possible/

"A variable used in the GRAPH clause may also be used in another GRAPH clause..." -- and? Does this mean they're the same variable?

"This can be used to find information..." -- Antecedent of this is...?

8.3.4 Named and Default Graphs

"The default graph is being used to record the provenance information..." -- Is this normative? Informative? Formal? Informal? Since it's the only example of graph relations used, it will seem endorsed or a best practice. I don't think that's appropriate here.

9 Solution Sequence and Modifiers

"Modifiers are applied in the order given by the list." -- What list? Does this mean modifiers *must* be applied in some order? As written, I don't think it says that.

9.1 ORDER BY

The way ORDER BY is described, it sounds like some kind of function. The syntax of ASC and DESC *looks* like function syntax. Why? Isn't the relationship between ORDER BY and some variable (?name) the same as the relation between DESC or ASC and some variable? I expected ORDER_BY(?name) DESC(?emp); or ORDER BY ?name DESC ?emp; but not ORDER BY ?name DESC(?emp).

9.3 DISTINCT

"The solution sequence can be modified...in the sequence is unique" -- by which standard of identity is this to be determined?

9.4 OFFSET

"OFFSET causes the solutions generated to start after the specified number of solutions" -- this is really awkwardly worded.

In the next sentence, I don't understand what work "initially" is supposed to do. I think there is a better word than "predictable"; perhaps "stable" or "deterministic" or some such.
Better to just say that LIMIT/OFFSET should be used with ORDER BY to be useful.

9.5 LIMIT

Another variant for describing syntax: "The LIMIT form..." Is "form" a special term here?

10 Query Result Forms

This is confusing. We originally called these "query forms" and we have "result forms". "Query result forms" is just confusing. I suggest we revert to "Query Forms".

Also, for my $$, this is just in the wrong place and contributes to the sense that there really *is* no real organizational scheme to the "informal" presentation of the language. The query forms section should be at or near the *beginning* of the document's "informal" section. FWIW, I've read several blog comments recently to the effect of "I didn't know CONSTRUCT was in SPARQL" -- that it's tacked on at the end can't help but contribute to that fact.

10.1 Selecting Variables

"Results can be thought of as a table or result set..." -- First, this is just really awkward. They can be thought of as lumps of blue cheese floating in the ether. What matters is what they *are*. Second, this is redundant; we've already had a description of the tabular presentation style of result sets in this document.

10.2 Constructing an Output Graph

"...substituting for the variables into the graph template" -- s/in/into/

Next paragraph: drop the parens; replace the "(" with a comma. Last sentence in that paragraph is a run-on.

10.2.2 Accessing Graphs in the RDF Dataset

"Using CONSTRUCT it is possible" -- add a comma after CONSTRUCT. And drop the parens, replace w/ commas.

10.2.3 Solution Modifiers and CONSTRUCT

"2" should be spelled out, "two".

10.3 Descriptions of Resources (Non-normative)

What's a "Non-normative Resource"? Or, rather, what are "Non-normative Descriptions of Resources"? This is ambiguous in at least two dimensions: is this supposed to indicate that this *section* is not normative? If not, which of the two aforementioned readings is intended?
If this section is "non-normative", that means that the entire remainder of the document *is* normative, including all the commentary and design discussion. Or something... Oh, and which part is meant as "non-normative"? 10.3? 10.3.*? This really needs to be sorted out *before* LC.

"Current conventions for DESCRIBE return an RDF graph without any specified constraints" -- what does that mean? It's completely opaque IMO.

"As with any query, a service may refuse to serve a DESCRIBE query"... What's a service? If this is meant to allude to some protocol thing, why not have a link or pointer to that thing? I guess the protocol doc is a "companion", but one that this doc can't talk about? :>

What's a "knowledge base"? What's a "target knowledge base"? What's a "SPARQL query processor"? Is that different than the "service"?

10.3.3 Descriptions of Resources

The commentary and design discussions should be dropped. How about we just say "DESCRIBE is intentionally unspecified" and leave it at that?

Also, I object to CBD being referenced under the rubric of "other possible mechanisms"... Either list others or drop this one. CBD has no special status or interest that I'm aware of. And it's been criticized, so it's not "the thing everyone does".

10.4 Asking "yes or no" questions

This section title is awkward. It's not capitalized like any other section head, and it's not clear what a "yes or no" question is...

10.1, 10.2, 10.3, and 10.4 should be titled SELECT, CONSTRUCT, DESCRIBE, and ASK respectively.

[I'm skipping from 11 to...]

B. Conformance

I don't think this section is sufficient. There's a lot of talk in the doc about error conditions, warnings, and lots of mays and musts -- none of that is covered by the grammar or result forms conformance stuff, nor is it covered in the protocol spec. I think this is a problem and will hurt interoperability.

"See those specifications for their conformance criteria" -- how about a link?
http://www.w3.org/TR/rdf-sparql-protocol/#conformance

Finally, the sentence starting "Note that the SPARQL protocol describes" should be struck. Any such commentary or note doesn't belong in the query language spec at all, IMO, and certainly not in the section on conformance. It sticks out like a sore thumb. If there is interest in a statement like this in the protocol spec, that should be handled in the normal process for the WG. In fact, #4 in the protocol conformance section already says that, so this statement is also redundant and further muddies the normative status of the query spec...

D. Collected Formal Definitions

"The collected formal definitions are collected..."

E. Internet Media Type, File Extension and Macintosh File Type (Normative)

So *this* is the only normative part of the spec? Oh, except for the Normative References part of F. References. That's...odd.

Cheers,
Kendall
Received on Monday, 26 February 2007 23:21:34 UTC