Comments on rq24 reorganization

This is my comparison of the relative merits of the current rq23  
specification [1] and its reorganized rq24 variant [2].  It's  
somewhat different from Lee Feigenbaum's previous series of messages  
[3][4][5] which were instead a review of rq24 itself.  What I've  
tried to do is describe what's moved where, and how I feel about each  
of the changes.

I'll start with the conclusion.  At a structural level, rq24 is  
definitely an improvement on rq23, and is a better way forward.  I  
don't doubt that we should adopt rq24 over rq23.  The only question  
in my mind is when.  The reorganization process has left us with  
areas in which rq24's narrative lapses entirely into @@Todo notes,  
whereas rq23 is rambly but at least complete.  For this reason, I'd  
hesitate to say at this moment that rq24 is actually a better  
document, in the sense of being the best exposition of SPARQL for the
public.  It certainly doesn't strike me as CR-quality.

The key issue seems to be whether it's more important to give the  
public the most polished account of SPARQL (rq23) or the most
up-to-date account of its evolution (rq24).  I favor choosing the latter
and going with rq24 because I believe that the holes in it won't take  
too long to patch, eliminating the only reason I see for sticking  
with rq23.  My ideal scenario would be for one more editorial pass to
take place before adopting rq24.  At the very least, I think that  
sections 1.2 and 2.5 should be either filled out or commented out,  
and that the internal links broken in the course of the  
reorganization should be mechanically checked and then corrected [6] 
[7].  However, if the document were to drop back to WD status I feel  
that immediately adopting rq24 as-is would also be acceptable.
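
For what it's worth, the mechanical link check could be as simple as
the rough sketch below.  It is only an illustration, not a tool the
editors use: the class and function names are my own invention, the
sample markup is made up (only the #Keywords and #syntaxMisc fragment
names come from [6][7]), and it assumes that targets are declared with
id attributes or <a name="..."> anchors.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects fragment targets (id/name) and same-document #links."""

    def __init__(self):
        super().__init__()
        self.targets = set()  # id="..." on any element, name="..." on <a>
        self.links = []       # fragments referenced via href="#..."

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.links.append(href[1:])


def broken_fragments(html_text):
    """Return the sorted list of #fragments that have no matching target."""
    collector = AnchorCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(set(collector.links) - collector.targets)


# Hypothetical markup standing in for the real document.
sample = """
<h2 id="grammar">A.7 Grammar</h2>
<p>See <a href="#grammar">the grammar</a>, <a href="#Keywords">keywords</a>
and <a href="#syntaxMisc">abbreviations</a>.</p>
"""
print(broken_fragments(sample))
```

Run against a saved copy of rq24.html, something like this would list
every dangling fragment in one pass, rather than relying on reviewers
to stumble across them one at a time.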

-----

Now, on to the dull details of how I actually did the comparison.  I  
drew myself a diagram of how the sections in rq23 were permuted to  
obtain rq24.  I tried to group and summarize the changes at a  
meaningful level, trying to reverse engineer the editors' intentions  
for the reorganization.

* Text from the beginning of rq23.2 moved to rq24.1.2 (Document  
Outline).  Having an outline is a good idea, and I agree with Lee  
that it belongs before the section on Document Conventions.  The only  
issue I take here is that another few sentences need to be written to
fill out the new section and (trivially) the numbering needs to be  
corrected.

* The previous rq23.2 (Making Simple Queries) and rq23.3 (Working  
with RDF Literals) received the brunt of the changes, reorganized  
into four new chapters.

* The new rq24.2 provides a series of example queries without  
detailed specification.  For the most part this involved
migrating specification text elsewhere (rq23.2.1.1 through rq23.2.8)  
and gathering the examples from rq23.3.1 and rq23.3.2.  I think the  
idea of tossing a bunch of examples at the reader right at the outset  
does a lot to frame the specifications that follow later.  O'Reilly  
built a publishing empire on this editorial principle.  :)

* The concrete syntax of triple patterns is then specified in rq24.3,  
with material drawn from rq23.2.1.1-4, and the abstract syntax and  
semantics  in rq24.4 with material from rq23.2.2-4 and  
rq23.2.8.4-5.    I personally don't think dividing the abstract and  
concrete syntax into two chapters is helpful; I'd prefer to see these  
two interspersed, with each individual feature's concrete and  
abstract syntax treated together.  Without any concrete syntax  
examples, the entirety of rq24.4 is rather difficult to understand.   
I'd suggest a better division would be to put the concrete and  
abstract syntax details (currently rq24.3 through rq24.4.2) into  
chapter 3, and the semantics of what qualifies as a solution to a  
pattern (currently rq24.4.3-5) into chapter 4.

* rq23.2.5 (Basic Graph Patterns) becomes its own chapter rq24.5.   
The semantics of what qualifies as a graph solution is a pretty core  
topic and certainly deserves its own chapter.  I suspect there'd be  
benefit in drawing in material from rq24.4.3-5 on triple pattern  
solutions into this chapter, so that the entire treatment of filtered
basic graph patterns is in one spot.

* I didn't note any substantial change to chapters rq23.4-6, other  
than being renumbered as rq24.6-8.

* The chapters on datasets rq23.7,8,9 have been consolidated as  
rq24.9 (RDF Dataset). Putting all the stuff about querying multiple  
graphs at once into the one chapter seems a definite improvement.

* The preamble to Appendix A (before the EBNF meat of it, in A.7) has  
been reorganized.  This seems to be a modest improvement; I don't  
have any strong opinions about it.

-----

My goal was to compare rq23 and rq24 rather than to proofread either  
of them, but some collateral proofreading happened anyway and might  
as well be captured:

* Section rq24.1.2 appears out of sequence.

* Section rq24.2.2 concludes with the statement "all the variables  
used in the query pattern must be bound in every solution."  I'm not  
entirely certain this is correct; is a variable which has been  
projected away (in this particular case, ?x) still bound in that  
particular solution?

* There's a forward reference to CONSTRUCT at the beginning of  
rq24.2.7 which probably can't be avoided, but at least ought to be  
linked to rq24.10.3.

* EBNF fragments of the grammar appear throughout rq24.3.  It would  
probably help to add a link to the format used for these (XML 1.1  
section 6) in rq24.1.1 (Document Conventions) rather than (or in  
addition to) the link at the beginning of rq24.A.7.  (On closer  
inspection, I see a @@ note in rq24.1.2 which indicates the editors
have already thought of this, although not quite in the place I  
expected.)

* The link to #syntaxMisc in section rq24.3.2 is broken: "there are  
abbreviated ways of writing some common triple pattern constructs."   
It might be helpful to add some text noting that these abbreviations are
all adapted from Turtle, with a link.

* The grammar rules throughout rq24.10 are empty.

* The comment just before rq24.A.1 "rules A.1 to A.5 apply" should  
probably include A.6 as well.

* A link in rq24.A.7 to #Keywords is now broken, since rq23.A.3 has  
been moved to the beginning of rq24.A.7: "Matching is case-sensitive  
except as noted above for keywords."

As a final miscellaneous observation, the section headings do not  
make it easy to look up a particular keyword.  It might be more  
useful to name rq24.3.1.1 as "PREFIX and BASE", rq24.7 as "OPTIONAL",  
rq24.9.2.1 as "FROM", rq24.10.3 as "CONSTRUCT" and so forth, or  
perhaps to use longer section titles combining the keyword and  
description, e.g. "Matching Alternatives using UNION".

The first paragraph of the Abstract would fit better in rq24.1
(Introduction).

-----

References:
[1] http://www.w3.org/2001/sw/DataAccess/rq23/ (revision 1.692)
[2] http://www.w3.org/2001/sw/DataAccess/rq23/rq24.html (revision 1.17)
[3] http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0107.html
[4] http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0108.html
[5] http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0109.html
[6] http://www.w3.org/2001/sw/DataAccess/rq23/rq24.html#Keywords
[7] http://www.w3.org/2001/sw/DataAccess/rq23/rq24.html#syntaxMisc

Received on Wednesday, 6 September 2006 13:57:49 UTC