Review of "rq24" reorg. of SPARQL Query Language for RDF (part 1) from Lee Feigenbaum on 2006-08-15 (public-rdf-dawg@w3.org from July to September 2006)

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Tue, 15 Aug 2006 00:38:53 -0400
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <OFFE8CACB8.2746D1DE-ON852571CB.001900DA-852571CB.00198494@us.ibm.com>

This is an early review of the reorganization of the SPARQL Query
Language for RDF specification known as rq24. I've divided the review
into comments on the overall structure and presentation of the document,
specific editorial comments on content in the document, and
layout/rendering nits. (Admittedly, some of the distinctions are a bit
arbitrary.) I have not attempted to review rq24 with respect
to substantive issues currently facing the working group, or as to the
correctness of the formal definitions. I have also not yet reviewed
section 11 Testing Values or the appendices.

In this note I present the comments on the overall structure and 
presentation of the document. The other comments will follow in separate 
notes.

Structural and Presentation:

+ Grammar rules. I'm wondering whether the grammar rules excerpted
throughout the document should cover every rule related to the topic at
hand or just a few select rules that illustrate the relevant constructs.
For example, in section 3.1.1 Syntax for IRIs, the grammar rules included
define a <...> IRI ref and a QName. They don't, however, define the SPARQL
PREFIX clause or BASE clause, both of which are discussed in that section.
(Another example is 3.1.4 in which the rules for "[]" are included but
not the rules for "[:p :o]".)
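
To make the second example concrete, here is a rough sketch of the two
blank node forms side by side. The prefix IRI and the property names are
invented purely for illustration and are not taken from rq24:

```sparql
# Hypothetical namespace, used only for this illustration.
PREFIX : <http://example.org/ns#>

SELECT ?name ?title
WHERE {
  # "[]" form: an anonymous blank node used as the subject.
  [] :name ?name .

  # "[ :p :o ]" form: a blank node with a property list attached,
  # itself used as the subject of a further triple pattern.
  [ :name ?name ] :title ?title .
}
```

A section that discusses both forms would, under the "every rule"
approach, excerpt the grammar rules for both of them.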

+ "1.2 Document Outline" is currently before "1.1 Document Conventions".
I think this is the proper order of the two topics and that only the
numbering need be fixed.

+ 1.1.3 Result Descriptions. I think it would be good to tie the tabular
representation directly to some formal part of the spec. (Perhaps by
noting that a row in the table represents one solution from a solution
sequence, or perhaps indirectly by noting that the table is a visual
representation of the XML results form.) 

+ 2.2 Multiple Matches. I don't think blank node syntax has been
introduced up to this point in the specification.

+ 2.2 Multiple Matches. "This is a basic graph pattern match, and all
the variables used in the query pattern must be bound in every
solution." At the least, this should link to a formal definition of
basic graph pattern. At the most, this sentence should be removed as
being overly technical for the primer section.

+ 2 Making Simple Queries. If this section is intended to be a small
primer, I think it needs to be more comprehensive. It should include
introductory queries that use UNION, OPTIONAL, BOUND, and perhaps GRAPH.
It may also be the only place in the SPARQL document in which it would
be reasonable to include the OPTIONAL/!BOUND trick for querying
maximum/minimum values. (An example of this trick might also be
appropriate in section 7.3 or 7.5.)
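
For reference, the trick I have in mind looks roughly like the query
below, which selects the item(s) with the maximum price. This is only a
sketch: the prefix IRI and the :price property are made up for the
example.

```sparql
# Hypothetical namespace and property, used only for this illustration.
PREFIX : <http://example.org/ns#>

SELECT ?item ?price
WHERE {
  ?item :price ?price .

  # Try to find some other item whose price is strictly greater.
  OPTIONAL {
    ?other :price ?otherPrice .
    FILTER (?otherPrice > ?price)
  }

  # Keep only the solutions for which no greater price was found.
  FILTER (!BOUND(?other))
}
```

Reversing the comparison in the inner FILTER gives the minimum instead
of the maximum.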

+ 2.7 Blank Nodes in Query Results. With talk about the scoping set and
co-occurrences of blank nodes, this section does not belong in Section 2
of the larger document. A stripped down section might be appropriate,
but I think it would be better off in Section 10, Query Result Forms.

+ 3.2 Syntax for Triple Patterns. This section links to
http://www.w3.org/2001/sw/DataAccess/rq23/rq24.html#syntaxMisc for
abbreviations, but that internal anchor doesn't seem to exist. 

+ 3.2 Syntax for Triple Patterns. The entire introduction to this
section seems to be superfluous in light of the information and examples
regarding PREFIX and BASE and IRI references in 3.1.1 Syntax for IRIs.
Also, I don't see any reason to use the "$" variant of variable tokens
in these examples. I'd strike the entire introduction (everything before
3.2.1).

+ 4 Initial Definitions. I like the positioning of this section, but
some of the terms defined here (in particular RDF Term and maybe Query
Variable) are used in previous sections. Perhaps a forward reference
from somewhere near the beginning of 3.1 RDF Term would be appropriate.

+ 4.1 RDF Terms. I think that each definition here should be its own
subsection. That is:
  4.1 RDF Terms
  4.2 Query Variable (needs one introductory sentence, as in "SPARQL
  semantics bind query variables to RDF Terms.")
  4.3 Graph Pattern (needs one introductory sentence, as in "SPARQL
  queries are made of one or more graph patterns.")
  4.4 SPARQL Query (needs one introductory sentence, as in "Formally, a
  SPARQL query contains four components:")

  Then 4.2 Triple Patterns becomes 4.5 Triple Patterns. However, I think
  Triple Patterns makes more sense after Query Variable and before Graph
  Pattern.

+ 4.3 Pattern Solutions. This section ends with:

""" 
@@ Consider whether to have a "RDF dataset" section in "Initial
Definitions"

Graph patterns match against the default graph of an RDF dataset, except
for the RDF Dataset Graph Pattern. In this section, all matching is
described for a single graph, being the default graph of the RDF dataset
being queried. 
"""

I think that an RDF dataset definition here would be appropriate. I do
not understand what the rest of the text there is doing at this point in
the document.

+ 4.5 Matching Values and RDF D-entailment. This does not belong in the
Initial Definitions section. I'd prefer to see it as a subsection of 5.1
General Framework or of 5.2 SPARQL Basic Graph Pattern Matching.

+ 5 Basic Graph Patterns. It's unclear to me why the first three
definitions here are not part of 5.1 General Framework.

+ 5 Basic Graph Patterns. I'd like the definition of a BGP to be tied in
some way to the grammar, which parses as large a BGP as possible when it
encounters the first triple pattern in a BGP. That is, I'd like some
text clarifying that there is only a single BGP in

{
   :x :p :q .
   :y :r :s .
}

+ 5.3 Examples of Basic Graph Pattern Matching. As it stands currently,
this section is barely more than the example queries from section 2. I
think that this section is important, but I think that it should take
this example and work through the E-entailment (simple-entailment,
actually) based definitions in detail to show how one arrives at the
expected solutions for the query. I'd be glad to try writing this text
up, if it would be helpful.

+ 6.3 Unbound variables. Should be removed as per the @@. It no longer
belongs here, and it is sufficiently covered by the definitions of
variable substitutions and pattern solutions in section 4.

+ 7.4 Optional Matching - Formal Definition. I think this section should
be the first subsection of section 7. Also, the text about
left-associativity seems to belong more in this section than in the
expository text which currently makes up 7.1 Optional Pattern Matching.

+ 8.2 Union Matching - Formal Definition. As above, I think this section
should be the first subsection of section 8.

+ 9.2.2 Specifying Named Graphs and 9.2.3 Combining FROM and FROM NAMED.
The example given here makes its point by using the GRAPH keyword, which
has not yet been introduced. Two possible fixes:
 
  1) A non-query-based example here which simply shows a set of FROM
  NAMED clauses and then shows a representation of the RDF Dataset
  created from those clauses. (9.3.* shows plenty of examples of queries
  with the GRAPH keyword).

  2) Move the formal definition of GRAPH to early in this section (right
  after the RDF dataset formal definition), which would make this
  example more reasonable.
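
As a rough sketch of fix 1, the example could show only the dataset
clauses, followed by a description of the RDF dataset they build. The
IRIs below are hypothetical placeholders:

```sparql
# Dataset clauses only; no query pattern and no GRAPH keyword needed.
FROM       <http://example.org/dft.ttl>
FROM NAMED <http://example.org/alice>
FROM NAMED <http://example.org/bob>

# Resulting RDF dataset:
#   default graph : the graph obtained from <http://example.org/dft.ttl>
#   named graph   : ( <http://example.org/alice>, graph obtained from that IRI )
#   named graph   : ( <http://example.org/bob>,   graph obtained from that IRI )
```

This keeps the focus on how the clauses determine the dataset and
leaves the querying of named graphs to 9.3.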

+ 10.2, 10.3, 10.4, 10.5. As above, I'd put the formal definitions first
in these sections, and follow that with the expository text.


Lee
Received on Tuesday, 15 August 2006 04:39:12 UTC