Re: Review of "rq24" reorg. of SPARQL Query Language for RDF (part 1)

Lee Feigenbaum wrote:
> This is an early review of the reorganization of the SPARQL Query
> Language for RDF specification known as rq24. I've divided the review
> into comments on the overall structure and presentation of the document,
> specific editorial comments on content in the document, and
> layout/rendering nits. (Admittedly, some of the distinctions are a bit
> arbitrary.) I have not attempted to review rq24 with respect
> to substantive issues currently facing the working group, or as to the
> correctness of the formal definitions. I have also not yet reviewed
> section 11 Testing Values or the appendices.
> 
> In this note I present the comments on the overall structure and 
> presentation of the document. The other comments will follow in separate 
> notes.
> 
> Structural and Presentation:
> 
> + Grammar rules. I'm wondering if grammar rules excerpted throughout the
> document should be every rule having to do with the topic or just a few
> select rules that illustrate the relevant constructs. For example, in
> section 3.1.1 Syntax for IRIs, the grammar rules included define a <...>
> IRI ref and a QName. They don't, however, define the SPARQL PREFIX
> clause or BASE clause, both of which are discussed in that section.
> (Another example is 3.1.4 in which the rules for "[]" are included but
> not the rules for "[:p :o]".)

I've added an @@ to do that when the grammar is stable again.  Until we have 
closed #punctuationSyntax, I'm not going to invest heavily in grammar extracts 
because they may all change if rule numbers change, and I don't want clarity 
in the grammar to be traded off against reducing change in the grammar extracts.

> + "1.2 Document Outline" is currently before "1.1 Document Conventions".
> I think this is the proper order of the two topics and that only the
> numbering need be fixed.

Done.

> 
> + 1.1.3 Result Descriptions. I think it would be good to tie the tabular
> representation directly to some formal part of the spec. (Perhaps by
> noting that a row in the table represents one solution from a solution
> sequence, or perhaps indirectly by noting that the table is a visual
> representation of the XML results form.) 

Used the term "solution" and said it corresponds to a row.  Note that the 
description is "illustrative".

> 
> + 2.2 Multiple Matches. I don't think we've seen blank node syntax
> yet to this point in the specification.

Need to add something to the data conventions earlier on.  Noted as an @@.

> 
> + 2.2 Multiple Matches. "This is a basic graph pattern match, and all
> the variables used in the query pattern must be bound in every
> solution." At the least, this should link to a formal definition of
> basic graph pattern. At the most, this sentence should be removed as
> being overly technical for the primer section.

Added the link but left the text there.  It's at the end of the section so 
seems OK and gives the link a raison d'être.

> 
> + 2 Making Simple Queries. If this section is intended to be a small
> primer, I think it needs to be more comprehensive. It should include
> introductory queries that use UNION, OPTIONAL, BOUND, and perhaps GRAPH.
> It may also be the only place in the SPARQL document in which it would
> be reasonable to include the OPTIONAL/!BOUND trick for querying
> maximum/minimum values. (An example of this trick might also be
> appropriate in section 7.3 or 7.5.)

1/ It would be good to at least mention UNION, OPTIONAL, GRAPH.  I've added an 
@@ as it needs more time than just making editorial fixes.

2/ I don't think section 2, or indeed the document as a whole, should mention 
OPTIONAL/!BOUND for maximum because it is better done by having grouping and 
max in the language.
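
For anyone reading this thread who has not met the trick, it looks roughly 
like the sketch below.  The ex: vocabulary and the variable names are invented 
for illustration and do not come from rq24.

```sparql
# Sketch only: find the item(s) with the maximum price, without aggregates.
# The ex: namespace and property names are hypothetical.
PREFIX ex: <http://example.org/ns#>

SELECT ?item ?price
WHERE {
  ?item ex:price ?price .
  # Try to find any other item with a strictly higher price.
  OPTIONAL {
    ?other ex:price ?otherPrice .
    FILTER (?otherPrice > ?price)
  }
  # Keep only solutions where no higher-priced item was found.
  FILTER (!BOUND(?other))
}
```

It works, but the intent is hidden behind the negation, which is why I would 
rather see it handled by grouping and max.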

> 
> + 2.7 Blank Nodes in Query Results. With talk about the scoping set and
> co-occurrences of blank nodes, this section does not belong in Section 2
> of the larger document. A stripped down section might be appropriate,
> but I think it would be better off in Section 10, Query Result Forms.

Good idea: added an @@ for this pass.

> 
> + 3.2 Syntax for Triple Patterns. This section links to
> http://www.w3.org/2001/sw/DataAccess/rq23/rq24.html#syntaxMisc for
> abbreviations, but that internal anchor doesn't seem to exist. 

Fixed. Link removed because it needs to point to two subsections.

> 
> + 3.2 Syntax for Triple Patterns. The entire introduction to this
> section seems to be superfluous in light of the information and examples
> regarding PREFIX and BASE and IRI references in 3.1.1 Syntax for IRIs.
> Also, I don't see any reason to use the "$" variant to variable tokens
> in these examples. I'd strike the entire introduction (everything before
> 3.2.1).

Removed the prefix text.  Left the examples: it seems worthwhile giving 
complete, simple examples, and stressing that the same query does not mean 
exactly the same syntax.
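
As a rough illustration of that point (the IRIs here are placeholders, not 
the ones used in the document), these two forms should be the same query:

```sparql
# Form 1: prefixed name and "?" variables.
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title
WHERE { <http://example.org/book/book1> dc:title ?title }

# Form 2: full IRI reference and "$" variables.
SELECT $title
WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> $title }
```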

> 
> + 4 Initial Definitions. I like the positioning of this section, but
> some of the terms defined here (in particular RDF Term and maybe Query
> Variable) are used in previous sections. Perhaps a forward reference
> from somewhere near the beginning of 3.1 RDF Term would be appropriate.



> 
> + 4.1 RDF Terms. I think that each definition here should be its own
> subsection. That is:
>   4.1 RDF Terms
>   4.2 Query Variable (needs one introductory sentence as in "SPARQL
>   semantics bind query variables to RDF Terms.")
>   4.3 Graph Pattern (needs one introductory sentence as in "SPARQL
>   queries are made of one or more graph patterns.")
>   4.4 SPARQL Query (needs one introductory sentence as in "Formally, a
>   SPARQL query contains four components:")
> 
>   Then 4.2 Triple Patterns becomes 4.5 Triple Patterns. However, I think
>   Triple Patterns makes more sense after Query Variable and before Graph
>   Pattern. 

Good suggestions: done.


> 
> + 4.3 Pattern Solutions. This section ends with:
> 
> """ 
> @@ Consider whether to have a "RDF dataset" section in "Initial
> Definitions"
> 
> Graph patterns match against the default graph of an RDF dataset, except
> for the RDF Dataset Graph Pattern. In this section, all matching is
> described for a single graph, being the default graph of the RDF dataset
> being queried. 
> """
> 
> I think that an RDF dataset definition here would be appropriate. I do
> not understand what the rest of the text there is doing at this point in
> the document.
> 

That'll need a bit more time to sort out between here and section 9.  A "ToDo".

> + 4.5 Matching Values and RDF D-entailment. This does not belong in the
> Initial Definitions section. I'd prefer to see it as a subsection of 5.1
> General Framework or of 5.2 SPARQL Basic Graph Pattern Matching.

Agreed - noted - depends on #nonLiteralValueTesting so I've left it for now 
but noted something needs to be done.

> 
> + 5 Basic Graph Patterns. It's unclear to me why the first three
> definitions here are not part of 5.1 General Framework.

There are some introductory definitions needed later.  Maybe "General Framework" 
would be better as "General Basic Graph Pattern Matching".


> 
> + 5 Basic Graph Patterns. I'd like it if the definition of a BGP was
> tied in some way to the grammar which parses as large a BGP as possible
> when it encounters the first triple pattern in a BGP. (That is, some
> text which clarified that there is only a single BGP in
> 
> {
>    :x :p :q .
>    :y :r :s .
> }

Can do when the grammar is fixed.

> 
> + 5.3 Examples of Basic Graph Pattern Matching. As it stands currently,
> this section is barely more than the example queries from section 2. I
> think that this section is important, but I think that it should take
> this example and work through the E-entailment (simple-entailment,
> actually) based definitions in detail to show how one arrives at the
> expected solutions for the query. I'd be glad to try writing this text
> up, if it would be helpful.

I'd like to take you up on that offer of some text.  Maybe best after a 
revision of definitions to cover the spurious results when there is redundancy 
and bNodes.

> 
> + 6.3 Unbound variables. Should be removed as per the @@. It no longer
> belongs here, and it is sufficiently covered by the definitions of
> variable substitutions and pattern solutions in section 4.

Done.

> 
> + 7.4 Optional Matching - Formal Definition. I think this section should
> be the first subsection of section 7. Also, the text about
> left-associativity seems to belong more in this section than in the
> expository text which currently makes up 7.1 Optional Pattern Matching.

I prefer to introduce optionals first, then define them.  Can reconsider after 
publication.  I'd like to get the majority of your review done so some things 
will slip.

Move "left-associative" text to just after the grammar extract.

> 
> + 8.2 Union Matching - Formal Definition. As above, I think this section
> should be the first subsection of section 8.

As above.

> 
> + 9.2.2 Specifying Named Graphs and 9.2.3 Combining FROM and FROM NAMED.
> The example given here makes its point by using the GRAPH keyword which
> has not yet been introduced. Two possible fixes:
>  
>   1) A non-query-based example here which simply shows a set of FROM
>   NAMED clauses and then shows a representation of the RDF Dataset
>   created from those clauses. (9.3.* shows plenty of examples of queries
>   with the GRAPH keyword).

Good way of doing it - done.
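
The shape of such an example is roughly as follows.  This is only a sketch 
with made-up IRIs, not the text that went into the document:

```sparql
# Dataset description only; no query pattern and no GRAPH keyword needed.
# All IRIs below are hypothetical.
FROM       <http://example.org/dft.ttl>
FROM NAMED <http://example.org/alice>
FROM NAMED <http://example.org/bob>

# Resulting RDF dataset:
#   default graph : the graph obtained from <http://example.org/dft.ttl>
#   named graphs  : ( <http://example.org/alice>, graph obtained from that IRI )
#                   ( <http://example.org/bob>,   graph obtained from that IRI )
```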

> 
>   2) Move the formal definition of GRAPH to early in this section (right
>   after the RDF dataset formal definition), which would make this
>   example more reasonable.

That gets into a bit of a circularity that rq24 addresses by defining 
datasets first.

> 
> + 10.2, 10.3, 10.4, 10.5. As above, I'd put the formal definitions first
> in these sections, and follow that with the expository text.

Noted.

> 
> 
> Lee

Changes will be in CVS just after I can link to this message for the CVS log.

	Thanks
	Andy

Received on Monday, 11 September 2006 12:39:44 UTC