Re: Review of "rq24" reorg. of SPARQL Query Language for RDF (part 2) from Seaborne, Andy on 2006-09-12 (public-rdf-dawg@w3.org from July to September 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 12 Sep 2006 13:28:01 +0100
To: Lee Feigenbaum <feigenbl@us.ibm.com>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <4506A7D1.6090203@hp.com>
Lee Feigenbaum wrote:
> This is an early review of the reorganization of the SPARQL Query
> Language for RDF specification known as rq24. I've divided the review
> into comments on the overall structure and presentation of the document,
> specific editorial comments on content in the document, and
> layout/rendering nits. (Admittedly, some of the distinctions are a bit
> arbitrary.) I have not attempted to review rq24 with respect
> to substantive issues currently facing the working group, or as to the
> correctness of the formal definitions. I have also not yet reviewed
> section 11 Testing Values or the appendices.
> 
> In this note I present the editorial comments on content in the document.
> 
> Editorial:
> 
> + Abstract. The abstract is not an abstract. The text provides a bit of
> background material and perhaps a one-sentence summary of what the
> SPARQL query language is. I'd suggest something like:
> 
> """ This document describes the query language part of the SPARQL
> Protocol And RDF Query Language for easy access to RDF stores. It is
> designed to meet the requirements and design objectives described in RDF
> Data Access Use Cases and Requirements [UCNR] The SPARQL query language
> consists of the syntax and semantics for asking and answering queries
> against RDF graphs. SPARQL contains capabilities for querying triple
> patterns, conjunctions, disjunctions, and optional patterns. It also
> supports constraining queries by source RDF graph and extensible value
> testing. Results of SPARQL queries can be ordered, limited and offset in
> number, and presented in several different forms. """

It's better than we have at present so I've included it.  And noted it needs 
revising when the rest of the doc is done.

> 
> 
> + 1.1.1 Namespaces. I think the prefixes in the table should include
> colons. (Ex. "rdf:" rather than "rdf"). This facilitates searching for
> the prefix declarations.

Done.

> 
> 
> + 1.1.2 Data Descriptions. Our reference for Turtle is to a document
> under /2001/sw/DataAccess which is basically a pointer to
> http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/ . Should we update
> the reference to point to this document, or is the indirect reference
> good enough?

We had hoped to be able to reference a submission by REC.

I think that http://www.dajobe.org/2004/01/turtle/ is the current definitive 
reference. (4 April 2006)

http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/ gets me this by 
redirection.

As DAWG isn't working on this without DaveB, I've changed the links to 
http://www.dajobe.org/2004/01/turtle/

> 
> 
> + 2 Making Simple Queries. This whole section talks about matching graph
> patterns. I tend to think for a mini-primer this is OK, but it is at
> odds with the formal definition which is now based on entailment. 

It should be the case that graph matching and entailment, for the levels 
covered by SPARQL, are the same (to within bNode isomorphism).

http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0087
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0088
and messages around there.

> + 2.2 Multiple Matches. "The results enumerate the RDF terms to which
> the selected variables can be bound in the query pattern." As written,
> this sentence seems to indicate that there are no other RDF terms to
> which the variables could possibly be found. Perhaps it should be
> qualified along the lines of:
> 
> "The results enumerate the RDF terms to which the selected variables can
> be bound in the query pattern in order to match triples in the data."

Done.

> 
> 
> + 2.3.3 Matching Language Tags. The first query (with no solutions)
> should have an empty solution set depicted underneath it for
> completeness.

Done (but it looks a bit odd so maybe it isn't helpful).

> 
> 
> + 2.4 Value Constraints. "It is possible to further restrict solutions
> by constraining the allowable bindings of variables to RDF Terms." I'd
> suggest removing "to RDF Terms" or rewriting as "It is possible to
> further restrict solutions by constraining the allowable RDF Terms to
> which variables can be bound."

Is this better:
"""
It is possible to further restrict solutions by constraining the RDF terms 
that can be used as bindings of variables.
"""

> 
> 
> + 2.4.1 Restricting the values of strings. I find the text here
> confusing. Some suggestions:
> 
> """ 
> One way to restrict the possible RDF literals is to use a regular
> expression with the regex  operator.
> """
> -->
> """
> Variable bindings to RDF literals can be restricted to strings matching
> a regular expression by using the regex operator.
> """
> 
> """
> Only plain literals with no language tag and XSD strings are matched by
> regex but it is possible to get the lexical form of a literal using str.
> """
> -->
> """
> The regex operator only matches <code>xsd:string</code> typed
> literals or plain literals with no language tag. regex can match against
> the lexical forms of other literals by using the str operator.
> """

"""
Variable bindings can be restricted to strings matching a regular expression 
by using the regex operator. Only plain literals with no language tag and XSD 
strings are matched by regex. regex can be used to match the lexical forms of 
other literals by using the str operator.
"""
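
For illustration only, here is a rough Python sketch of the rule in that 
text. The Literal record and the helper names sparql_str and sparql_regex are 
hypothetical and not part of any SPARQL library. It models only the type 
check, and Python's re module is merely a stand-in for the XPath-style 
regular expressions that SPARQL actually uses.

```python
import re
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of an RDF literal: lexical form, language tag, datatype."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def sparql_str(lit: Literal) -> Literal:
    # str() yields the lexical form as a plain literal with no language tag.
    return Literal(lit.lexical)


def sparql_regex(lit: Literal, pattern: str, flags: str = "") -> bool:
    # Accept only plain literals without a language tag and xsd:string literals.
    is_plain = lit.lang is None and lit.datatype is None
    is_xsd_string = lit.lang is None and lit.datatype == XSD_STRING
    if not (is_plain or is_xsd_string):
        raise TypeError("regex needs a plain literal (no language tag) or an xsd:string")
    # The "i" flag makes the match case-insensitive.
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, lit.lexical, re_flags) is not None
```

In this sketch a language-tagged literal such as Literal("cat", lang="en") 
raises a type error when passed to sparql_regex directly, but matches once it 
is wrapped as sparql_regex(sparql_str(...), "cat").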


> 
> """
> which may be made case-insensitive with the "i" flag.
> """
> -->
> """
> Regular expression matches may be made case-insensitive with the "i"
> flag.
> """

Done

> 
> 
> + 2.4.2 Restricting the values of numbers. I don't think we ever refer
> to the "presentation" of a literal somewhere else. I suggest:
> 
> """
> Filters apply to the value of the literal, not its lexical form.
> """

Done

> 
> In general the text refers to variables directly by name, without
> quotation marks, so <code>"price"</code> should be simply
> <code>price</code>.

> 
> I'd suggest:
> 
> """ 
> By constraining the <code>price</code> variable, only <code>book2</code>
> matches the query because only <code>book2</code> has a price less than
> <code>30.5</code>, as the filter condition requires.
> """

Done.

> 
> + 2.6 Querying Reification Vocabulary. I think it might be worth a note
> that says that SPARQL does not treat the reification vocabulary terms
> specially. Something like:
> 
> """
> Note that SPARQL does not treat querying reified data any differently
> from any other data. As with other data, SPARQL can be used to query
> graph-pattern matches using the reification vocabulary.
> """

"""
2.6 Querying Reification Vocabulary
RDF defines a reification vocabulary which provides for describing RDF 
statements without stating them. These descriptions of statements can be 
queried by using the defined vocabulary. SPARQL does not treat querying 
reified data differently from any other RDF data. SPARQL can be used to query 
graph-pattern matches using the reification vocabulary.
"""

> + 3.1.1 Syntax for IRIs. 
> 
> """
> Prefixed names
> 
> The PREFIX keyword associates a prefix label with an IRI. A prefixed
> name is a prefix label and a local part, separated by a colon ":". It is
> mapped to an IRI by concatenating the local part to the IRI
> corresponding to the prefix.
> """
> 
> I think it's worth adding "(possibly empty)" before "prefix label". I
> think that "prefix" at the end of this paragraph should be "prefix
> label."

Added ""The prefix label may be the empty string.""
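
As a small illustration of the concatenation rule, including the empty prefix 
label, here is a Python sketch. The function name expand_prefixed_name and 
the example prefix table are made up for this message; they are not taken 
from the document.

```python
from typing import Dict


def expand_prefixed_name(name: str, prefixes: Dict[str, str]) -> str:
    """Map 'label:local' to an IRI by concatenating the local part to the prefix IRI."""
    # Split on the first colon only; the label before it may be the empty string.
    label, sep, local = name.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {name!r}")
    if label not in prefixes:
        raise KeyError(f"undeclared prefix label: {label!r}")
    return prefixes[label] + local


# Hypothetical declarations, equivalent to:
#   PREFIX :   <http://example.org/book/>
#   PREFIX dc: <http://purl.org/dc/elements/1.1/>
prefixes = {
    "": "http://example.org/book/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
```

With these declarations, ":book1" and "dc:title" both expand by plain string 
concatenation; the empty label is looked up exactly like any other.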

> 
> Later in this section, three examples of different ways to write the
> same IRI are given. In the BASE and PREFIX cases, I'd suggest adding a
> "..." line in between the PREFIX/BASE clause and the IRI reference, to
> emphasize that this is simply an excerpt of SPARQL using these
> abbreviation mechanisms.

They are in green to indicate they are not queries.  It says "The following 
fragments ..."

> 
> + 3.1.2 Syntax for Literals. The introductory text and bulleted examples
> should discuss the triple quotation mark version of literals. Perhaps
> text like:
> 
> """
> To facilitate writing literal values which themselves contain quotation
> marks, SPARQL provides an additional quoting construct in which literals
> are enclosed in three single- or double-quotation marks.
> """

Added as a para in 3.1.2


> And then an example such as:
> 
> """The librarian said, "Perhaps you would enjoy 'War and Peace.'""""

Added:
"""The librarian said, "Perhaps you would enjoy 'War and Peace'.""""

> 
> + 3.1.4 Syntax for Blank Nodes. I feel that this section is a bit
> confused between whether it wants to define the syntax in terms of blank
> node labels only (leaving the mapping between labels and blank nodes to
> elsewhere in the spec), or in terms of blank nodes themselves. (I think
> someone (FredZ?) suggested a reworking of some of the BGP-matching
> definitions that assumed that we only worked with blank nodes there,
> which would require that this section fully explains how to map from
> syntactic constructs to blank nodes. (But that would be difficult since
> at this point the concept of a BGP has not yet been introduced.))
> 
> I'd be glad to take a stab at rewriting some of the text here to
> explicitly map only from syntactic constructs to blank node labels at
> this point in the spec if that would be helpful. If we went this route,
> I think that 5.4 Basic Graph Patterns in the SPARQL Syntax might be an
> appropriate place to include text explaining how blank node labels map
> to blank nodes.

Good points.

There is an @@ to emphasise that blank node labels are syntactic.

I'm not going to rework text here, especially to make it work with text in 
section 5, until section 5 is revised.  The time is better spent elsewhere for 
now.

> 
> + 3.2.2 Object Lists. This section includes the sentence:
> 
> """
> Note that both the triple patterns involving foaf:nick will need to
> match, not that one or the other should match.
> """
> 
> I'd suggest removing this sentence. This section of the document is
> purely syntactic in nature, and this sentence bleeds into the territory
> of matching triple patterns, which has not been introduced yet.

Done.

> 
> + 3.2.3 RDF Collections.
> 
> """
> RDF collections can be written in triple patterns using the syntax "(
> )". The form () is an alternative for the IRI rdf:nil which is
> http://www.w3.org/1999/02/22-rdf-syntax-ns#nil. When used with
> collection elements, such as (1 ?x 3 4), triple patterns and blank nodes
> are allocated for the collection and the blank node at the head of the
> collection can be used as a subject or object in other triple patterns.
> """
> 
> First, we've already defined the rdf: prefix for the extent of this
> document, so I think including the full IRI is unnecessary here. 

I wanted to emphasise that it's the full form that matters.  And you can click 
on it.

> 
> Second, as with 3.1.4, I think this section should be worded in terms of
> allocating blank node *labels* that do not otherwise appear in the
> query. This maintains a clean separation between the syntactic concerns
> of section 3 and the semantic concerns of most of the rest of the
> document.

It really is blank nodes being allocated.  RDF collections do not use blank 
node labels.  The bNodes exist after the syntax, including all blank node 
labels, has been stripped away by parsing.

Added:
"""
Blank nodes allocated do not occur elsewhere in the query.
"""

Changed:
"""
(1 ?x 3 4) :p "w" .

is a short form for (the blank node labels do not occur anywhere else in the 
query):
     _:b0  rdf:first  1 ;
           rdf:rest   _:b1 .
     _:b1  rdf:first  ?x ;
           rdf:rest   _:b2 .
     _:b2  rdf:first  3 ;
           rdf:rest   _:b3 .
     _:b3  rdf:first  4 ;
           rdf:rest   rdf:nil .
     _:b0  :p         "w" .
"""
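
The expansion can be sketched mechanically. The Python below is illustrative 
only: expand_collection is a hypothetical helper, and the "_:bN" strings are 
just printable stand-ins for the fresh blank nodes, which are assumed not to 
occur anywhere else in the query.

```python
from typing import List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]


def expand_collection(items: List[str], start: int = 0) -> Tuple[str, List[Triple]]:
    """Return (head, triples) for the collection written as ( items... )."""
    # The empty form () is just rdf:nil and allocates nothing.
    if not items:
        return RDF_NS + "nil", []
    # One fresh node per element; 'start' lets the caller avoid clashes.
    nodes = [f"_:b{start + i}" for i in range(len(items))]
    triples: List[Triple] = []
    for i, item in enumerate(items):
        # The last cell's rdf:rest points at rdf:nil, the others at the next cell.
        rest = nodes[i + 1] if i + 1 < len(items) else RDF_NS + "nil"
        triples.append((nodes[i], RDF_NS + "first", item))
        triples.append((nodes[i], RDF_NS + "rest", rest))
    return nodes[0], triples
```

For the example above, expand_collection(["1", "?x", "3", "4"]) returns the 
head node and the eight rdf:first/rdf:rest triples; the caller then adds the 
head as the subject of :p "w".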

> 
> + 4 Initial Definitions. "RDF Concepts and Abstract Syntax "anticipates
> an RFC on Internationalized Resource Identifiers. Implementations may
> issue warnings concerning the use of RDF URI References that do not
> conform with [IRI draft] or its successors."" That sentence seems out of
> the blue to me. It could use some motivation.
> 

"""
SPARQL is defined in terms of IRIs. RDF Concepts and Abstract Syntax 
"anticipates ...
"""

> + 4.1 RDF Terms. 
>  
> Why does the word "updated" link to the section in RDF Concepts about
> URI refs?

Because RDF-core uses "RDF URI References" and SPARQL uses IRIs.

> 
> "IRIs include URIs [RFC3986] and URLs." Don't IRIs include URLs simply by
> virtue of URLs being a subset of URIs? (There's actually at least one
> other place in the document where I noticed this, but didn't comment on
> it.)

> + 4.2 Triple Patterns. "Any SPARQL triple pattern with a literal as
> subject will fail to match on any RDF graph." While this is true, it's
> really a consequence of how matches are defined, which we haven't seen
> yet. I'd either remove this sentence, or at least change it to say
> "Because RDF graphs may not contain literal subjects, any
> SPARQL triple pattern with a literal as a subject will fail to match any
> RDF graph."

Added the initial clause to that sentence

> 
> + 4.4 Value Constraints. BOUND is a special case here, which doesn't fit
> into what's described here. (Because it acts on the variable, not on a
> value or an RDF term.) Perhaps it should be explicitly mentioned?

The definition is:

"""
A value constraint is a boolean-valued expression of variables and RDF Terms.
"""

so it's covered.  I don't see the point in BOUND specially.  No other operator 
is discussed specifically.

> 
> + 5.3 Examples of Basic Graph Pattern Matching. This contains the text:
> 
> """
> There is a blank node [CONCEPTS] in this dataset, identified by _:a. The
> label is only used within the file for encoding purposes. The label
> information is not in the RDF graph.
> """
> 
> This is superfluous with explanations in section 3. I think these
> sentences should be removed.

In order to have examples, I think it is worth emphasising - I'm sure some 
readers will go straight to this section sometimes.

> 
> + 6 Group Graph Patterns. I agree with the @@ in the document that the
> summary of graph-pattern types at the beginning of this section can be
> removed now that it is basically repeated in 4 Initial Definitions.
> 
> + 6.1 Group Graph Patterns. It would be nice if the definition of Group
> Graph Pattern used the abbreviation GGP instead of GP which is usually
> used for graph patterns (that are not necessarily group graph patterns).

Done.

> 
> """
> For any solution, the same variable is given the same value everywhere
> in the set of graph patterns making up the group graph pattern. For
> example, this query has a group graph pattern of one basic graph pattern
> as the query pattern.
> 
> In a SPARQL query string, a group graph pattern is delimited with
> braces: {}. 
> """
> 
> I think that the middle sentence belongs at the end and can be
> clarified. Perhaps:
> 
> """
> For any solution, the same variable is given the same value everywhere
> in the set of graph patterns making up the group graph pattern. 
> 
> In a SPARQL query string, a group graph pattern is delimited with
> braces: {}. For example, the query pattern for this query is a single
> group graph pattern. This group graph pattern contains a single basic
> graph pattern, which in turn contains two triple patterns.
> """

Moved

> + 9 RDF Dataset. s/comprises of/comprises/ (in American English, at
> least :-). 

Done.

> 
> + 10.1 Solution Sequences and Result Forms. It'd be nice if something
> here linked back to the definition of a pattern solution from section 4,
> perhaps around the phrase "each solution being a function from variables
> to RDF terms."
> 
> + 10.1.3 DISTINCT. I'd suggest adding a sentence to the effect that
> the DISTINCT keyword/modifier can only be used with the SELECT result
> form.

Link added.

> + 10.2 Selecting Variables. I'd prefer something like "Selecting Variable
> Bindings," and a similar change to the first sentence: "The SELECT form
> of results returns the variables directly."

Doesn't work for me - it's not selecting bindings, it's choosing which 
variables are projected into the results.  It is the variable that matters, not 
its binding in any given solution.  It is by variable name as well.

> 
> + 10.2 Selecting Variables.
> 
> """ 
> Result sets can be accessed by the local API but also can be
> serialized into either XML or an RDF graph.
> """
> 
> Results *can* be serialized in any number of other ways, also. I think
> that "the local API" is confusing since there's no other reference to
> such a creature. Maybe "a local API" is better, or no mention at all. I
> think just saying that SPARQL Query Results XML Format provides one
> serialization of SELECT results in an XML vocabulary would suffice and
> be less confusing.

The point is that results do not have to be serialized at all.

s/the local API/a local API/

> 
> """
> The syntax SELECT * is an abbreviation that selects all of the
> variables.
> """
> --->
> """
> The syntax SELECT * is an abbreviation that selects all of the
> variables that appear in the query.
> """

Done

> 
> + 10.3 Constructing an Output Graph. Since the section before talks
> about a serialization of the results, I wonder if this section should
> have something to say about the SPARQL query language specification not
> constraining the serialization of the graph resulting from a CONSTRUCT
> query. Perhaps a pointer to the appropriate part of the protocol
> document? Similarly for DESCRIBE in 10.4.

The serialization would be an RDF graph so serialization would point to RDF 
Syntax.  It's because the SPARQL XML Results doc is part of the SPARQL suite 
(and is new) that it gets mentioned in SELECT.

(I don't know where in the protocol document to link to - 2.1.3 maybe? - it 
does not have an anchor point)

> 
> + 10.4 Description of resources. This section should say something about
> DESCRIBE *, along the lines of what is said for SELECT *.

Done

> + 10.4.2 Identifying Resources. This says: "If, however, the query
> pattern has multiple solutions, the RDF data for each is the union of
> all RDF graph descriptions." I know that DESCRIBE is underspecified,
> but wonder if it would be safer to say "merge" here rather than "union"?
> Or perhaps "union" is purposeful here to allow descriptions of different
> terms to share blank nodes?

It's probably not relevant but the use of 'union' is because the description 
is viewed as a fragment of an RDF graph, not a standalone graph itself.  A 
blank node in a description fragment may also occur in another fragment and 
be retained.
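
To make the union/merge distinction concrete, here is a rough Python sketch. 
It assumes graphs are modelled as sets of string triples with blank nodes 
written as "_:" strings; the helper names and the example fragments are 
hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def blank_nodes(g: Set[Triple]) -> Set[str]:
    # Collect every term that looks like a blank node in this toy model.
    return {term for triple in g for term in triple if term.startswith("_:")}


def union(g1: Set[Triple], g2: Set[Triple]) -> Set[Triple]:
    # Plain set union: a blank node occurring in both fragments stays one node.
    return g1 | g2


def merge(g1: Set[Triple], g2: Set[Triple]) -> Set[Triple]:
    # Merge standardises blank nodes apart before taking the union.
    def rename(g: Set[Triple], suffix: str) -> Set[Triple]:
        return {
            tuple(t + suffix if t.startswith("_:") else t for t in triple)
            for triple in g
        }
    return rename(g1, "_g1") | rename(g2, "_g2")


# Two description fragments that share the blank node _:x.
desc_a = {("<alice>", "<knows>", "_:x")}
desc_b = {("_:x", "<name>", '"Bob"')}
```

Here union(desc_a, desc_b) keeps the single shared node _:x, so the name 
stays attached to the person Alice knows, whereas merge(desc_a, desc_b) 
yields two unrelated blank nodes.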

	Again - thanks for the review,
	Andy

> 
> 
> Lee
Received on Tuesday, 12 September 2006 12:28:39 UTC