Re: feedback on "SPARQL Query Language for RDF", v1.139

Kevin,

Thank you for such a detailed set of comments, and thank you for marking up the text.

Changes logged below.


Kevin Wilkinson wrote:
> attached are my comments on v1.139 of the SPARQL spec.
> i edited the html directly (removing sections that
> required no changes). my comments are in green font
> and are delimited by '-' and '+'.
> 
> kevin
> 
> 
> ------------------------------------------------------------------------
> 
> W3C <http://www.w3.org/>
> 
> 
>   SPARQL Query Language for RDF
> 
> Editors working draft.
>     Live Draft - version:
>     $Revision: 1.139 $ of $Date: 2004/11/22 16:52:02 $ 
> Latest published version:
>     First Working Draft
>     http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/ 
> 
> Comments from Kevin: comments/changes from Kevin Wilkinson are in green 
> font with deletions bracketed by '-' and additions bracketed by '+'. For 
> simple typos, I just indicate the change with '+' and don't show the 
> deletion.
> 

Changes are in v1.141 until noted otherwise.

> 
>     1 Introduction
> 
> An RDF graph is a set of triples, each consisting of a /+subject+/, an 
> +object+, and +/predicate/ that specifies+ a property relationship 
> between them+,+ as defined in RDF Concepts and Abstract syntax 
> <http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Datatypes-intro>. 

A/ Can't see the change in subject and object.

B/ RDF Concepts says:

"""
6.1 RDF Triples

An RDF triple contains three components:

     * the subject, which is an RDF URI reference or a blank node
     * the predicate, which is an RDF URI reference
     * the object, which is an RDF URI reference, a literal or a blank node

An RDF triple is conventionally written in the order subject, predicate, object.

The predicate is also known as the property of the triple.
"""
so we should consistently use predicate or property.  Predicate seems to be 
slightly preferred by the concepts text so I have changed "property" in the 
document to "predicate".

Fixed the concepts URL, which pointed to the wrong place in the doc.

> 
> 
> KW Comment: for consistency with the rest of this document, I added the 
> word "predicate" above. Use it  either in addition to or in place of 
> "property".
> 
> 
>     2 Making Simple Queries
> 
> Queries match graph patterns against the target graph of the query.  

The stray characters in the quoted text are &nbsp;'s and have been removed.

> Patterns are like graphs but may +have+ named variables in place of some 
> of the nodes or predicates; the simplest graph patterns are single 
> triple patterns.  -and graph- +Graph+ patterns can be combined using 
> various operators into more complicated graph patterns. 

Fixed.

> 
> A /binding/ is a mapping from -the- +a+ variable in a query to -terms- 
> +RDF terms (see Section 2.2)+. A /pattern solution/ is a set of bindings 
> which, when applied to the variables in the query, -cab-+can+ be used to 
> produce a subgraph of the target graph; /-query results /are/-/ +a / 
> query result /is+ a set of /pattern solutions/. If there are no -result 
> mappings-+pattern solutions+, the query results is an empty set. (KW 
> Comment: /result mappings/ is not defined at this point.)

Fixed: changed to "pattern solutions".

> 
> Pictorially, suppose we have a graph with two triples and the given 
> triple pattern:
> 
> _:1 foaf:mbox "alice@work.example"
> 
> triple1
> 
> _:2 foaf:mbox "robt@home.example"
> 
> triple2
> 
> ?who foaf:mbox ?addr
> 
> triplePattern1
> 
> with the result:
> 
> reference 	author
> http://www.w3.org/TR/xpath 	"James Clark"
> http://www.w3.org/TR/xpath 	"Steve DeRose"
> 
> RDF graphs are constructed from one or more triples, ex. graph1.
> 
> _:1 foaf:mbox "alice@work.example". _:1 foaf:knows _:2. _:2 foaf:mbox 
> "robt@home.example" <http://www.w3.org/2001/sw/DataAccess/rq23/graph1.svg>
> 
> graph1
> 
> ?who foaf:mbox "alice@work.example". ?who foaf:knows ?whom. ?whom 
> foaf:mbox ?address
> 
> graphPattern1
> 
> -A query for graphPattern1 will return the email address of people known 
> by Alice (specifically, the person with the mbox |alice@work.example|). 
> When matched against the example RDF graph, we get one result mapping 
> which binds three variables:-
> 
> (KW Comment: the above paragraph (1) makes no sense as there is no Alice 
> in graph1, (2) uses the phrase "result mapping" which has not been 
> defined. An attempted rewrite is below. Also, graph1 and graphPattern1 
> use the prefix "dc:" which is not defined.)

This appears to be old text from when the pictures were different.  They were 
changed so as not to involve bNodes this early on.

Leave this to Eric.

> 
> +The query pattern graphPattern1 will return the URI (via /dc:relation/) 
> and authors (via /dc:creator/) for documents referenced by the (document 
> identified by the) bound variable / referrer/. When matched against 
> graph1, we get two pattern solutions which bind three variables:+
> 
> referrer 	reference 	author
> http://www.w3.org/TR/xpath 	http://www.w3.org/TR/xpath 	"James Clark"
> http://www.w3.org/TR/xpath 	http://www.w3.org/TR/xpath 	"Steve DeRose"
> 
> (KW Comment: in the above result, I think you want the referrer result 
> to be ~TR/xslt rather than ~TR/xpath.)

Agreed.  URIs quoted as well.

> 
> 
>       2.1 Writing a Simple Query
> 
> The example below shows a +SPARQL+ query to find the title of a book 

Done.

> from the information in an RDF graph. The query consists of two parts, 
> the SELECT clause and the WHERE clause. Here, the SELECT clause names 
> the variable of interest to the application, and the WHERE clause has 
> one triple pattern.
> 
> Data:
> 
> <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" . 
> 
> Query:
> 
> SELECT ?title
> WHERE  ( <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title )
> 
> Query Result:
> 
> title
> "SPARQL Tutorial"
> 
> The terms delimited by "<>" are URI References [13] <#ref13> (URIRefs); 
> URIRefs can also be abbreviated with an XML QName-like form [14] <#ref14>; 
> this is syntactic assistance and is translated to the full URIRef. 
> -Other RDF terms-+The terms delimited by double quotes+ are literals 
> which, following N-Triples syntax [7] <#ref7>, are a string and 
> -optional language tag (introduced with '@') and datatype URIRef 
> (introduced by '^^')-+optionally, either a language tag (indicated by 
> '@') or a datatype URIRef (indicated by '^^')+.

Changed to:
"""
The RDF terms delimited by double quotes ("") are literals which, following 
N-Triples syntax [7], are a string, in quotes, an optional language tag, 
introduced with '@', and an optional datatype URIRef, introduced by '^^'.
"""

> 
> ...
> 
> -RDF has typed literals. Such literals are written using "^^". Integers 
> can be directly written and are interpreted as typed literals of 
> datatype xsd:integer.-
> 
> +RDF has typed literals. These are written by concatenating the lexical 
> form of the literal value (in double quotes) with the URI of the 
> datatype, separated by  "^^". As a convenience, integers can be directly 
> written (i.e. unquoted with no datatype URI) and are interpreted as 
> typed literals of datatype xsd:integer.+
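A minimal Python sketch of how a parser might read the literal forms described above (not spec text; `Literal`, `parse_literal` and the regular expressions are hypothetical). It assumes, as in RDF Concepts, that a literal carries either a language tag or a datatype URIRef but not both, and it ignores escape sequences inside the quotes.

```python
import re
from typing import NamedTuple, Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Literal(NamedTuple):
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


# Quoted form: "lexical", then optionally @lang OR ^^<datatype URI> (not both).
# Escape sequences inside the quotes are not handled in this sketch.
_QUOTED = re.compile(r'^"([^"]*)"(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*)|\^\^<([^>]*)>)?$')
# Convenience form: a bare, unquoted integer.
_BARE_INT = re.compile(r'^[+-]?[0-9]+$')


def parse_literal(token: str) -> Literal:
    """Parse a literal token written in an N-Triples-like syntax (sketch)."""
    if _BARE_INT.match(token):
        # An unquoted integer is read as a typed literal of datatype xsd:integer.
        return Literal(token, datatype=XSD_INTEGER)
    m = _QUOTED.match(token)
    if m is None:
        raise ValueError(f"not a literal: {token!r}")
    lexical, lang, datatype = m.groups()
    return Literal(lexical, lang=lang, datatype=datatype)
```

With this reading, `42` and `"42"^^<http://www.w3.org/2001/XMLSchema#integer>` parse to the same value, which is the point of the shorthand.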
> 
> 
>       2.2 Triple Patterns
> 
> The building blocks of queries are triple patterns. Syntactically, a 
> SPARQL triple pattern is a subject, predicate and object delimited by 
> parentheses. The -previous- example +in section 2.1+ shows a triple 
> pattern with a -variable subject- +subject variable+ (the variable 
> book), a predicate of dcore:title and -a variable object-+an object 
> variable+ (the variable title).

Done.

> 
> ( ?book dcore:title ?title )
> 
> A triple pattern applied to a graph matches all triples with identical 
> RDF terms for the corresponding subject, predicate and object. The 
> variables in the triple pattern, if any, are bound to the corresponding 
> RDF terms in the matching triples.
> 
> (KW Comment: I think you need to elaborate on this definition of 
> "matching". It should be precise. By identical, I assume you mean the 
> lexical forms match, i.e., identical character strings. You need to add 
> the caveat that prefixes are expanded prior to matching and that 
> directly-written integers are converted to typed integers.)

"""
Matching a triple pattern to a graph gives bindings between variables and RDF 
Terms so that the triple pattern, with the variables replaced by the 
corresponding RDF terms, is a triple of the graph being matched.
"""

> 
> -In SPARQL, a triple pattern is an RDF triple but with the addition that 
> components can be a query variable instead.-
> 
> +In SPARQL, a triple pattern is an RDF triple in which any component can 
> be a query variable.+

A triple pattern that contains variables is not an RDF triple, since RDF triples 
cannot contain variables.  Text left as is.

> 
> ...
> 
> (KW Comment: the example below is confusing and does not illustrate the 
> definition of binding. The table, in fact, shows two variables rather 
> than one variable and the semantics are not defined. I would suggest 
> replacing the table.)
> 
> In this document, we illustrate bindings in results in tabular form:

"""
In this document, we illustrate bindings in results in tabular form, so if 
variable x is bound to "Alice" and variable y is bound to "Bob", we show this as:
"""

> 
> x 	y
> "Alice" 	"Bob"
> 
> (KW Comment: the example above is confusing and does not illustrate the 
> definition of binding (which only mentions a single variable). The table 
> above, in fact, shows two variables rather than one variable and the 
> semantics of binding multiple variables is not defined. I would suggest 
> replacing the table of two columns with a single column that shows two 
> bindings for one variable, as shown below.)
> 
> x
> "Alice"
> 
> "Bob"
> 
> (KW Comment: the paragraph below on optional matches doesn't need to go 
> here. It's confusing. I suggest removing it or moving it to section 4.)

Done.

> 
> -Not every binding needs to exist in every row of the table.  So far, 
> the exampl es have shown queries that either exactly match the graph, or 
> do not match at all. Optional Matches <#optionals> can cause bindings, 
> bit if they fail to match, they do not cause the solution to be 
> rejected, and so can leave variables unset in a row of the table.-
> 
> (KW Comment: the definition of Substitution, below, is confusing, coming 
> immediately after the definition of binding. They seem similar. Could 
> you relate the two? How is a binding related to a substitution? You need 
> to motivate the definition of substitution.)

A substitution is the function induced by one or more bindings.  A binding is one 
variable and one RDF Term.

The definition of binding has been reduced to just a definition of the terminology.

"""
Definition: Binding

A binding is a pair which defines a mapping from a variable to an RDF Term.
"""

> 
> *Definition:* Substitution
> 
> A substitution S is a partial functional relation from variables to RDF 
> terms or variables. We write S[v] for the RDF term that S pairs with the 
> variable v and define S[v] to be v where there is no such pairing.
> 
> *Definition:* Triple Pattern Matching
> 
> For +substitution+ S and Triple Pattern T, S(T) is -the-+a+ triple 
> pattern +formed+ by replacing any variable v in T with S[v]. (KW 
> Comment: there may be more than one such triple pattern, correct?)

No - a substitution is a function and so is well-defined.  Applied to a triple 
pattern, it produces exactly one triple pattern.
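To illustrate how bindings and substitutions relate, a small Python sketch (hypothetical helper names, not spec text). It builds the substitution induced by a set of (variable, term) bindings and applies it to a triple pattern. A dict lookup gives S[v], falling back to v itself when there is no pairing, so S(T) is uniquely determined; two bindings of one variable to different terms are rejected because they would not define a function.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]


def substitution_from_bindings(bindings: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build the substitution induced by a set of (variable, term) bindings."""
    S: Dict[str, str] = {}
    for var, term in bindings:
        # Two different terms for one variable would not give a function.
        if S.setdefault(var, term) != term:
            raise ValueError(f"conflicting bindings for {var}")
    return S


def apply_substitution(S: Dict[str, str], T: Triple) -> Triple:
    """S(T): replace each variable v in T with S[v], where S[v] is v itself
    if S has no pairing for v. The result is uniquely determined."""
    return tuple(S.get(t, t) if t.startswith("?") else t for t in T)
```

For example, the substitution built from the single binding `?book -> <http://example.org/book/book1>` turns `( ?book dcore:title ?title )` into `( <http://example.org/book/book1> dcore:title ?title )`, leaving the unbound `?title` in place.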

Other typo fixed.

> 
> Triple Pattern T matches RDF graph G with substitution S, if S(T) is a 
> triple of G.
> 
> (KW Comment: the above definition (of Triple Pattern T match G) is a 
> second definition of triple pattern matching. Previously, at the start 
> of section 2.2, you say that a pattern matches all triples with 
> "identical" RDF terms. Is it obvious that these two definitions are 
> identical? Maybe prefix the first definition by saying it is an informal 
> definition.)

I hope the use of the boxes makes the informal/formal distinction.  Will consider - there is 
also an outstanding comment from Yoshio about putting all definitions before the 
narrative text.  Probably not possible at the very start of the doc.

> 
> If the same variable name is used more than once in a pattern then, 
> within each -solution-+match+ to the query, the variable has the same value.
> 
> (KW Comment: "solution" is undefined in the above sentence; perhaps 
> "match" is a better word choice.)
> 
> (KW Comment: I found the above definitions of Triple Pattern Matching
> and Substitution confusing. 'S' is used in different ways; as a partial 
> function of one variable and a mapping from a triple pattern to another 
> triple. Substitution, below, is confusing, coming immediately after the 
> definition of binding. They seem similar. Could you relate the two? How 
> is a binding related to a substitution? You need to motivate the 
> definition of substitution.)
> 
> For example, the query:
> 
> SELECT * WHERE ( ?x ?x ?v )
> 
> matches the triple:
> 
> rdf:type rdf:type rdf:Property .
> 
> with solution:
> 
> x 	v
> rdf:type 	rdf:Property
> 
> It does not match the triple:
> 
> rdfs:seeAlso rdf:type rdf:Property .
> 
> because the variable x would need to be both rdfs:seeAlso and rdf:type 
> in the same solution.
> 
> (KW Comment: again, the word "solution" is used here. It has not been 
> defined. It should be replaced by "match" or some other word or else 
> "solution" should be defined, if only informally).

Avoid "solution" at this point.

> 
> 
>       2.3 Graph Patterns
> 
> -The keyword WHERE is followed by a /Graph Pattern/ which is made of one 
> or more /Triple Patterns/. These Triple Patterns are "and"ed together. 
> More formally, the Graph Pattern is the conjunction of the Triple 
> Patterns. In each query solution, all the triple patterns must be 
> satisfied with the same binding of variables to values.-
> 
> +A /Graph Pattern/ is one or more /Triple Patterns /"and"ed together, 
> i.e., a conjunction of Triple Patterns. In a match, all the triple 
> patterns must be satisfied with the same binding of variables to values.+

Trying to, informally, explain the syntax at this point, hence the keyword 
WHERE.  Avoid solution though.


(Aside: The WHERE isn't necessary for parsing!)

> 
> Data:
> 
> @prefix foaf:    <http://xmlns.com/foaf/0.1/> .
> 
> _:a  foaf:name   "Johnny Lee Outlaw" .
> _:a  foaf:mbox   <mailto:jlow@example.com> .
> 
> -There is a bNode [12] <#ref12> in this dataset. Just within the file, 
> for encoding purposes, the bNode is identified by _:a but the 
> information about the bNode label is not in the RDF graph. No query will 
> be able to identify that bNode by the label used in the serialization.-
> 
> +Note that there is a bNode [12] <#ref12> in this dataset, identified by 
> _:a. This label is used only for encoding within a file; once the file 
> is read into an RDF graph, a new label may be assigned. Consequently, 
> applications should not assume that bnode labels used in a serialization 
> (e.g., a file) can be used to query an RDF graph containing that 
> serialized dataset.+

"""
There is a bNode [12] in this dataset, identified by _:a. The label is only used 
within the file for encoding purposes. The label information is not in the RDF 
graph. No query will be able to identify that bNode by the label used in the 
serialization.
"""
> 
> Query:
> 
> PREFIX foaf:   <http://xmlns.com/foaf/0.1/> 
> SELECT ?mbox
> WHERE
>   ( ?x foaf:name "Johnny Lee Outlaw" )
>   ( ?x foaf:mbox ?mbox )
> 
> Query Result:
> 
> mbox
> <mailto:jlow@example.com>
> 
> -This query contains a conjunctive graph pattern. A conjunctive graph 
> pattern is a set of triple patterns, each of which must match for the 
> graph pattern to match.-
> 
> +The above query contains a conjunctive graph pattern of two triple 
> patterns, each of which must match for the graph pattern to match.+

Done

> 
> *Definition:* Graph Pattern (Partial Definition) - Conjunction
> 
> A set of triple patterns is a graph pattern GP. For such a graph pattern 
> to match with substitution +S(T)+, each triple pattern in GP must match 
> with substitution +S(T)+.

Left as is. There is no "T" in this definition.

> 
> *Definition:* Graph Pattern Matching
> 
> For substitution S, we write S(GP) for the graph pattern produced by 
> applying S to each triple pattern T in GP.
> 
> If GP = { T | T triple pattern } then S(GP) = { S(T) | T in GP }
> 
> Graph Pattern GP matches RDF graph G with substitution S if G simply 
> entails <http://www.w3.org/TR/rdf-mt/#defentail> S(GP).
> 
> (KW Comment: here's yet another definition of matching. Does this 
> supersede the definition of matching for triple patterns?

The definition of "matching" is being built up through the document.  Each 
definition has a qualifier - here the definition is "Graph Pattern Matching".

The "simple entails" is another matter.

> At any rate, 
> having a hyperlink to the definition is not good since this document is 
> likely to be printed. So, as a convenience, it would be very, very nice 
> to provide here an informal definition of "simply entails" (i.e., the 
> gotchas; which, I think mainly have to do with bnodes but there may be 
> other ones, too) and then refer the reader to the other document. Also, 
> rather than a hyperlink, this should be a reference, i.e., to some 
> document in Appendix B).

And what's more, on reflection, I don't think "simply entails" is necessary. 
Subgraph would be clearer and *at this point* the definitions don't rely on 
entailment.  The binding really does have the bNode as its value.  It's later, 
on encoding results, that this is broken.  It must be the same bNode to match 
again later.
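A rough Python sketch of the subgraph reading (not spec text): a graph pattern is a list of triple patterns matched left to right, each agreeing with the bindings already made, and a substitution S is accepted when S(GP) is a subgraph of G. `BNode`, `substitute` and `match_graph_pattern` are hypothetical names; a blank node is modelled as a bare Python object, so the binding holds the bNode itself rather than any label. Collecting every yielded substitution gives R(GP, G), which may be empty.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple, Union


class BNode:
    """A blank node: only its identity matters; no label is part of the graph."""
    __slots__ = ()


Term = Union[str, BNode]
Triple = Tuple[Term, Term, Term]
Substitution = Dict[str, Term]


def is_var(t: Term) -> bool:
    # Sketch convention: query variables are strings starting with '?'.
    return isinstance(t, str) and t.startswith("?")


def substitute(S: Substitution, T: Triple) -> Triple:
    return tuple(S.get(t, t) if is_var(t) else t for t in T)


def _bind(S: Substitution, p: Term, g: Term) -> bool:
    # A variable binds consistently to the data term; a constant must be equal.
    if is_var(p):
        return S.setdefault(p, g) == g
    return p == g


def match_graph_pattern(gp: List[Triple], graph: Set[Triple],
                        S: Optional[Substitution] = None) -> Iterator[Substitution]:
    """Yield every substitution S for which S(GP) is a subgraph of `graph`."""
    S = {} if S is None else S
    if not gp:
        yield dict(S)
        return
    pattern = substitute(S, gp[0])  # apply the bindings made so far
    for triple in graph:
        extended = dict(S)
        if all(_bind(extended, p, g) for p, g in zip(pattern, triple)):
            yield from match_graph_pattern(gp[1:], graph, extended)
```

On the Johnny Lee Outlaw data, the two-triple pattern yields one substitution, binding `?x` to the blank node object itself and `?mbox` to `<mailto:jlow@example.com>`.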

> 
> 
>       2.4 Multiple Matches
> 
> The results of a query are all the ways a query can match the graph 
> being queried. Each result is one solution to the query and there may be 
> zero, one or multiple results to a query, depending on the data.
> 
> Data:
> 
> @prefix foaf:  <http://xmlns.com/foaf/0.1/> .
> 
> _:a  foaf:name   "Johnny Lee Outlaw" .
> _:a  foaf:mbox   <mailto:jlow@example.com> .
> _:b  foaf:name   "Peter Goodguy" .
> _:b  foaf:mbox   <mailto:peter@example.org> .
> 
> Query:
> 
> PREFIX foaf:   <http://xmlns.com/foaf/0.1/> 
> SELECT ?name, ?mbox
> WHERE
>   ( ?x foaf:name ?name )
>   ( ?x foaf:mbox ?mbox )
> 
> Query Result:
> 
> name 	mbox
> "Johnny Lee Outlaw" 	<mailto:jlow@example.com>
> "Peter Goodguy" 	<mailto:peter@example.org>
> 
> The results enumerate the RDF terms to which the /selected/ variables 
> can be bound in the graph pattern. In the above example, the following 
> two subsets of the data caused the two matches.
> 
>  _:a foaf:name  "Johnny Lee Outlaw" .
>  _:a foaf:mbox  <mailto:jlow@example.com> .
> 
>  _:b foaf:name  "Peter Goodguy" .
>  _:b foaf:mbox  <mailto:peter@example.org> .
> 
> (KW Comment: I don't like the above example because it illustrates two 
> concepts. First, it shows that a query may have multiple solutions. 
> That's fine. But, it also illustrates the results can be a projection of 
> the query variables. This raises additional questions, specifically, how 
> are duplicates handled. I'd feel better if this example included 
> variable 'x' in the result list.).

Point taken.  However, (1) this isn't the first time projection has happened and 
(2) avoiding bNodes in results is desirable for clarity.  The only option would 
be to not use FOAF, but then we wouldn't have data that is familiar to at least some 
people.  Having synthetic data is rather dry.  In this example, there aren't 
duplicates.
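A small Python sketch of the projection step in this example (hypothetical names, not spec text). Each solution binds ?x, ?name and ?mbox, and the SELECT list keeps only ?name and ?mbox. The sketch keeps one row per solution, so it shows where duplicates could arise without deciding how they should be handled.

```python
from typing import Dict, List, Sequence, Tuple


class BNode:
    """A blank node: identity only (hypothetical stand-in)."""
    __slots__ = ()


Solution = Dict[str, object]


def project(solutions: List[Solution], selected: Sequence[str]) -> List[Tuple[object, ...]]:
    """Keep only the selected variables, one row per solution.

    For a simple conjunctive pattern every variable is bound in every
    solution, so direct indexing is safe here. Rows are not de-duplicated.
    """
    return [tuple(sol[v] for v in selected) for sol in solutions]


# The two solutions of the example query (the objects stand for _:a and _:b).
a, b = BNode(), BNode()
solutions = [
    {"?x": a, "?name": '"Johnny Lee Outlaw"', "?mbox": "<mailto:jlow@example.com>"},
    {"?x": b, "?name": '"Peter Goodguy"', "?mbox": "<mailto:peter@example.org>"},
]
rows = project(solutions, ["?name", "?mbox"])
```

Projecting two hypothetical solutions that bind different bNodes to ?x but the same ?name onto ?name alone gives two identical rows; keeping ?x in the SELECT list keeps them distinct, which is the case Kevin's suggestion would illustrate.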

> 
> For a simple, conjunctive graph pattern match, all the variables used in 
> the -query-+graph+ pattern will be bound in every solution.
> 
> *Definition:* Pattern Solution
> 
> A Pattern Solution of Graph Pattern GP on graph G is any substitution S 
> such that GP matches G with S.
> 
> For a graph pattern GP formed as a set of triple patterns, 
> -S(G)-+S(GP)+, has no variables and is a subgraph of G.

Done.

> 
> *Definition:* Query Solution
> 
> A Query Solution is a Pattern Solution where the pattern is the whole 
> pattern of the query.
> 
> *Definition:* Query Results
> 
> The Query Results, for a given graph pattern GP on G, is written 
> R(GP,G), and is the set of all query solutions such that GP matches G.
> 
> R(GP, G) may be the empty set.
> 
> 
>       2.5 Blank Nodes
> 
> 
>         Blank Nodes and Queries
> 
> There is no standard representation of bNodes 
> <http://www.w3.org/TR/rdf-concepts/#section-blank-nodes> in RDF and the 
> syntax of SPARQL queries does not allow them. They can form part of a 
> pattern match and do take part in the pattern matching process.
> 
> Suggestions for better wording most welcome.
> 
> (KW Comment: I'm not sure how to reword it because I'm not sure what 
> it's trying to say. Here are two possible interpretations.  /1. The 
> representation of bNodes (i.e., the bNode label) is not specified in RDF 
> and is implementation-specific. Consequently, SPARQL syntax does not 
> prescribe a representation. Applications may use bNode labels in a 
> pattern match but such queries are implementation-specific and 
> should not be considered portable.  2. The representation of bNodes 
> (i.e., the bNode label) is not specified in RDF. Consequently, bNode 
> labels should not be used in a triple pattern because that 
> representation is not portable across implementations and, in fact, the 
> label may even change within one implementation (labels need not be 
> persistent).  However, variables in a triple pattern may be bound to 
> bNodes. /Regardless of the interpretation, the definition of Triple 
> Pattern in section 2.2 should perhaps be modified to reflect the notion 
> that bNodes should not be in a triple pattern (or should they?).

As set up, bNodes can be in a triple pattern; it can happen when some other part 
of the query pattern makes the binding of variable to bNode.

The text is saying that it's meaningless to include a bNode (it would never match; 
query variables aren't bNodes).

It's not just that bNode labels are not portable - it's not part of RDF to have 
bNodes outside of a graph.

"""
BNodes can't appear in a SPARQL query. There is no standard representation of 
bNodes in RDF and the syntax of SPARQL queries does not allow them.

They do take part in the pattern matching process.
"""

> 
> 
>         Blank Nodes and Results
> 
> In the results of queries, the presence of bNodes can be indicated but 
> the internal system -identification-+label+ -is not-+may not be+ 
> preserved.

I prefer the stronger "is not preserved". If the same label is used in the 
result syntax, then it's luck (Jena definitely won't - as Kevin knows, our bNode 
internal labels are not friendly!).

> Thus, a client can tell that two solutions to a query differ
> in bNodes needed to perform the graph match but this information is only 
> scoped to the results (result set or RDF graph). +Repeating the query 
> (on the identical graph) may produce different labels for the bNodes.+
> 
> Redo when XML syntax document is available and use the syntax there.
> 
> Data:
> 
> @prefix foaf:  <http://xmlns.com/foaf/0.1/> .
> 
> _:a  foaf:name   "Alice" .
> _:b  foaf:name   "Bob" .
> 
> Query:
> 
> PREFIX foaf:   <http://xmlns.com/foaf/0.1/> 
> SELECT ?x ?name
> WHERE  ( ?x foaf:name ?name )
> 
> Query Result +(query run twice)+:
> 
> x 	name
> _:a 	"Alice"
> _:b 	"Bob"
> 
> x 	name
> _:r 	"Alice"
> _:s 	"Bob"
> 
> These two results have the same information: the blank node used to 
> match the query was different in the two solutions.  There is no 
> relation between using _:a in the results and any internal blank node 
> label in the data graph; the labels in the results only indicate whether 
> -elements-+terms+ in the +solutions+ were the same or different.
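One way to see that the two result tables carry the same information: a Python sketch (hypothetical names; not the result syntax, which is still pending) in which each distinct bNode gets a label scoped to one result set, plus a check that two result sets differ only by a consistent one-to-one renaming of bNode labels. It assumes rows are compared in the same order and that bNode labels are the only values starting with "_:".

```python
from typing import Dict, List


class BNode:
    """A blank node: identity only; no label is stored in the graph."""
    __slots__ = ()


def serialize(rows: List[Dict[str, object]], prefix: str) -> List[Dict[str, str]]:
    """Write result rows, giving each distinct bNode a label scoped to this
    one result set (labels are allocated in order of first appearance)."""
    labels: Dict[BNode, str] = {}

    def label_for(term: object) -> str:
        if not isinstance(term, BNode):
            return str(term)  # URIs and literals are written as-is
        if term not in labels:
            labels[term] = f"_:{prefix}{len(labels)}"
        return labels[term]

    return [{var: label_for(t) for var, t in row.items()} for row in rows]


def same_up_to_bnode_labels(r1: List[Dict[str, str]], r2: List[Dict[str, str]]) -> bool:
    """True if r2 is r1 with bNode labels renamed one-to-one."""
    if len(r1) != len(r2):
        return False
    fwd: Dict[str, str] = {}
    back: Dict[str, str] = {}
    for row1, row2 in zip(r1, r2):
        if row1.keys() != row2.keys():
            return False
        for var in row1:
            t1, t2 = row1[var], row2[var]
            if t1.startswith("_:") != t2.startswith("_:"):
                return False
            if t1.startswith("_:"):
                # Same label must always map to the same label, both ways.
                if fwd.setdefault(t1, t2) != t2 or back.setdefault(t2, t1) != t1:
                    return False
            elif t1 != t2:
                return False
    return True
```

Serializing the Alice/Bob solutions twice with different label prefixes gives different labels but passes the check; a result set that used one label for both rows would fail it, because it would claim the two solutions share a bNode.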
> 
> 

Committed version 1.141

Received on Wednesday, 1 December 2004 16:23:43 UTC