comments on "SPARQL Query Language for RDF" from Peter F. Patel-Schneider on 2005-09-08 (public-rdf-dawg-comments@w3.org from September 2005)

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Wed, 07 Sep 2005 21:26:59 -0400 (EDT)
To: public-rdf-dawg-comments@w3.org
Message-Id: <20050907.212659.48908953.pfps@research.bell-labs.com>
[I have included various bits of SPARQL and RDF documents to provide sources
for my arguments.  I do not believe that this list of issues is comprehensive -
there are too many major issues herein for me to believe that I have found all
the issues in the document.  Nor do I believe that changes made to address
these issues will not give rise to new issues.]


In general, I am disappointed with the lack of rigour in the document.  I do
not see how interoperating implementations can be derived from this document.



ISSUE 1 (show-stopper):  Non-respect for RDF Semantics

I realize that this interacts with the charter of the group.  What this means
to me is that the charter is ill-formed with respect to RDF.

Section 1:

	An RDF graph is a set of triples; each triple consists of a subject, a
	predicate and an object. This is defined in RDF Concepts and Abstract
	Syntax.

>From RDF Concepts and Abstract Syntax:

	6.2 RDF Graph

	An RDF graph is a set of RDF triples.

	The set of nodes of an RDF graph is the set of subjects and objects of
	triples in the graph.

	6.3 Graph Equivalence

	Two RDF graphs G and G' are equivalent if there is a bijection M
	between the sets of nodes of the two graphs, such that:

	   1. M maps blank nodes to blank nodes.
	   2. M(lit)=lit for all RDF literals lit which are nodes of G.
	   3. M(uri)=uri for all RDF URI references uri which are nodes of G. 
	   4. The triple ( s, p, o ) is in G if and only if the triple ( M(s),
	      p, M(o) ) is in G'

	With this definition, M shows how each blank node in G can be replaced
	with a new blank node to give G'.


This strongly indicates that SPARQL does not respect the semantics of RDF.
For example, querying

G1	ex:a ex:b ex:c .
	_:a ex:b ex:c .

with the triple pattern

	?x ex:b ex:c .

will result in two matches, one for ex:a and one for _:a.
Querying 

G2	ex:a ex:b ex:c .

with the same triple pattern will result in only one match, for ex:a.  However,
G1 and G2 are equivalent with respect to simple interpretations (and thus also
for RDF interpretations, RDFS interpretations, and OWL interpretations).

Many similar examples can be concocted, including 

G3	_:a ex:b ex:c .
	_:b ex:b ex:c .

versus

G4	_:a ex:b ex:c .	

When RDF interpretations are considered, other nonfidelities arise, some having
to do with too few results.



ISSUE 2 (major):  Inconsistent definitions with respect to blank nodes

Section 2.2:

	Queries can include blank nodes; the blank nodes in a query are
	disjoint from all blank nodes in the RDF graphs being matched and
	members of the set of variables.

Section 2.3:

	Matching a triple pattern to a graph gives bindings between variables
	and RDF Terms so that the triple pattern, with the variables replaced
	by the corresponding RDF terms, is a triple of the graph being matched.

	Definition: Triple Pattern

	A triple pattern is member of the set:
	    (RDF-T union V) x (I union V) x (RDF-T union V)

So a triple pattern containing a blank node cannot match any RDF graph.
However, this is contradicted later:

Section 2.7:

	A blank node can appear in a query pattern. It behaves as a variable; a
	blank node in a query pattern may match any RDF term.

Here blank nodes have the same status as variables.

Which is correct?  

Blank nodes are either useless or superfluous.



ISSUE 3 (minor): Lack of definition for "subgraph"

Section 2.5:

	Definition: Basic Graph Pattern

	A Basic Graph Pattern is a set of Triple Patterns.

	A basic graph pattern matches on graph G with solution S if S(GP) is an
	RDF graph and is subgraph of G.

Subgraph is undefined in the document.  I am assuming the defintion from RDF
Semantics as this is the only definition of subgraph in the RDF specification.
(The term "subgraph" does not appear in RDF Concepts.)

RDF Semantics Section 0.3:

	A subgraph of an RDF graph is a subset of the triples in the graph. A
	triple is identified with the singleton set containing it, so that each
	triple in a graph is considered to be a subgraph. A proper subgraph is
	a proper subset of the triples in the graph.


ISSUE 4 (major): Inconsistency in the behaviour of blank nodes in results

Section 2.5:

	Definition: Basic Graph Pattern

	A Basic Graph Pattern is a set of Triple Patterns.

	A basic graph pattern matches on graph G with solution S if S(GP) is an
	RDF graph and is subgraph of G.

Therefore blank nodes in results will be the blank nodes from the graph, and
will persist over a sequence of queries to the same graph.

Section 2.7:

	The presence of blank nodes can be indicated by labels in the
	serialization of query results. An application or client receiving the
	results of a query can tell that two solutions or two variable bindings
	differ in blank nodes but this information is only scoped to the
	results as defined in "SPARQL Variable Binding Results XML Format" or
	the CONSTRUCT result form.

	Data:

	@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
	_:a  foaf:name   "Alice" .
	_:b  foaf:name   "Bob" .

	Query:

	PREFIX foaf:   <http://xmlns.com/foaf/0.1/> 
	SELECT ?x ?name
	WHERE  { ?x foaf:name ?name }

	[Result:]	
	x      name
	_:c    "Alice"
	_:d    "Bob"

This contradicts the definition from Section 2.5.

Section 2.7:

	The results above could equally be given with different blank node labels
	because the labels in the results only indicate whether RDF terms in the
	solutions were the same or different.

	[Result:]	
	x	  name
	_:r	  "Alice"
	_:s	  "Bob"

	These two results have the same information: the blank nodes used to
	match the query are different in the two solutions. There is no
	relation between using _:a in the results and any blank node label in
	the data graph.

This contradicts the definition from Section 2.5


ISSUE 5 (minor):  Poor definition of syntax of constraints

Section 3.3:

	Definition: Value Constraint

	A value constraint is a boolean-valued expression of variables and RDF Terms.

This doesn't provide any syntax for value constraints, nor does Section 11.2.  
This is somewhat alleviated by the syntax appendix, but not completely.


ISSUE 6 (major): Optional graph patterns

Section 5:

	Optional matching provides this facility; if the optional part does not
	lead to any solutions, variables can be left unbound.

Section 5.1:

	This query finds the names of people in the data. If there is a triple
	with predicate mbox and same subject, a solution will contain the
	object of that triple as well. In the example, only a single triple
	pattern is given in the optional match part of the query but, in
	general, it is any graph pattern. The whole graph pattern of an
	optional graph pattern must match for the optional graph pattern to add
	to the query solution.

What happens when querying

G5	ex:x ex:a ex:b .
	ex:a ex:c ex:f .
	ex:a ex:c ex:g .

with

	SELECT ( ?v ?w ) WHERE
	{ ?v ex:a ex:b .
	  OPTIONAL { ?v ex:c ?w .  } } 

There is a reading of the document that supports the possibility of returning
only one match.  The intent is probably that all possible matches must be
returned, but the document does not so state.

Section 5.4:

	Definition:  Optional Graph Pattern

	An optional graph pattern is a combination of a pair of graph
	patterns. The second pattern modifies the solution of the first pattern
	but does not fail matching of the overall optional graph pattern.

	If Opt(A, B) is an optional graph pattern, where A and B are graph
	patterns, then S is a solution of [the] optional graph pattern if S is
	a solution of A and of B otherwise if S is a solution to A, but
	not to A and B.

First, this doesn't parse.

Second, this doesn't cover a query where a graph pattern has two optional
parts.

Third, obvious fixes to the definition to make it parse are not compatible with
the informal first part of the section.  What is probably meant is something
like:

	... then if the graph pattern formed by grouping A and B has solutions
	then S is a solution to Opt(A,B) iff S is a solution to the grouping of
	A and B.  Otherwise (i.e., if the grouping of A and B has no solutions)
	then S is a solution to Opt(A,B) iff S is a solution to A.

This still doesn't cover multiple (non-nested) optional parts.  An extension
could look something like

	S is a solution to Opt(A) iff S is a solution to A.

	S is a solution to Opt(A, B1,...,Bn) iff
	1/ if the grouping of A and all	the Bi has solutions and S is a
	   solution to the grouping of A and all the Bi; or
        2/ if the grouping of A and all the Bi has no solutions and S is a
	   solution to Opt(A,B1,...,Bi-1,Bi+1,...,Bn) for some 1<=i<=n.

 


Peter F. Patel-Schneider
Bell Labs Research
Received on Thursday, 8 September 2005 01:27:18 UTC