- From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
- Date: Wed, 07 Sep 2005 21:26:59 -0400 (EDT)
- To: public-rdf-dawg-comments@w3.org
[I have included various bits of SPARQL and RDF documents to provide sources for my arguments. I do not believe that this list of issues is comprehensive - there are too many major issues herein for me to believe that I have found all the issues in the document. Nor do I believe that changes made to address these issues will not give rise to new issues.] In general, I am disappointed with the lack of rigour in the document. I do not see how interoperating implementations can be derived from this document. ISSUE 1 (show-stopper): Non-respect for RDF Semantics I realize that this interacts with the charter of the group. What this means to me is that the charter is ill-formed with respect to RDF. Section 1: An RDF graph is a set of triples; each triple consists of a subject, a predicate and an object. This is defined in RDF Concepts and Abstract Syntax. >From RDF Concepts and Abstract Syntax: 6.2 RDF Graph An RDF graph is a set of RDF triples. The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph. 6.3 Graph Equivalence Two RDF graphs G and G' are equivalent if there is a bijection M between the sets of nodes of the two graphs, such that: 1. M maps blank nodes to blank nodes. 2. M(lit)=lit for all RDF literals lit which are nodes of G. 3. M(uri)=uri for all RDF URI references uri which are nodes of G. 4. The triple ( s, p, o ) is in G if and only if the triple ( M(s), p, M(o) ) is in G' With this definition, M shows how each blank node in G can be replaced with a new blank node to give G'. This strongly indicates that SPARQL does not respect the semantics of RDF. For example, querying G1 ex:a ex:b ex:c . _:a ex:b ex:c . with the triple pattern ?x ex:b ex:c . will result in two matches, one for ex:a and one for _:a. Querying G2 ex:a ex:b ex:c . with the same triple pattern will result in only one match, for ex:a. However, G1 and G2 are equivalent with respect to simple interpretations (and thus also for RDF interpretations, RDFS interpretations, and OWL interpretations). Many similar examples can be concocted, including G3 _:a ex:b ex:c . _:b ex:b ex:c . versus G4 _:a ex:b ex:c . When RDF interpretations are considered, other nonfidelities arise, some having to do with too few results. ISSUE 2 (major): Inconsistent definitions with respect to blank nodes Section 2.2: Queries can include blank nodes; the blank nodes in a query are disjoint from all blank nodes in the RDF graphs being matched and members of the set of variables. Section 2.3: Matching a triple pattern to a graph gives bindings between variables and RDF Terms so that the triple pattern, with the variables replaced by the corresponding RDF terms, is a triple of the graph being matched. Definition: Triple Pattern A triple pattern is member of the set: (RDF-T union V) x (I union V) x (RDF-T union V) So a triple pattern containing a blank node cannot match any RDF graph. However, this is contradicted later: Section 2.7: A blank node can appear in a query pattern. It behaves as a variable; a blank node in a query pattern may match any RDF term. Here blank nodes have the same status as variables. Which is correct? Blank nodes are either useless or superfluous. ISSUE 3 (minor): Lack of definition for "subgraph" Section 2.5: Definition: Basic Graph Pattern A Basic Graph Pattern is a set of Triple Patterns. A basic graph pattern matches on graph G with solution S if S(GP) is an RDF graph and is subgraph of G. Subgraph is undefined in the document. I am assuming the defintion from RDF Semantics as this is the only definition of subgraph in the RDF specification. (The term "subgraph" does not appear in RDF Concepts.) RDF Semantics Section 0.3: A subgraph of an RDF graph is a subset of the triples in the graph. A triple is identified with the singleton set containing it, so that each triple in a graph is considered to be a subgraph. A proper subgraph is a proper subset of the triples in the graph. ISSUE 4 (major): Inconsistency in the behaviour of blank nodes in results Section 2.5: Definition: Basic Graph Pattern A Basic Graph Pattern is a set of Triple Patterns. A basic graph pattern matches on graph G with solution S if S(GP) is an RDF graph and is subgraph of G. Therefore blank nodes in results will be the blank nodes from the graph, and will persist over a sequence of queries to the same graph. Section 2.7: The presence of blank nodes can be indicated by labels in the serialization of query results. An application or client receiving the results of a query can tell that two solutions or two variable bindings differ in blank nodes but this information is only scoped to the results as defined in "SPARQL Variable Binding Results XML Format" or the CONSTRUCT result form. Data: @prefix foaf: <http://xmlns.com/foaf/0.1/> . _:a foaf:name "Alice" . _:b foaf:name "Bob" . Query: PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?x ?name WHERE { ?x foaf:name ?name } [Result:] x name _:c "Alice" _:d "Bob" This contradicts the definition from Section 2.5. Section 2.7: The results above could equally be given with different blank node labels because the labels in the results only indicate whether RDF terms in the solutions were the same or different. [Result:] x name _:r "Alice" _:s "Bob" These two results have the same information: the blank nodes used to match the query are different in the two solutions. There is no relation between using _:a in the results and any blank node label in the data graph. This contradicts the definition from Section 2.5 ISSUE 5 (minor): Poor definition of syntax of constraints Section 3.3: Definition: Value Constraint A value constraint is a boolean-valued expression of variables and RDF Terms. This doesn't provide any syntax for value constraints, nor does Section 11.2. This is somewhat alleviated by the syntax appendix, but not completely. ISSUE 6 (major): Optional graph patterns Section 5: Optional matching provides this facility; if the optional part does not lead to any solutions, variables can be left unbound. Section 5.1: This query finds the names of people in the data. If there is a triple with predicate mbox and same subject, a solution will contain the object of that triple as well. In the example, only a single triple pattern is given in the optional match part of the query but, in general, it is any graph pattern. The whole graph pattern of an optional graph pattern must match for the optional graph pattern to add to the query solution. What happens when querying G5 ex:x ex:a ex:b . ex:a ex:c ex:f . ex:a ex:c ex:g . with SELECT ( ?v ?w ) WHERE { ?v ex:a ex:b . OPTIONAL { ?v ex:c ?w . } } There is a reading of the document that supports the possibility of returning only one match. The intent is probably that all possible matches must be returned, but the document does not so state. Section 5.4: Definition: Optional Graph Pattern An optional graph pattern is a combination of a pair of graph patterns. The second pattern modifies the solution of the first pattern but does not fail matching of the overall optional graph pattern. If Opt(A, B) is an optional graph pattern, where A and B are graph patterns, then S is a solution of [the] optional graph pattern if S is a solution of A and of B otherwise if S is a solution to A, but not to A and B. First, this doesn't parse. Second, this doesn't cover a query where a graph pattern has two optional parts. Third, obvious fixes to the definition to make it parse are not compatible with the informal first part of the section. What is probably meant is something like: ... then if the graph pattern formed by grouping A and B has solutions then S is a solution to Opt(A,B) iff S is a solution to the grouping of A and B. Otherwise (i.e., if the grouping of A and B has no solutions) then S is a solution to Opt(A,B) iff S is a solution to A. This still doesn't cover multiple (non-nested) optional parts. An extension could look something like S is a solution to Opt(A) iff S is a solution to A. S is a solution to Opt(A, B1,...,Bn) iff 1/ if the grouping of A and all the Bi has solutions and S is a solution to the grouping of A and all the Bi; or 2/ if the grouping of A and all the Bi has no solutions and S is a solution to Opt(A,B1,...,Bi-1,Bi+1,...,Bn) for some 1<=i<=n. Peter F. Patel-Schneider Bell Labs Research
Received on Thursday, 8 September 2005 01:27:18 UTC