- From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
- Date: Tue, 06 Jun 2006 03:00:13 -0400 (EDT)
- To: public-rdf-dawg-comments@w3.org
I am still awaiting a substantive response to my last-call response of
22 February 2006 on SPARQL Query Language for RDF, W3C Working Draft
20 February 2006, http://www.w3.org/TR/2006/WD-rdf-sparql-query-20060220/.

I believe that the comments in this message raise substantive technical
issues that need to be addressed before SPARQL becomes a W3C recommendation.

Peter F. Patel-Schneider

From: Dan Connolly <connolly@w3.org>
Subject: Re: comments on Section 1 and Section 2 of SPARQL Query Language for RDF [OK?] [needstest]
Date: Wed, 22 Mar 2006 11:46:42 -0600

> On Wed, 2006-02-22 at 18:56 -0500, Peter F. Patel-Schneider wrote:
> >
> > Comments on Section 1 and Section 2 of
> >
> > SPARQL Query Language for RDF
> > W3C Working Draft 20 February 2006
> > http://www.w3.org/TR/2006/WD-rdf-sparql-query-20060220/
> >
> > These are personal comments, from me, an interested expert. They may not
> > reflect the views of any institution to which I am associated.
>
> Thank you very much for your detailed review...
>
> > In general I found the first two sections of the document *very* hard to
> > understand. The mixing of definitions, explanation, information, etc. confused
> > me over and over again. I strongly suggest an organization something like:
> >
> >     Introduction (informative)
> >     Formal development (normative)
> >     Underlying notions (normative)
> >     Patterns and matching (normative)
> >     SPARQL syntax (normative)
> >     Informal narrative (informative)
> >     Examples (informative)
> >
> > I also found that things that didn't need to be explained were explained, and
> > things that did need to be explained were not explained. A major example of
> > the latter is the role of the scoping graph. Examples showing why E-matching
> > is defined the way it is would be particularly useful.
> >
> > Because of the problems I see in Section 2, I do not feel that I can adequately
> > understand the remainder of the document.
> >
> > Because of these problems I do not feel that this document should be advanced
> > to the next stage in the W3C recommendation process without going through
> > another last-call stage. (This could however be performed by terminating the
> > current last call, quickly fixing the document, and starting another last
> > call.)
>
> After perhaps overly brief consideration of your comments, we are
> somewhat sympathetic to your concerns about organization and
> clarity; however, we also have schedule considerations
> and the investment in other reviewers. Re-organizing the document
> at this stage would delay things considerably; it's not even clear
> that we could get a sufficient number of reviewers to take another
> look before CR.
>
> The specific examples you give below are very valuable; I
> am marking this thread [needstest], which allows us to find
> it more easily during CR and integrate the examples you give
> into our test suite. We have also discussed the possibility
> of significant organizational changes after CR, such as
> moving the formal definitions to the back of the document.
>
> As far as I can tell, all of the examples you give are useful
> clarification questions, but they do not demonstrate design errors.
> If they do, in fact, demonstrate design errors, I'm reasonably
> confident we will discover that as we integrate them into
> our test suite during CR.
>
> Are you, by chance, satisfied by this response, which does
> not involve making the changes you request at this time,
> but includes an offer to give them due consideration after
> we request CR? If not, there's no need to reply; I'm marking
> this comment down as outstanding dissent unless I hear otherwise.
>
> > Specific comments follow:
> >
> > Section 1.
> >
> >     An RDF graph is a set of triples; each triple consists of a
> >     <em>subject</em>, a <em>predicate</em> and an <em>object</em>. This is
> >     defined in RDF Concepts and Abstract Syntax.
> >
> > C1.1: An unqualified "this" cannot be used at the beginning of the second sentence.
> >
> >     The RDF graph may be virtual, in that it is not fully materialized,
> >
> > C1.2: Defining virtual in terms of another term that is not itself defined is not
> > very useful.
> >
> >     only doing the work needed for each query to execute.
> >
> > C1.3: Who is doing what work here?
> >
> >     SPARQL is a query language for getting information from such RDF
> >     graphs.
> >
> > C1.4: Surely a more formal tone is called for here.
> >
> >     It provides facilities to:
> >     - extract information in the form of URIs, blank nodes, plain and typed
> >       literals.
> >     - extract RDF subgraphs.
> >     - construct new RDF graphs based on information in the queried graphs.
> >
> > C1.5: I don't recognize the intent of SPARQL in any of these options.
> >
> >     As a data access language, it is suitable for both local and remote
> >     use.
> >
> > C1.6: The "it" is rather too far from its referent.
> >
> >     The companion SPARQL Protocol for RDF document describes the remote
> >     access protocol.
> >
> > C1.7: What about the "local" access protocol? Is there one? If so, where is it? If
> > not, why is there not one?
> >
> >     <!-- Commented Document Outline -->
> >
> > C1.8: There appears to be significant commented-out portions of the document. Do
> > such parts of the document have any import? If so, then they probably should
> > not be commented-out. If not, then the commented-out portions should be
> > removed.
> >
> > Section 2.
> >
> > C2.15: In general, Section 2 switches modes much too much. Which parts of
> > Section 2 are tutorial? Which are definitional? Which are explanatory?
> >
> >     The SPARQL query language is based on matching graph patterns.
> >
> > C2.1: What is a "matching graph pattern"? I do not believe that it is defined
> > in the remainder of the document. (Yes, yes, I know that the problem is
> > actually that the sentence itself is poorly constructed.)
> >
> >     The simplest graph pattern is the triple pattern, which is like an RDF
> >     triple, but with the possibility of a variable instead of an RDF term
> >     in the subject, predicate or object positions.
> >
> > C2.4: This should probably be stated more precisely, using, at least "and/or".
> >
> >     Combining triple gives a basic graph pattern, where an exact match to a
> >     graph is needed to fulfill a pattern.
> >
> > C2.2: Probably "triple" should be "triples".
> >
> > C2.3: I do not believe that this matches the intent of SPARQL queries.
> >
> >     The example below shows a SPARQL query to find the title of a book from
> >     the information in the given RDF graph.
> >
> > C2.5: The use of "the given" here is not helpful. I feel that it would be better
> > to use an indefinite article instead.
> >
> >     The terms delimited by "<>" are IRI references [...]. They stand for
> >     IRIs, either directly, or relative to a base IRI.
> >
> > C2.6: What is a term? Which terms? What does "stand for" mean here? What
> > role does the base IRI play in this "stand for" relationship?
> >
> > C2.7: The rules for IRIs are not adequately specified in Section 2.1.1. Are
> > the two abbreviated mechanisms enclosed in "<>"? Can a prefix expand to a
> > relative IRI?
> >
> >     optional datatype IRI or prefixed name (introduced by ^^)
> >
> > C2.8: Can this be a relative IRI? Is it expanded using the rules of
> > Section 2.1.1?
> >
> >     Variables in SPARQL queries have global scope; it is the same variable
> >     everywhere in the query that the same name is used
> >
> > C2.9: Wrong number agreement.
> >
> >     Blank nodes are indicated by either the form _:a or use of [ ].
> >
> > C2.10: Is _:a the *only* blank node allowed? If not, which parts of these bits
> > of syntax can vary, and how?
> >
> >     Triple Patterns are written as a list of subject, predicate, object;
> >
> > C2.11: The examples of triple patterns don't seem to be written this way.
> >
> >     The following examples express the same query:
> >     [several examples]
> >     Prefixes are syntactic: the prefix name does not affect the query, nor
> >     do prefix names in queries need to be the same prefixes as used in a
> >     serialization of the data. The following query is equivalent to the
> >     previous examples and will give the same results when applied to the
> >     same data:
> >     [one example]
> >
> > C2.12: The first group of examples appears to exhibit more internal variability
> > than the single example adds. Why, then, is the single example broken out? Is
> > there something that I am missing here?
> >
> >     The data format used in this document is
> >
> > C2.13: What is the "data"?
> >
> > C2.16: Section 2.1 claims to be about "Writing a Simple Query", but doesn't
> > seem to provide any guidance on this topic.
> >
> >     2.2 Initial Definitions
> >
> > C2.14: There appears to have been quite a number of definitions already? How,
> > then, can this be an "initial" set of definitions?
> >
> >     A query variable is a member of the set V where V is infinite and
> >     disjoint.
> >
> > C2.20: What is V? Perhaps you mean V to be some arbitrary, but fixed set.
> >
> >     Definition: Graph Pattern
> >     A Graph Pattern is one of:
> >         Basic Graph Pattern
> >         Group Graph Pattern
> >         Value Constraints
> >         Optional Graph Pattern
> >         Union Graph Pattern
> >         RDF Dataset Graph Pattern
> >
> > C2.15: Are these all part of simple queries? If not, what is this doing in
> > Section 2? Ditto for the definition for SPARQL Query.
> >
> >     Definition: SPARQL Query
> >     A SPARQL query is a tuple (GP, DS, SM, R) where:
> >
> > C2.16: What, then, are the things in Section 2.1 that contain the SELECT
> > keyword?
> >
> >     The following triple pattern has a subject variable (the variable
> >     book), a predicate dc:title and an object variable (the variable
> >     <title).
> >
> >     ?book dc:title ?title .
> >
> > C2.17: dc:title does not appear to be valid as any second element of a triple
> > pattern.
> >
> >     Definition: Triple Pattern
> >     A triple pattern is member of the set:
> >     (RDF-T union V) x (I union V) x (RDF-T union V)
> >
> > C2.18: How is the syntax above (?book dc:title ?title .) mapped into this set?
> >
> >     This definition of Triple Pattern includes literal subjects.
> >     [...]
> >     This definition also allows blank nodes in the predicate position.
> >
> > C2.19: The referent is too far away for this construction.
> >
> >     Definition: Pattern Solution
> >     A variable solution is a substitution function from a subset of V, the
> >     set of variables, to the set of RDF terms, RDF-T.
> >     A pattern solution, S, is a variable substitution whose domain includes
> >     all the variables in V and whose range is a subset of the set of RDF
> >     terms.
> >     The result of replacing every member v of V in a graph pattern P by
> >     S(v) is written S(P).
> >     If v is not in the domain of S then S(v) is defined to be v.
> >
> > C2.21: I thought that V was the set of variables. Why then write "all the
> > variables in V"?
> >
> > C2.22: Given that the domain of S is all the variables in V, i.e., all the
> > variables, then what use is the last sentence of the above definition?
> >
> >     has a single triple pattern as the query pattern
> >
> > C2.23: What is the "query pattern" of a query? Perhaps you mean the graph
> > pattern of the query?
> >
> >     An E-entailment regime is a binary relation between subsets of RDF
> >     graphs.
> >
> > C2.24: Perhaps you mean "between sets of RDF graphs"?
> >
> >     Definition: Scoping Graph
> >     The Scoping Graph G' for RDF graph G, is an RDF Graph that is
> >     graph-equivalent to G
> >
> > C2.25: FATAL: There can be many RDF graphs that are graph-equivalent to a
> > particular RDF graph. Therefore the Scoping Graph is not adequately defined.
> >
> >     The scoping graph makes the graph to be matched independent of the
> >     chosen blank node names.
> >
> > C2.25a: Which chosen blank node names? Why should this matter at all? Aren't
> > the blank node names simply a notational convenience?
> >
> > C2.25b: This needs to be proven.
> >
> >     Definition: Basic Graph Pattern E-matching
> >     Given an entailment regime E, a basic graph pattern BGP, and RDF graph
> >     G, with scoping graph G', then BGP E-matches with pattern solution S on
> >     graph G with respect to scoping set B if:
> >     - BGP' is a basic graph pattern that is graph-equivalent to BGP
> >     - G' and BGP' do not share any blank node labels.
> >     - (G' union S(BGP')) is a well-formed RDF graph for E-entailment
> >     - G E-entails (G' union S(BGP'))
> >     - The RDF terms introduced by S all occur in B.
> >
> > C2.26: Some of the elements of the point list are missing punctuation.
> >
> > C2.27: FATAL: The status of B is not adequately provided. Is B a parameter of
> > E-matching or is it somehow determined by the other parameters?
> >
> >     These definitions allow for future extensions to SPARQL.
> >
> > C2.28: Which definitions?
> >
> >     This document defines SPARQL for simple entailment and the scoping set
> >     B is the set of all RDF terms in G'.
> >
> > C2.29: SPARQL for simple entailment? Probably you mean something like "This
> > document only defines the simple entailment version of SPARQL".
> >
> > C2.30: The second half of this sentence does not make any sense. Perhaps you
> > mean something like "The simple entailment version of SPARQL (hereafter
> > SPARQL) is based on BGP E-matching where the entailment regime (E) is always
> > simple entailment and the scoping set (B) is always the set of RDF terms in
> > G'.
> >
> > C2.31: FATAL: This still leaves SPARQL matching with the following parameters:
> >     1/ the graph pattern BGP
> >     2/ the RDF graph G
> >     3/ the scoping graph G' (which is not adequately defined)
> > The problem with G' needs to be addressed.
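
The concern raised in C2.25 and C2.31 can be illustrated with a small
sketch. Nothing in it is taken from the draft: the encoding of triples as
Python tuples of strings, the "_:" prefix test for blank nodes, and the
helper name rename_bnodes are all assumptions made for illustration. It
shows that renaming the blank nodes of a graph G through two different
one-to-one mappings gives two different graphs, each of which is
graph-equivalent to G, so G alone does not determine a single scoping
graph G'.

```python
def rename_bnodes(graph, mapping):
    """Return a copy of `graph` with blank node labels replaced per `mapping`.

    `graph` is a set of (subject, predicate, object) string triples
    (hypothetical encoding). `mapping` must be one-to-one on blank node
    labels, so the result is graph-equivalent to the input.
    """
    def rename(term):
        # Only blank node labels are renamed; IRIs and literals stay fixed.
        if term.startswith("_:"):
            return mapping.get(term, term)
        return term

    return frozenset(tuple(rename(t) for t in triple) for triple in graph)


# A graph with two blank nodes.
G = frozenset({("_:a", "ex:a", "_:b")})

# Two candidate scoping graphs obtained by different renamings.
G1 = rename_bnodes(G, {"_:a": "_:x", "_:b": "_:y"})
G2 = rename_bnodes(G, {"_:a": "_:p", "_:b": "_:q"})

# As sets of triples the candidates differ from each other and from G ...
print(G1 != G2, G1 != G)

# ... yet each maps back onto G under the inverse renaming.
print(rename_bnodes(G1, {"_:x": "_:a", "_:y": "_:b"}) == G)
print(rename_bnodes(G2, {"_:p": "_:a", "_:q": "_:b"}) == G)
```

Whether the choice among such candidates can affect the set of pattern
solutions is exactly the question that C2.25b and C2.33 ask the draft to
settle; the sketch only shows that the choice exists.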
> >
> >     A pattern solution can then be defined as follows: to match a basic
> >     graph pattern under simple entailment, it is possible to proceed by
> >     finding a mapping from blank nodes and variables in the basic graph
> >     pattern to terms in the graph being matched; a pattern solution is then
> >     a mapping restricted to just the variables, possibly with blank nodes
> >     renamed. Moreover, a uniqueness property guarantees the
> >     interoperability between SPARQL systems: given a graph and a basic
> >     graph pattern, the set of all the pattern solutions is unique up to
> >     blank node renaming.
> >
> > C2.32: Where is G' in this operation?
> >
> > C2.33: It seems to me that SPARQL simple matching is entirely deterministic.
> > Given BGP, G, and G', the set of pattern solutions that make BGP match G with
> > scope G' is fixed. I then don't understand the "unique up to blank node
> > renaming" above.
> >
> > C2.34: If I am missing something here, and there indeed is something to be
> > shown, then it has to be proven.
> >
> >     There is a blank node [..] in this dataset, identified by_:a.
> >
> > C2.34: What is "dataset"?
> >
> > C2.35: Are there not two blank nodes in this dataset?
> >
> >     In the SPARQL syntax, Basic Graph Patterns are sequences of triple
> >     patterns mixed with value constraints.
> >
> > C2.36: Why not say something like "value constraints can be mixed in sequences
> > of triples patterns. The triple patterns form a BGP."?
> >
> >     The results of a query is
> >
> > C2.37: Why not "The result"?
> >
> > C2.39: I believe that it would be very useful to show the four matches
> > generated by the basic query pattern in Section 2.6 (as well as the two matches
> > for the BGP in Section 2.5.3).
> >
> >     Blank nodes in the results of a query are identical to those occurring
> >     in the dataset graphs
> >
> > C2.38: This is very misleading. SPARQL matching does indeed restrict the bnode
> > in query results to be bnodes from the RDF graph, but not in a useful way.
> > For example,
> >     ?x ex:a ex:b .
> > matches against
> >     _:a ex:a _:b .
> > with two results for ?x, at least as far as I can determine.
> >
> > C2.39: I believe that there are four matches for the BGP in Section 2.7. Why
> > are only two results given?
> --
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
> D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
Received on Tuesday, 6 June 2006 07:00:34 UTC