Re: comments on Section 1 and Section 2 of SPARQL Query Language for RDF [OK?] [needstest] from Peter F. Patel-Schneider on 2006-06-06 (public-rdf-dawg-comments@w3.org from June 2006)

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Tue, 06 Jun 2006 03:00:13 -0400 (EDT)
To: public-rdf-dawg-comments@w3.org
Message-Id: <20060606.030013.95790077.pfps@research.bell-labs.com>
I am still awaiting a substantive response to my last-call response of 22
February 2006 on SPARQL Query Language for RDF, W3C Working Draft 20
February 2006, http://www.w3.org/TR/2006/WD-rdf-sparql-query-20060220/.

I believe that the comments in this message raise substantive technical
issues that need to be addressed before SPARQL becomes a W3C
recommendation.

Peter F. Patel-Schneider


From: Dan Connolly <connolly@w3.org>
Subject: Re: comments on Section 1 and Section 2 of SPARQL Query Language for RDF [OK?] [needstest]
Date: Wed, 22 Mar 2006 11:46:42 -0600

> On Wed, 2006-02-22 at 18:56 -0500, Peter F. Patel-Schneider wrote:
> > 
> > 
> > Comments on Section 1 and Section 2 of
> > 
> > 	SPARQL Query Language for RDF
> > 	W3C Working Draft 20 February 2006
> > 	http://www.w3.org/TR/2006/WD-rdf-sparql-query-20060220/
> > 
> > 
> > These are personal comments, from me, an interested expert.  They may not
> > reflect the views of any institution with which I am associated.
> 
> Thank you very much for your detailed review...
> 
> 
> > In general I found the first two sections of the document *very* hard to
> > understand.  The mixing of definitions, explanation, information, etc. confused
> > me over and over again.  I strongly suggest an organization something like:
> > 
> >   Introduction (informative)
> >   Formal development (normative)
> >     Underlying notions (normative)
> >     Patterns and matching (normative)
> >   SPARQL syntax (normative)
> >   Informal narrative (informative)
> >   Examples (informative)
> > 
> > I also found that things that didn't need to be explained were explained, and
> > things that did need to be explained were not explained.  A major example of
> > the latter is the role of the scoping graph.  Examples showing why E-matching
> > is defined the way it is would be particularly useful.
> > 
> > 
> > Because of the problems I see in Section 2, I do not feel that I can adequately
> > understand the remainder of the document.  
> > 
> > Because of these problems I do not feel that this document should be advanced
> > to the next stage in the W3C recommendation process without going through
> > another last-call stage.  (This could however be performed by terminating the
> > current last call, quickly fixing the document, and starting another last
> > call.)
> 
> After perhaps overly brief consideration of your comments, we are
> somewhat sympathetic to your concerns about organization and
> clarity; however, we also have schedule considerations
> and the investment in other reviewers. Re-organizing the document
> at this stage would delay things considerably; it's not even clear
> that we could get a sufficient number of reviewers to take another
> look before CR.
> 
> The specific examples you give below are very valuable; I
> am marking this thread [needstest], which allows us to find
> it more easily during CR and integrate the examples you give
> into our test suite. We have also discussed the possibility
> of significant organizational changes after CR, such as
> moving the formal definitions to the back of the document.
> 
> As far as I can tell, all of the examples you give are useful
> clarification questions, but they do not demonstrate design errors.
> If they do, in fact, demonstrate design errors, I'm reasonably
> confident we will discover that as we integrate them into
> our test suite during CR.
> 
> Are you, by chance, satisfied by this response, which does
> not involve making the changes you request at this time,
> but includes an offer to give them due consideration after
> we request CR? If not, there's no need to reply; I'm marking
> this comment down as outstanding dissent unless I hear otherwise.
> 
> 
> > Specific comments follow:
> > 
> > Section 1.
> > 
> > 	An RDF graph is a set of triples; each triple consists of a
> > 	<em>subject</em>, a <em>predicate</em> and an <em>object</em>. This is
> > 	defined in RDF Concepts and Abstract Syntax.
> > 
> > C1.1: An unqualified "this" cannot be used at the beginning of the second sentence.
> > 
> > 	The RDF graph may be virtual, in that it is not fully materialized,
> > 
> > C1.2: Defining virtual in terms of another term that is not itself defined is not
> > very useful.
> > 
> > 	only doing the work needed for each query to execute.
> > 
> > C1.3: Who is doing what work here?
> > 
> > 	SPARQL is a query language for getting information from such RDF
> > 	graphs. 
> > 
> > C1.4: Surely a more formal tone is called for here.
> > 
> > 	It provides facilities to:
> > 	- extract information in the form of URIs, blank nodes, plain and typed
> > 	literals.
> > 	- extract RDF subgraphs.
> > 	- construct new RDF graphs based on information in the queried graphs.
> > 
> > C1.5: I don't recognize the intent of SPARQL in any of these options.
> > 
> > 	As a data access language, it is suitable for both local and remote
> > 	use. 
> > 
> > C1.6: The "it" is rather too far from its referent.
> > 
> > 	The companion SPARQL Protocol for RDF document describes the remote
> > 	access protocol.
> > 
> > C1.7: What about the "local" access protocol?  Is there one?  If so, where is it?  If
> > not, why is there not one?
> > 
> > 	<!-- Commented Document Outline -->
> > 
> > C1.8: There appear to be significant commented-out portions of the document.  Do
> > such parts of the document have any import?  If so, then they probably should
> > not be commented-out.  If not, then the commented-out portions should be
> > removed.
> > 
> > 
> > Section 2.
> > 
> > C2.15: In general, Section 2 switches modes much too much.  Which parts of
> > Section 2 are tutorial?  Which are definitional?  Which are explanatory?
> > 
> > 	The SPARQL query language is based on matching graph patterns.
> > 
> > C2.1: What is a "matching graph pattern"?  I do not believe that it is defined
> > in the remainder of the document.  (Yes, yes, I know that the problem is
> > actually that the sentence itself is poorly constructed.)
> > 
> > 	The simplest graph pattern is the triple pattern, which is like an RDF
> > 	triple, but with the possibility of a variable instead of an RDF term
> > 	in the subject, predicate or object positions.
> > 
> > C2.4: This should probably be stated more precisely, using, at least "and/or".
> > 
> > 	Combining triple gives a basic graph pattern, where an exact match to a
> > 	graph is needed to fulfill a pattern.
> > 
> > C2.2: Probably "triple" should be "triples".
> > 
> > C2.3: I do not believe that this matches the intent of SPARQL queries.
> > 
> > 	The example below shows a SPARQL query to find the title of a book from
> > 	the information in the given RDF graph.
> > 
> > C2.5: The use of "the given" here is not helpful.  I feel that it would be better
> > to use an indefinite article instead.
> > 
> > 
> > 	The terms delimited by "<>" are IRI references [...].  They stand for
> > 	IRIs, either directly, or relative to a base IRI.
> > 
> > C2.6: What is a term?  Which terms?  What does "stand for" mean here?  What
> > role does the base IRI play in this "stand for" relationship?
> > 
> > C2.7: The rules for IRIs are not adequately specified in Section 2.1.1.  Are
> > the two abbreviated mechanisms enclosed in "<>"?  Can a prefix expand to a
> > relative IRI?
> > 
> > 	optional datatype IRI or prefixed name (introduced by ^^)
> > 
> > C2.8: Can this be a relative IRI?  Is it expanded using the rules of
> > Section 2.1.1?
> > 
> > 	Variables in SPARQL queries have global scope; it is the same variable
> > 	everywhere in the query that the same name is used
> > 
> > C2.9:  Wrong number agreement.
> > 
> > 	Blank nodes are indicated by either the form _:a or use of [ ].
> > 
> > C2.10: Is _:a the *only* blank node allowed?  If not, which parts of these bits
> > of syntax can vary, and how?
> > 
> > 	Triple Patterns are written as a list of subject, predicate, object; 
> > 
> > C2.11: The examples of triple patterns don't seem to be written this way.
> > 
> > 	The following examples express the same query: 
> > 	[several examples]
> > 	Prefixes are syntactic: the prefix name does not affect the query, nor
> > 	do prefix names in queries need to be the same prefixes as used in a
> > 	serialization of the data. The following query is equivalent to the
> > 	previous examples and will give the same results when applied to the
> > 	same data:
> > 	[one example]
> > 
> > C2.12: The first group of examples appears to exhibit more internal variability
> > than the single example adds.  Why, then, is the single example broken out?  Is
> > there something that I am missing here?
> > 
> > 
> > 	The data format used in this document is
> > 
> > C2.13: What is the "data"?
> > 
> > C2.16: Section 2.1 claims to be about "Writing a Simple Query", but doesn't
> > seem to provide any guidance on this topic.
> > 
> > 	2.2 Initial Definitions
> > 
> > C2.14: There appear to have been quite a number of definitions already.  How,
> > then, can this be an "initial" set of definitions?
> > 
> > 	A query variable is a member of the set V where V is infinite and
> > 	disjoint.
> > 
> > C2.20:  What is V?  Perhaps you mean V to be some arbitrary, but fixed set.
> > 
> > 	Definition: Graph Pattern
> > 	A Graph Pattern is one of:
> > 	Basic Graph Pattern
> > 	Group Graph Pattern
> > 	Value Constraints
> > 	Optional Graph Pattern
> > 	Union Graph Pattern
> > 	RDF Dataset Graph Pattern
> > 
> > C2.15: Are these all part of simple queries?  If not, what is this doing in
> > Section 2?  Ditto for the definition for SPARQL Query.
> > 
> > 	Definition: SPARQL Query
> > 	A SPARQL query is a tuple (GP, DS, SM, R) where:
> > 
> > C2.16: What, then, are the things in Section 2.1 that contain the SELECT
> > keyword?
> > 
> > 	The following triple pattern has a subject variable (the variable
> > 	book), a predicate dc:title and an object variable (the variable
> > 	title).
> > 
> > 	 ?book dc:title ?title .
> > 
> > C2.17: dc:title does not appear to be valid as any second element of a triple
> > pattern.
> > 
> > 	Definition: Triple Pattern
> > 	A triple pattern is member of the set:
> > 	(RDF-T union V) x (I union V) x (RDF-T union V)
> > 
> > C2.18:  How is the syntax above (?book dc:title ?title .) mapped into this set?
> > 
> > 	This definition of Triple Pattern includes literal subjects.
> > 	[...]
> > 	This definition also allows blank nodes in the predicate position.
> > 
> > C2.19:  The referent is too far away for this construction.
> > 
> > 	Definition: Pattern Solution
> > 	A variable solution is a substitution function from a subset of V, the
> > 	set of variables, to the set of RDF terms, RDF-T.  
> > 	A pattern solution, S, is a variable substitution whose domain includes
> > 	all the variables in V and whose range is a subset of the set of RDF
> > 	terms.  
> > 	The result of replacing every member v of V in a graph pattern P by
> > 	S(v) is written S(P).  
> > 	If v is not in the domain of S then S(v) is defined to be v.
> > 
> > C2.21: I thought that V was the set of variables.  Why then write "all the
> > variables in V"?
> > 
> > C2.22: Given that the domain of S is all the variables in V, i.e., all the
> > variables, then what use is the last sentence of the above definition?
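
To make the point of C2.21 and C2.22 concrete, here is a minimal Python sketch of the substitution S(P) from the quoted definition. The string encoding (variables written with a leading "?") and the names is_variable and apply_solution are assumptions made for illustration only; they are not taken from the working draft. The final clause of the definition (S(v) = v outside the domain of S) only has an effect when S is partial, as in the example.

```python
from typing import Dict, Tuple

# Assumed encoding for illustration: a term is a string, and a
# variable is any string that starts with "?".
Term = str
Triple = Tuple[Term, Term, Term]


def is_variable(term: Term) -> bool:
    """Return True if the term is a query variable in this encoding."""
    return term.startswith("?")


def apply_solution(solution: Dict[Term, Term], pattern: Triple) -> Triple:
    """Compute S(P) for a single triple pattern.

    Each variable v in the pattern is replaced by S(v); a variable that
    is not in the domain of S is left unchanged, as the last sentence
    of the quoted definition says.
    """
    return tuple(
        solution.get(term, term) if is_variable(term) else term
        for term in pattern
    )


pattern = ("?book", "dc:title", "?title")

# A partial substitution: ?title is outside the domain of S.
partial = {"?book": "ex:book1"}
print(apply_solution(partial, pattern))

# A substitution that covers every variable in the pattern.
total = {"?book": "ex:book1", "?title": '"SPARQL Tutorial"'}
print(apply_solution(total, pattern))
```

If the domain of S really includes every variable, as the second sentence of the definition says, the fallback branch in apply_solution is never taken.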
> > 
> > 	has a single triple pattern as the query pattern
> > 
> > C2.23:  What is the "query pattern" of a query?  Perhaps you mean the graph
> > pattern of the query?
> > 
> > 	An E-entailment regime is a binary relation between subsets of RDF
> > 	graphs.
> > 
> > C2.24: Perhaps you mean "between sets of RDF graphs"?
> > 
> > 	Definition: Scoping Graph
> > 	The Scoping Graph G' for RDF graph G, is an RDF Graph that is
> > 	graph-equivalent to G
> > 
> > C2.25: FATAL: There can be many RDF graphs that are graph-equivalent to a
> > particular RDF graph.  Therefore the Scoping Graph is not adequately defined.
> > 
> > 	The scoping graph makes the graph to be matched independent of the
> > 	chosen blank node names.
> > 
> > C2.25a: Which chosen blank node names?  Why should this matter at all?  Aren't
> > the blank node names simply a notational convenience?
> > 
> > C2.25b: This needs to be proven.
> > 
> > 	Definition: Basic Graph Pattern E-matching
> > 	Given an entailment regime E, a basic graph pattern BGP, and RDF graph
> > 	G, with scoping graph G', then BGP E-matches with pattern solution S on
> > 	graph G with respect to scoping set B if:
> >         - BGP' is a basic graph pattern that is graph-equivalent to BGP
> >         - G' and BGP' do not share any blank node labels.
> >         - (G' union S(BGP')) is a well-formed RDF graph for E-entailment
> >         - G E-entails (G' union S(BGP'))
> >         - The RDF terms introduced by S all occur in B.
> > 
> > C2.26: Some of the elements of the point list are missing punctuation.
> > 
> > C2.27: FATAL: The status of B is not adequately provided.  Is B a parameter of
> > E-matching or is it somehow determined by the other parameters?  
> > 
> > 	These definitions allow for future extensions to SPARQL.
> > 
> > C2.28:  Which definitions?
> > 
> > 	This document defines SPARQL for simple entailment and the scoping set
> > 	B is the set of all RDF terms in G'.
> > 
> > C2.29:  SPARQL for simple entailment?  Probably you mean something like "This
> > document only defines the simple entailment version of SPARQL".
> > 
> > C2.30:  The second half of this sentence does not make any sense.  Perhaps you
> > mean something like "The simple entailment version of SPARQL (hereafter
> > SPARQL) is based on BGP E-matching where the entailment regime (E) is always
> > simple entailment and the scoping set (B) is always the set of RDF terms in
> > G'."?
> > 
> > C2.31: FATAL: This still leaves SPARQL matching with the following parameters:
> >   1/ the graph pattern BGP
> >   2/ the RDF graph G
> >   3/ the scoping graph G' (which is not adequately defined)
> >   The problem with G' needs to be addressed.
> > 
> > 	A pattern solution can then be defined as follows: to match a basic
> > 	graph pattern under simple entailment, it is possible to proceed by
> > 	finding a mapping from blank nodes and variables in the basic graph
> > 	pattern to terms in the graph being matched; a pattern solution is then
> > 	a mapping restricted to just the variables, possibly with blank nodes
> > 	renamed. Moreover, a uniqueness property guarantees the
> > 	interoperability between SPARQL systems: given a graph and a basic
> > 	graph pattern, the set of all the pattern solutions is unique up to
> > 	blank node renaming.
> > 
> > C2.32: Where is G' in this operation?
> > 
> > C2.33: It seems to me that SPARQL simple matching is entirely deterministic.
> > Given BGP, G, and G', the set of pattern solutions that make BGP match G with
> > scope G' is fixed.  I then don't understand the "unique up to blank node
> > renaming" above.
> > 
> > C2.34: If I am missing something here, and there indeed is something to be
> > shown, then it has to be proven.
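
The procedure in the quoted passage can be read as the following brute-force Python sketch: find every mapping from the variables and blank nodes of the pattern to terms of the graph, then restrict each mapping to the variables. This is only one possible reading under stated assumptions: terms are plain strings, variables start with "?", blank node labels start with "_:", the pattern and the graph share no blank node labels, and the names match_bgp and solutions are hypothetical. The scoping graph G' does not appear anywhere in it, which is the question raised in C2.32.

```python
from itertools import product


def is_var(term):
    """Variables are written with a leading '?' in this encoding."""
    return term.startswith("?")


def is_bnode(term):
    """Blank node labels are written with a leading '_:' in this encoding."""
    return term.startswith("_:")


def match_bgp(bgp, graph):
    """Yield each mapping of the pattern's variables and blank nodes to
    graph terms under which every pattern triple is a triple of the graph.

    Assumes the pattern and the graph do not share blank node labels.
    """
    slots = sorted({t for triple in bgp for t in triple
                    if is_var(t) or is_bnode(t)})
    terms = sorted({t for triple in graph for t in triple})
    for values in product(terms, repeat=len(slots)):
        mapping = dict(zip(slots, values))
        if all(tuple(mapping.get(t, t) for t in triple) in graph
               for triple in bgp):
            yield mapping


def solutions(bgp, graph):
    """Restrict each mapping to the variables and drop duplicates."""
    seen = []
    for mapping in match_bgp(bgp, graph):
        restricted = {k: v for k, v in mapping.items() if is_var(k)}
        if restricted not in seen:
            seen.append(restricted)
    return seen


graph = {("_:a", "ex:p", "ex:o"), ("_:b", "ex:p", "ex:o")}

# A variable in subject position: one solution per matching blank node.
print(solutions([("?x", "ex:p", "ex:o")], graph))

# A blank node in the pattern: two mappings, but a single (empty) solution
# once the mappings are restricted to variables.
print(solutions([("_:q", "ex:p", "ex:o")], graph))
```

For a fixed graph and pattern this enumeration is deterministic; the only freedom is in how the blank nodes of the graph are labelled.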
> > 
> > 	There is a blank node [..] in this dataset, identified by _:a. 
> > 
> > C2.34:  What is "dataset"?
> > 
> > C2.35:  Are there not two blank nodes in this dataset?
> > 
> > 	In the SPARQL syntax, Basic Graph Patterns are sequences of triple
> > 	patterns mixed with value constraints.
> > 
> > C2.36:  Why not say something like "value constraints can be mixed in sequences
> > of triple patterns.  The triple patterns form a BGP."?
> > 
> > 	The results of a query is
> > 
> > C2.37: Why not "The result"?
> > 
> > C2.39: I believe that it would be very useful to show the four matches
> > generated by the basic query pattern in Section 2.6 (as well as the two matches
> > for the BGP in Section 2.5.3).
> > 
> > 	Blank nodes in the results of a query are identical to those occurring
> > 	in the dataset graphs
> > 
> > C2.38: This is very misleading.  SPARQL matching does indeed restrict the bnodes
> > in query results to be bnodes from the RDF graph, but not in a useful way.  For
> > example,
> >   ?x ex:a ex:b .
> > matches against
> >   _:a ex:a _:b .
> > with two results for ?x, at least as far as I can determine.
> > 
> > C2.39: I believe that there are four matches for the BGP in Section 2.7.  Why
> > are only two results given?
> -- 
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
> D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Tuesday, 6 June 2006 07:00:34 UTC