- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 3 Mar 2006 07:14:24 -0500
- To: public-rdf-dawg@w3.org
- Message-ID: <20060303121424.GD17752@w3.org>
pending headers: To: Herman ter Horst <herman.ter.horst@philips.com> Cc: public-rdf-dawg-comments@w3.org On Fri, Oct 28, 2005 at 01:45:37PM +0200, Herman ter Horst wrote: > SPARQL Query Language for RDF: review comments > (apologies that these comments are late) > > = > > Query facilities play an important role for RDF. > It is good that a standard query language for RDF is being > developed. > The document describes useful query primitives for RDF. > The document represents a lot of work. > > = > > In my view, it is difficult to build up a clear and > precise picture of SPARQL queries from the document. > A number of formal definitions in the document do not > seem to be sufficiently clear and precise. > For the normative description of RDF queries, this is a > problem, in my view. Clear and precise definitions can > form a basis for interoperability. > The document seems to be stronger on examples than on > definitions. This holds, in particular, for the > definitions relating to graph patterns and matching. > I expand on this in the following comments. > > -7: "a graph pattern P, where P is not a dataset graph > pattern, matches a dataset with solution S if P matches > the default graph of the dataset" > It seems that this definition does not reflect correctly > the intention w.r.t. examples 8.3 and 8.4. > The graph pattern of ex.8.3 seems to form a group pattern > consisting of two dataset graph patterns; therefore, this is not > a dataset graph pattern, so the definition prescribes that the > default graph should be matched, which is clearly not what > is intended. > -7: states a matching process but does not state what > a dataset graph pattern is, although that is the title > of the second definition. > > -5.4: Optional graph patterns are defined formally as pairs > Opt(A,B) of graph patterns. How should the > graph pattern P1 OPT P2 OPT P3 in the example be interpreted > in terms of pairs: as (P1 OPT P2) OPT P3? > This is not made explicit in the document. The current text has: [[ The OPTIONAL keyword is left-associative : pattern OPTIONAL { pattern } OPTIONAL { pattern } matches the same as: { pattern OPTIONAL { pattern } } OPTIONAL { pattern } ]] > -5.4: The definition of matching of Opt(A,B) seems to miss > a kind of maximality requirement: if it is possible to > match B by extending a solution, then this should be done. > -5.4: the definition of matching Opt(A,B) does not mention > graphs. > > -2.7: The text on "blank nodes and query results" is informal, > using only an example. It seems that the point made affects > the matching process, although this is not included > in the definition of matching. If I interpret 2.7 > correctly, the following reformulation of the definition of > matching basic graph patterns is what is actually intended: > a basic graph pattern P and a solution S match a graph G > if S(P) is an RDF graph that is equivalent to a subset of G > (equivalent, in the sense defined in RDF Concepts). > > In a later comment below I expand further on the intention > expressed informally in 2.7. In particular I suggest to define > a notion of equivalence of solutions. > > - The name "group" in "group patterns" does not illustrate > the intended meaning. How does a group differ from a set? > (There is a standard mathematical meaning of the word group, > which does not apply to this context, which is confusing.) > I am in favor of replacing the term "group pattern" with > "conjunctive pattern", which clearly indicates the intended > meaning. > It would then be natural to replace "alternative patterns" > with "disjunctive patterns". > > -2.2. This section, entitled "initial definitions", gives > an 8-line definition of graph patterns. Since these lines > consist of forward references, this is not a definition > (in itself). (A similar list is provided in 4, but the latter > list is not equal to the first list.) > This definition and the following definition, of queries, > cannot be understood at this point because their parts cannot > be understood. > Section 5.3 states: "graph patterns are defined recursively". > For the complete definition of graph patterns, one needs > to go back and forth through the document: the parts of the > defition are interleaved with examples, information about > blank nodes, information about literals, etc. > Trying to understand the definition of graph patterns and > matching from the document is a time-consuming process which > does not lead completely to the desired understanding. > Quite generally, the forward references in formal definitions > leads to a problem for readibility, in my view. > > In my view, the clarity of the SPARQL spec would benefit immensely > by collecting the definitions of graph pattern and matching > in one place, and by doing the same thing with other > formal definitions (although graph patterns and matching seem > to form most of the complexity). > The document can then be read in normal, forward-reading mode. > One can start with the formal definition of queries and query > solutions when the definitions of graph pattern and matching > (and the other ingredients of queries) are clear. > To make a comparison: the semantics specs for RDF and OWL > also have basic, relatively complicated technical ingredients > (i.e. RDFS/OWL interpretations) which are completely defined > before they are used in the corresponding definitions of > entailment. > (For a primer/tutorial document, a different way of > presentation would be preferable than for a formal, normative > specification.) DanC: we will consider this during CR > To make this comment more specific, I include a suggestion > for a possible way to phrase the top-level definitions of > graph pattern and matching. > I am making up some abstract syntax, to complete the > parts of the abstract syntax already included in the > document: OPT(P,Q), Graph(g,P). > Several parts of the two definitions that follow are copied > from the document. > > Definition: graph pattern. > Graph patterns are defined recursively. > Basic graph patterns are graph patterns > If P1,...,Pn are graph patterns, then CON(P1,...,Pn) > is a graph pattern: conjunctive graph pattern > If P1,...,Pn are graph patterns, then DIS(P1,...,Pn) > is a graph pattern: disjunctive graph pattern > A value constraint is a graph pattern > If P and Q are graph patterns, then OPT(P,Q) is a > graph pattern: optional graph pattern > If g is a IRI or a variable and P a graph pattern, > then Graph(g,P) is a graph pattern: dataset graph pattern > > For matching, it seems that datasets need to be considered > from the start, in view of the above comment on Section 7. > (It would be good to mention explicitly that an RDF graph > is viewed as an RDF dataset with no named graphs.) > > Definition: matching of a graph pattern P and a solution S > on an RDF dataset DS = G,(ui,Gi). > basic graph pattern P: S(P) is an RDF graph that is equivalent > to a subset of G > CON(P1,...,Pn): for each j, Pj,S matches DS > DIS(P1,...,Pn): there is a j such that Pj,S matches DS > value constraint C: S(C) is true > Opt(P,Q): P,S matches DS and if possible, Q,S matches DS > (i.e. S is not "weaker/smaller than possible") > Graph(g,P): there is a j such that (g=uj or S(g)=uj) > and P,S matches Gj,(ui,Gi). > > = > > Definitions such as these would also facilitate the development > of proofs of statements about queries, by structural induction > on graph patterns. > > For example, the intention that the identity of blank nodes > is irrelevant, expressed informally in 2.7, could be made > more precise in the following way. > > Definition: two datasets G,(ui,Gi) and H,(ui,Hi) are equivalent > if G is equivalent to H and if for each i, Gi is equivalent to Hi. > > Definition: two solutions S, T are equivalent if there is a > function U that maps each IRI and literal to itself, > that maps blank nodes to blank nodes, that maps different > blank nodes to different blank nodes, such that S and T > have the same domains, and such that > for each item x in the domain of S or T, S(x)=U(T(x)). > > It follows by structural induction on graph patterns that > if the graph pattern P and the solution S match the dataset DS, > S is equivalent to S' and DS is equivalent to DS', > then P and S' match DS'. > > = > > The global structure of the document does not seem to be as > coherent and systematic as it could be. > This relates to the preceding comments about the way the > definitions are presented, and is reflected in the TOC. > For example, the chapter title "making simple queries" > suggests a tutorial kind of introduction, although the chapter > includes many definitions. The chapter title "working with RDF > literals" does not indicate the place of this chapter in a spec > of RDF queries. The chapter "graph patterns" only expands on > group graph patterns. Why are there three chapters > on datasets? The chapter "query result forms" does not > only deal with query result forms but also with > solution modifiers. > > It seems that a more logical, systematic way of organizing the > document, without the forward references that are now in the > formal definitions, could for example include four chapters > with the following titles, in the following order: > /RDF datasets > /Value constraints > /graph patterns and matching > /queries > > = > > 10./10.2: 10.1 speaks of "the solutions of a graph pattern", > in 10.2 the definition of SELECT speaks of "the set of > solutions found by matching DS with G" > It seems that this could be made more precise, in line with > the intention expressed informally in 2.7, by noting that > these sets of solutions do not need to include solutions > that are equivalent in the sense defined above. > > = > > 10.1: In the first definition, solution sequence, it is > confusing, in view of the standard meaning of > list, that a "list" is an "unordered collection". > Does the following fit with the intention? > A (instead of "The") solution sequence is any ordered collection > formed from the (non-equivalent) solutions of the query pattern. > > 10.1: definition of distinct solution sequence. > first, set(S) is the set of solutions in S (rather than the > set of solution sequences) > second, the phrase "distinct(S)=(Si|Si!=Sj for all i!=j)" does not > provide a definition of distinct(S); it seems to be enough > to give a definition in words, e.g.: distinct(S) is a > solution sequence that does not contain duplicate solutions > and such that set(distinct(S))=set(S). > > = > > 10.3.2: definition of graph template: > S(tj) should be a set containing just one RDF triple > (if all variables ...), rather than an RDF triple; > otherwise, the union definition that follows does not work. > > = > > > Herman ter Horst > Philips Research -- -eric office: +81.466.49.1170 W3C, Keio Research Institute at SFC, Shonan Fujisawa Campus, Keio University, 5322 Endo, Fujisawa, Kanagawa 252-8520 JAPAN +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA cell: +81.90.6533.3882 (eric@w3.org) Feel free to forward this message to any list for any purpose other than email address distribution.
Received on Friday, 3 March 2006 12:14:30 UTC