- From: Herman ter Horst <herman.ter.horst@philips.com>
- Date: Fri, 28 Oct 2005 13:45:37 +0200
- To: public-rdf-dawg-comments@w3.org
- Message-ID: <OFC99B9CA9.34EA98A2-ONC12570A8.003B7029-C12570A8.0040C484@philips.com>

SPARQL Query Language for RDF: review comments (apologies that these comments are late) = Query facilities play an important role for RDF. It is good that a standard query language for RDF is being developed. The document describes useful query primitives for RDF. The document represents a lot of work. = In my view, it is difficult to build up a clear and precise picture of SPARQL queries from the document. A number of formal definitions in the document do not seem to be sufficiently clear and precise. For the normative description of RDF queries, this is a problem, in my view. Clear and precise definitions can form a basis for interoperability. The document seems to be stronger on examples than on definitions. This holds, in particular, for the definitions relating to graph patterns and matching. I expand on this in the following comments. -7: "a graph pattern P, where P is not a dataset graph pattern, matches a dataset with solution S if P matches the default graph of the dataset" It seems that this definition does not reflect correctly the intention w.r.t. examples 8.3 and 8.4. The graph pattern of ex.8.3 seems to form a group pattern consisting of two dataset graph patterns; therefore, this is not a dataset graph pattern, so the definition prescribes that the default graph should be matched, which is clearly not what is intended. -7: states a matching process but does not state what a dataset graph pattern is, although that is the title of the second definition. -5.4: Optional graph patterns are defined formally as pairs Opt(A,B) of graph patterns. How should the graph pattern P1 OPT P2 OPT P3 in the example be interpreted in terms of pairs: as (P1 OPT P2) OPT P3? This is not made explicit in the document. -5.4: The definition of matching of Opt(A,B) seems to miss a kind of maximality requirement: if it is possible to match B by extending a solution, then this should be done. -5.4: the definition of matching Opt(A,B) does not mention graphs. -2.7: The text on "blank nodes and query results" is informal, using only an example. It seems that the point made affects the matching process, although this is not included in the definition of matching. If I interpret 2.7 correctly, the following reformulation of the definition of matching basic graph patterns is what is actually intended: a basic graph pattern P and a solution S match a graph G if S(P) is an RDF graph that is equivalent to a subset of G (equivalent, in the sense defined in RDF Concepts). In a later comment below I expand further on the intention expressed informally in 2.7. In particular I suggest to define a notion of equivalence of solutions. - The name "group" in "group patterns" does not illustrate the intended meaning. How does a group differ from a set? (There is a standard mathematical meaning of the word group, which does not apply to this context, which is confusing.) I am in favor of replacing the term "group pattern" with "conjunctive pattern", which clearly indicates the intended meaning. It would then be natural to replace "alternative patterns" with "disjunctive patterns". -2.2. This section, entitled "initial definitions", gives an 8-line definition of graph patterns. Since these lines consist of forward references, this is not a definition (in itself). (A similar list is provided in 4, but the latter list is not equal to the first list.) This definition and the following definition, of queries, cannot be understood at this point because their parts cannot be understood. Section 5.3 states: "graph patterns are defined recursively". For the complete definition of graph patterns, one needs to go back and forth through the document: the parts of the defition are interleaved with examples, information about blank nodes, information about literals, etc. Trying to understand the definition of graph patterns and matching from the document is a time-consuming process which does not lead completely to the desired understanding. Quite generally, the forward references in formal definitions leads to a problem for readibility, in my view. In my view, the clarity of the SPARQL spec would benefit immensely by collecting the definitions of graph pattern and matching in one place, and by doing the same thing with other formal definitions (although graph patterns and matching seem to form most of the complexity). The document can then be read in normal, forward-reading mode. One can start with the formal definition of queries and query solutions when the definitions of graph pattern and matching (and the other ingredients of queries) are clear. To make a comparison: the semantics specs for RDF and OWL also have basic, relatively complicated technical ingredients (i.e. RDFS/OWL interpretations) which are completely defined before they are used in the corresponding definitions of entailment. (For a primer/tutorial document, a different way of presentation would be preferable than for a formal, normative specification.) To make this comment more specific, I include a suggestion for a possible way to phrase the top-level definitions of graph pattern and matching. I am making up some abstract syntax, to complete the parts of the abstract syntax already included in the document: OPT(P,Q), Graph(g,P). Several parts of the two definitions that follow are copied from the document. Definition: graph pattern. Graph patterns are defined recursively. Basic graph patterns are graph patterns If P1,...,Pn are graph patterns, then CON(P1,...,Pn) is a graph pattern: conjunctive graph pattern If P1,...,Pn are graph patterns, then DIS(P1,...,Pn) is a graph pattern: disjunctive graph pattern A value constraint is a graph pattern If P and Q are graph patterns, then OPT(P,Q) is a graph pattern: optional graph pattern If g is a IRI or a variable and P a graph pattern, then Graph(g,P) is a graph pattern: dataset graph pattern For matching, it seems that datasets need to be considered from the start, in view of the above comment on Section 7. (It would be good to mention explicitly that an RDF graph is viewed as an RDF dataset with no named graphs.) Definition: matching of a graph pattern P and a solution S on an RDF dataset DS = G,(ui,Gi). basic graph pattern P: S(P) is an RDF graph that is equivalent to a subset of G CON(P1,...,Pn): for each j, Pj,S matches DS DIS(P1,...,Pn): there is a j such that Pj,S matches DS value constraint C: S(C) is true Opt(P,Q): P,S matches DS and if possible, Q,S matches DS (i.e. S is not "weaker/smaller than possible") Graph(g,P): there is a j such that (g=uj or S(g)=uj) and P,S matches Gj,(ui,Gi). = Definitions such as these would also facilitate the development of proofs of statements about queries, by structural induction on graph patterns. For example, the intention that the identity of blank nodes is irrelevant, expressed informally in 2.7, could be made more precise in the following way. Definition: two datasets G,(ui,Gi) and H,(ui,Hi) are equivalent if G is equivalent to H and if for each i, Gi is equivalent to Hi. Definition: two solutions S, T are equivalent if there is a function U that maps each IRI and literal to itself, that maps blank nodes to blank nodes, that maps different blank nodes to different blank nodes, such that S and T have the same domains, and such that for each item x in the domain of S or T, S(x)=U(T(x)). It follows by structural induction on graph patterns that if the graph pattern P and the solution S match the dataset DS, S is equivalent to S' and DS is equivalent to DS', then P and S' match DS'. = The global structure of the document does not seem to be as coherent and systematic as it could be. This relates to the preceding comments about the way the definitions are presented, and is reflected in the TOC. For example, the chapter title "making simple queries" suggests a tutorial kind of introduction, although the chapter includes many definitions. The chapter title "working with RDF literals" does not indicate the place of this chapter in a spec of RDF queries. The chapter "graph patterns" only expands on group graph patterns. Why are there three chapters on datasets? The chapter "query result forms" does not only deal with query result forms but also with solution modifiers. It seems that a more logical, systematic way of organizing the document, without the forward references that are now in the formal definitions, could for example include four chapters with the following titles, in the following order: /RDF datasets /Value constraints /graph patterns and matching /queries = 10./10.2: 10.1 speaks of "the solutions of a graph pattern", in 10.2 the definition of SELECT speaks of "the set of solutions found by matching DS with G" It seems that this could be made more precise, in line with the intention expressed informally in 2.7, by noting that these sets of solutions do not need to include solutions that are equivalent in the sense defined above. = 10.1: In the first definition, solution sequence, it is confusing, in view of the standard meaning of list, that a "list" is an "unordered collection". Does the following fit with the intention? A (instead of "The") solution sequence is any ordered collection formed from the (non-equivalent) solutions of the query pattern. 10.1: definition of distinct solution sequence. first, set(S) is the set of solutions in S (rather than the set of solution sequences) second, the phrase "distinct(S)=(Si|Si!=Sj for all i!=j)" does not provide a definition of distinct(S); it seems to be enough to give a definition in words, e.g.: distinct(S) is a solution sequence that does not contain duplicate solutions and such that set(distinct(S))=set(S). = 10.3.2: definition of graph template: S(tj) should be a set containing just one RDF triple (if all variables ...), rather than an RDF triple; otherwise, the union definition that follows does not work. = Herman ter Horst Philips Research

Received on Friday, 28 October 2005 11:48:01 UTC