- From: Dan Connolly <connolly@w3.org>
- Date: Wed, 18 May 2005 13:56:56 -0500
- To: "Seaborne, Andy" <andy.seaborne@hp.com>
- Cc: www-archive@w3.org, Pat Hayes <phayes@ihmc.us>, Eric Prud'hommeaux <eric@w3.org>
On Wed, 2005-05-18 at 14:53 +0100, Seaborne, Andy wrote: > Formal Definitions from SPARQL Query Language for RDF > Definition: Query Variable > > A named query variable is a member of the set V where V is infinite > and disjoint from RDF-T. V is otherwise arbitrary. > > An unnamed query variable is a blank node in the query pattern and is > disjoint from any blank node in the RDF graphs being matched. That 2nd sentence seems like it belongs elsewhere. Ah... I think it's best to leave it out of the abstract definitions, and note in/near the grammar (or just outside the box) that a parser needs to come up with a variable name that's distinct from all the others when it parses the [] construct. > Definition: Graph Pattern > > A Graph Pattern is constructed from triple patterns and value > constraints, using the following operations to combine patterns: > > * Basic Graph Pattern, set set of triple patterns > * Group Pattern, where each of a set of graph patterns must all > match > * Optional Graph Pattern, where a graph patterns may extend the > solution > * Union Graph Pattern, where one of a set of graph patterns must > match > * Patterns on Named Graphs, where patterns are matched against > named graphs I think it suffices to say: a Graph Pattern is either a Basic Graph Pattern, a Group Pattern, and Optional Graph Pattern, ... or a Pattern on Named Graphs. add to this list: a Constraint Pattern; more below. > Definition: SPARQL Query > > A SPARQL query is a tuple consisting of a graph pattern, an RDF > dataset description, a number of solution sequence modifiers and a > result form. > Dataset description? Not just a dataset? hmm. I wonder where that comes in... It doesn't seem to show up anywhere else. As I think I suggested earlier, the semantics of FROM and FROM NAMED should be specified by reference to the protocol, not as part of the query language formalism. That reminds me... the protocol spec needs to say how to get from the default-graph-uri and the named-graph-uri's to the corresponding graphs. I'll try to remember to send mail about that separately. > Definition: Query Pattern > > A query has one main graph pattern called the Query Pattern. > A SPARQL Query has... Or: If (GP, D, M, R) is a SPARQL Query, then GP is its Query Pattern. This whole definition might be left out, as it's reasonably clearly implicit in the defintion of SPARQL Query. > Definition: Triple Pattern > > A triple pattern is member of the set: > (RDF-T union V) x (RDF-U union V) x (RDF-T union V) > > Definition: Solution Solution or Substitution? > > A substitution is a function from a subset of the set of variables to > the set of RDF terms, RDF-T. > > The result of replacing every v in a graph pattern P by S(v) is > written S(P). > > We say that S is a solution of P if P matches G with substitution S. > > Definition: Basic Graph Pattern > > A Basic Graph Pattern is a set of Triple Patterns. > > A basic graph pattern matches on graph G with substitution S if S(GP) > is entailed by G. > > Definition: Query Solution > > A Query Solution is a Pattern Solution for the Query Pattern. A > substitution in a query solution only contains variables mentioned in > the query. > suggest: Given a Query Pattern, a Query Solution is one of its Pattern Solutions whose substitutions only contains variables mentioned in the Query Pattern. or: Given a Query Q = (QP, D, M, R), S is a Query Solution for Q iff S is a Pattern Solution for QP and every variable in the domain of S is mentioned in QP. > Definition: Query Results > > The Query Results, for a given graph pattern GP on graph G, is written > R(GP,G), and is the set of all query solutions S such that S is a > solution for GP on G. hmm... rather: of all pattern solutions S, no? I'm looking for the definition of Pattern Solution and I don't see it. Hmm. > R(GP, G) may be the empty set. > > Definition: Value Constraint > > A graph pattern may involve a value constraint, which is a > boolean-valued expression of variables and RDF Terms that restricts > query solutions. Umm... "may involve"... the definition of Graph Pattern given didn't involve value constraints. Suggest rather adding Constraint Pattern above and: Definition: Value Constraint A Value Constraint is a boolean-valued expression of variables and RDF Terms. Definition: Constraint Pattern A Constraint Pattern is a pair of a Value Constraint and a Graph Pattern. S is a solution for a Constraint Pattern (C, GP) iff C(S) is true and S is a solution for GP. > Definition: Group Graph Pattern > > A Group Graph Pattern GP is a set of graph patterns, GPi. > > A solution of Group Graph Pattern GP on graph G is any solution S such > that, for every element GPi of GP, S is a solution of GPi. > > Definition: Optional Graph Pattern > > An Optional Graph Pattern is a graph pattern that can create new > solutions, but will always match. > > Given graph pattern GP, the Optional Graph Pattern Opt(GP) matches > with solution S if S is a solution for a match of GP, or S is not a > solution for GP. That doesn't work with my suggested clarification of Graph Pattern. I think you had it better earlier, when an Optional Graph Pattern had two parts: An Optional Graph Pattern is a pair of graph patterns (R, O). S is a solution for the optional graph pattern (R, O) iff - S is a solution for R and S is not a solution for O, or - S is a solution for the merge of R and O > Definition: Union Graph Pattern > > A union graph pattern is a set of graph patterns GP i . > > A union graph pattern matches a graph G with substitution S if there > is some GP i such that GP i matches G with substitution S. OK. > Definition: RDF Dataset > > An RDF dataset is a set = { G, (u1, G1), (u2, G2), . . . (un, Gn) } > where G and each Gi are graphs, and each ui is a URI. Each ui is > distinct. > > G is called the default graph. Gi are named graphs. OK > Definition: DataSet Graph Pattern > > If D is a dataset {G, (<u1> G1), ...}, and P is a graph pattern then S > is a pattern solution of GRAPH(g, P) if: > > g is a URI where g = <ui> for some i, and S is pattern solution of P > on Gi or g is a variable, S maps the variable g to <ui> and S is a > pattern solution of P on Gi. OK Hmm... all the other definitions of graph pattern solution are given w.r.t. just a graph, not a whole dataset. Ooops. We should specify how they work with a whole DataSet. > Definition: Solution Sequence > > A solution sequence is a list of solutions. > > The solution sequence from matching the query pattern is unordered. I don't know how to make sense of that 2nd sentence. I think it should be struck; earlier, the matches of a query pattern is defined to be a set, not a sequence. I think it'll be handy to have a notation for the set of solutions in the sequence. Perhaps: For a solution sequence SQ = <S1, S2, S3, ...> we write SQ' for the set {S1, S2, S3, ...} > Definition: Result Forms > > Based on the solution sequnc of the query pattern, there are a number > of Result Forms that a SPARQL query can produce: > * SELECT, where a table is produced > * CONSTRUCT, where an RDF graph is constructed > * DESCRIBE, where information about the resources located by the > query pattern is returned > * ASK, where a boolean is returned indicating whether the > pattern matched or not. Suggest: A Result Form is one of the tokens SELECT, CONSTRUCT, DESCRIBE, or ASK. Oops... later I discover that the CONSTRUCT form needs to carry a pattern... and I guess the SELECT form needs to carry its list of variables. > Definition: Projection > > For a substitution S and a finite set of variables VS, > project(S, VS) = { (v, S[v]) | v in VS } > > For a query solution Q project(Q, VS) is { project(S, VS) | S in Q } > > For a set QS of query solutions, project(QS, VS) is { project(Q, V) | > Q in QS } > > Definition: Distinct Solution Sequence > > A Distinct Solution Sequence has no two solutions the same. > > Two solutions are the same if they associate the same named variables > with teh same RDF-Terms. That 2nd sentence shouldn't be part of the definition (since the definition of Solution already gives identity condiditions); it could be an informative note. Ah... actually, I think you mean: SQ is an distinct solution sequence for Q = (GP, D, M, R) iff every S in SQ' is a solution for GP and either M has no DISTINCT modifier or every SQ[i] is distinct. > Definition: Ordered Solution Sequence > > A ordered solution sequence is a solution sequence where the sequence > is partially ordered with respect to some ordering condition. Er... every sequence is partially ordered with respect to some ordering condition. This term and definition seem redundant. Perhaps you/we mean: SQ is an ordered solution sequence for Q = (GP, D, M, R) iff SQ is a distinct solution sequence for Q and SQ is in order with respect to the ordering expressions in M. plus, outside the box: Note that if there are no ordering expressions in M, every solution sequence is in order. > Definition: Limited Solution Sequence > > A Limited Solution Sequence has at most a given, fixed number of > members. Again, except for infinite solution sequences, every solution sequence has at most some given number of members. Perhaps you/we mean: SQ is an ordered solution sequence for Q = (GP, D, M, R) iff SQ is an ordered solution sequence for Q and either M has no LIMIT or SQ is shorter than the LIMIT given in M. > Definition: Offset Solution Sequence > > An Offset Solution Sequence with respect to another solution sequence > SEQ, is one which starts at a given index of SEQ. > > The index of a sequence is index 0 (zero). Suggest rather: If SQ1 is an ordered, limited solution sequence for Q1 = (GP, D, M1, R), then SQ2 is an offset solution for Q2 = (GP, D, M2, R) iff M2 is M1 plus OFFSET N and SQ2 is all but the first N items in SQ1, i.e. <SQ1[N], SQ1[N+1], SQ1[N+2], ...>. If M has no OFFSET modifier, than every limited solution sequence SQ of Q=(GP, D, M, R) is also an offset solution sequence of Q. The set of solution sequence modifiers seems to be left implicit. I suppose it's clear enough as is, but not as rigorous as it could be; it could be defined as: a tuple of (S, L, O, D) where L, the limit, is an integer or infinity, O, the offset, is an integer, D is a boolean, and S is a set of ordering expressions. Ordering expressions could be further specified. Bleah. This is all obvious and just about as well left implicit as is. So.. is that the definition that the protocol spec should reference? the hunk of XML you get back has to represent an offset solution sequence of the (abstract form, after parsing of the) given query. Hm... that only covers the SELECT case. Er... actually, we haven't incorporated the projection step, have we? The ASK case is straightforward to specify (ala if there are any non-empty offset solution sequences, it's true, else false). Somewhere we specify how to take a set of solutions and a construct pattern and turn it into a graph, right? That should be a boxed definition... it should define construct(P, S) from pattern P and substitution S. The protocol for DESCRIBE is kinda vacuuous: every graph is a correct answer. It seems kinda awkward to include the result form in the abstract definition of SPARQL Query. Maybe it works out OK... I suggest the following definitions that the protocol spec and/or results format specs should reference: Definition: Answer Binding Sequence Given a query Q=(GP, D, M, SELECT VS), and an RDF dataset DS, SQ is an answer binding sequence of Q w.r.t. DS iff there is some offset solution sequence SQ2 where each SQ[i] is project(SQ2[i], VS). There are no answer binding sequences for queries with result forms other than SELECT. (hmm... is SQ2 unique? Should we be speaking of *the* answer binding sequence? Ah... no... not all queries give a total order.) Definition: Boolean Answer Given a query Q=(GP, D, M, ASK) and an RDF dataset DS, If there is any non-empty offset solution sequence SQ of Q, then the the boolean answer is true, else it is false. The boolean answer of queries with results forms other than ASK is unspecified. Definition: Graph Answer Given a query Q=(GP, D, M, CONSTRUCT P) and an RDF dataset DS, if SQ is an offset solution sequence of Q, then the merge of construct(P, SQ[i]) for each i is a graph answer for Q given DS. Given a query Q=(GP, D, M, DESCRIBE) and an RDF Dataset DS, every RDF graph is a graph answer for Q given DS. The graph answers(s) for queries with result forms other than CONSTRUCT and ASK is unspecified. Hmm... is merge right? We talked about whether bnodes share in this case. I forget which way it works. Maybe it should be union. -- Dan Connolly, W3C http://www.w3.org/People/Connolly/ D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E see you at XTech in Amsterdam 24-27 May?
Received on Wednesday, 18 May 2005 19:54:35 UTC