- From: Pat Hayes <phayes@ihmc.us>
- Date: Tue, 13 Dec 2005 16:27:53 -0600
- To: Eric Prud'hommeaux <eric@w3.org>, "Seaborne, Andy" <andy.seaborne@hp.com>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Reading through http://www.w3.org/2001/sw/DataAccess/rq23/ section 2.1.2 numer/number section 2.1.5 Should the last example use different prefixes (?title $title) instead of the same ones, to illustrate the text? 2.1.6 allows URIs to be /allows IRIs to be (??) 2.2 "Queries can include blank nodes; the blank nodes in a query are disjoint from all blank nodes in the RDF graphs being matched and members of the set of variables." Suggest we simply delete this sentence, it is redundant and it would be too complicated to fix it to make it strictly correct. 2.3 "Matching a triple pattern to a graph gives bindings between variables and RDF Terms so that the triple pattern, with the variables replaced by the corresponding RDF terms, is a triple of the graph being matched." Suggest we remove this sentence also, and postpone discussion of matching to later sections. 2.4 Definition: Pattern Solution There is a real problem in defining all this for triples first and then for graphs, as the triple-only definitions don't generalize properly. For example, if S1 is a solution for {T1} and S2 for {T2}, they might both be wrong for {T1, T2} if the triples share a bnode. Its not enough for each triple to match, contra to what it says in section 2.5. I don't know how to state the conditions for the single-triple case in terms of entailment. I suggest we first define simple graph pattern and then go into matching, so that all the matching definitions apply to at least simple graphs, with a triple considered to be a singleton graph pattern. If we did this, the definitions would be like this: Basic Graph Pattern. ... is a set of triple patterns. Pattern Solution. A variable substitution is a substitution function on a subset of V to RDF-T. A pattern solution on the pattern V to the dataset G is a variable substitution whose domain includes all the variables in V, whose range is a subset of the set of RDF terms occurring in G, and which matches the dataset DS. (Note, we don't now count bnodes in the range of a substitution, since that aspect of 'matching' will be handled by the simple entailment. Also, notice that the restriction on the range of a solution is essential to prevent bnode-leakage in answers.) (Doesn't the notion of matching need to refer to SM and R in some cases, as well as the dataset??) 2.5 [simple version A, assuming no told bnodes] Basic Graph Matching. A basic graph pattern P matches on graph G with solution S when S is a pattern solution on P to G such that G simply entails S(P). (Optional Lemma. S is a pattern solution on b.g.p. P to G just when S can be extended to a mapping S' from (V union {bnodes in P}) to RDF-T's s.t. S'(P) is a subgraph of G. [Proof, see interpolation lemma. QED.]. This connects it with the current defs and gives a more directly implementable account of matching.) (Optional lemma 2. There is a pattern solution on b.g.p. P to G just when G simply entails P', where P' is the RDF graph obtained from P by replacing each variable by a new blank node not in P or G. [Proof, obvious consequence of first lemma. QED.] Which shows that variables are just bnodes with a special kind of label, in a sense, and that in this case bnodes in pattern and bnodes in graph might as well be in different graphs.) "For a basic graph pattern to match some dataset, there must be a solution where each of the triple patterns matches the dataset with that solution." Suggest delete the above sentence, it is potentially misleading. Or else add "; but the reverse is not always true." "This query contains a basic graph pattern of two triple patterns, each of which must match for the graph pattern to match." // "This query contains a basic graph pattern of two triple patterns, each of which must match, with the same substitution, for the graph pattern to match." OR: 2.5 [complex version B, allowing for persistent bnodes between queries. Let me assume that the notion of "persistent bnodes in a graph pattern" makes sense somehow, exactly how will require some further work.] A basic graph pattern P matches on graph G with solution S when S is a pattern solution on P to G such that G simply entails (G union S(M(P))), where M is a 1:1 mapping on blank nodes which replaces each non-persistent blank node in P with a new blank node which does not occur in P or G. (It might be a good idea to define that notion of the M mapping ahead of time in an auxiliary definition of its own. We could call it a persistent merge or a partial merge, or some such name.) 2.6 last para mentions "simple, conjunctive" but conjunctive hasn't been defined yet. 2.7 needs to be rewritten if we go with the persistent-bnode option. Even if we don't, I think we should remove "A blank node may match any RDF term." since that isn't officially called 'matching' now we are talking about entailment. For complex version B, it might read something like this: 2.7.1 A blank node can occur in a query pattern in two ways. A persistent blank node acts like a 'temporary IRI': it is assumed to share the same scope as the dataset graph and all previous queries. Persistent blank nodes are intended to be used to allow a query to re-use a blank node provided as an answer binding from an earlier query, to obtain more information about the entity referred to by the blank node. A similar functionality may be obtained by the SPARQL server supplying an IRI in the answer binding for the query in such a case, even though the dataset contains only a blank node; this corresponds to 'skolemizing' the existential assertion in the dataset. All other blank nodes are referred to as non-persistent blank nodes. Non-persistent blank nodes in a query are scoped to the query graph pattern, and cannot be identified with bnodes in the dataset or in previous queries. SPARQL servers may refuse to accept queries containing persistent blank nodes, and may treat them as ill-formed queries. Servers which do accept persistent blank nodes must recognize the co-occurrence of persistent blank node identifiers in a query with the same identifiers provided as answer bindings in previous queries, and with the corresponding bnodes in the dataset. Servers which do not accept queries containing persistent blank nodes may substitute new blank node IDs for bnodeIDs in answer bindings, so that blank nodes in answer bindings in such cases should be treated as scoped locally to the query result. 2.8.3 Will need to be extended and slightly rewritten to handle persistent bnodes: what it currently says is all about non-persistent bnodes. If we don't go that way, it seems OK as it is. ------- OK, I think that is enough to incorporate entailment into the current design. With or without persistent bnodes, the definitions work smoothly with other entailments used instead of simple entailment, though of course the lemmas don't apply in those cases. Pat PS. One other rewording in section 2.7: "An application or client receiving the results of a query can tell that two solutions or two variable bindings differ in blank nodes but this information is only scoped to the results as defined in "SPARQL Variable Binding Results XML Format" or the CONSTRUCT result form." // SIMPLE CASE A: "Blank nodes in the results of a query are identical to those occurring in the dataset graphs, but this information cannot be used by an application or client which receives these results, since all blank nodes in subsequent queries are treated as being local to that query. In effect, this means that information about co-occurrences of blank nodes may be treated as scoped to the results as defined in "SPARQL Variable Binding Results XML Format" or the CONSTRUCT result form." OR, complex case B with persistent bnodes: //"Blank nodes in the results of a query are identical to those occurring in the dataset graphs, and may be re-used in subsequent queries as persistent blank nodes. Servers which accept a query containing a persistent blank node must respect the identity of such blank node identifiers across multiple queries and answer bindings. In contrast, blank nodes which appear in queries as nonpersistent blank nodes are treated as being local to that query, and bear no relationship to occurrences of the same blank node in other queries or their answer bindings." -- --------------------------------------------------------------------- IHMC (850)434 8903 or (650)494 3973 home 40 South Alcaniz St. (850)202 4416 office Pensacola (850)202 4440 fax FL 32502 (850)291 0667 cell phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Tuesday, 13 December 2005 22:28:08 UTC