putting entailment into SPARQL from Pat Hayes on 2005-12-13 (public-rdf-dawg@w3.org from October to December 2005)

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 13 Dec 2005 16:27:53 -0600
To: Eric Prud'hommeaux <eric@w3.org>, "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p0623090dbfc4daa27d9d@[10.100.0.23]>
Reading through http://www.w3.org/2001/sw/DataAccess/rq23/

section 2.1.2 numer/number
section 2.1.5  Should the last example use different prefixes (?title 
$title) instead of the same ones, to illustrate the text?
2.1.6 allows URIs to be /allows IRIs to be (??)

2.2
"Queries can include blank nodes; the blank nodes in a query are 
disjoint from all blank nodes in the RDF graphs being matched and 
members of the set of variables."
Suggest we simply delete this sentence, it is redundant and it would 
be too complicated to fix it to make it strictly correct.

2.3
"Matching a triple pattern to a graph gives bindings between 
variables and RDF Terms so that the triple pattern, with the 
variables replaced by the corresponding RDF terms, is a triple of the 
graph being matched."
Suggest we remove this sentence also, and postpone discussion of 
matching to later sections.

2.4 Definition: Pattern Solution
There is a real problem in defining all this for triples first and 
then for graphs, as the triple-only definitions don't generalize 
properly. For example, if S1 is a solution for {T1} and S2 for {T2}, 
they might both be wrong for {T1, T2} if the triples share a bnode. 
Its not enough for each triple to match, contra to what it says in 
section 2.5.

I don't know how to state the conditions for the single-triple case 
in terms of entailment.

I suggest we first define simple graph pattern and then go into 
matching, so that all the matching definitions apply to at least 
simple graphs, with a triple considered to be a singleton graph 
pattern.

If we did this, the definitions would be like this:

Basic Graph Pattern.
... is a set of triple patterns.

Pattern Solution.
A variable substitution is a substitution function on a subset of V 
to RDF-T. A pattern solution on the pattern V to the dataset G is a 
variable substitution whose domain includes all the variables in V, 
whose range is a subset of the set of RDF terms occurring in G, and 
which matches the dataset DS.

(Note, we don't now count bnodes in the range of a substitution, 
since that aspect of 'matching' will be handled by the simple 
entailment. Also, notice that the restriction on the range of a 
solution is essential to prevent bnode-leakage in answers.)

(Doesn't the notion of matching need to refer to SM and R in some 
cases, as well as the dataset??)

2.5 [simple version A, assuming no told bnodes]
Basic Graph Matching.

A basic graph pattern P matches on graph G with solution S when S is 
a pattern solution on P to G such that G simply entails S(P).

(Optional Lemma. S is a pattern solution on b.g.p. P to G just when S 
can be extended to a mapping S' from (V union {bnodes in P}) to 
RDF-T's  s.t. S'(P) is a subgraph of G. [Proof, see interpolation 
lemma. QED.].
This connects it with the current defs and gives a more directly 
implementable account of matching.)

(Optional lemma 2. There is a pattern solution on b.g.p. P to G just 
when G simply entails P', where P' is the RDF graph obtained from P 
by replacing each variable by a new blank node not in P or G. [Proof, 
obvious consequence of first lemma. QED.]
Which shows that variables are just bnodes with a special kind of 
label, in a sense, and that in this case bnodes in pattern and bnodes 
in graph might as well be in different graphs.)

"For a basic graph pattern to match some dataset, there must be a 
solution where each of the triple patterns matches the dataset with 
that solution."

Suggest delete the above sentence, it is potentially misleading. Or 
else add "; but the reverse is not always true."

"This query contains a basic graph pattern of two triple patterns, 
each of which must match for the graph pattern to match."
//
"This query contains a basic graph pattern of two triple patterns, 
each of which must match, with the same substitution, for the graph 
pattern to match."

OR:
2.5 [complex version B, allowing for persistent bnodes between 
queries. Let me assume that the notion of "persistent bnodes in a 
graph pattern" makes sense somehow, exactly how will require some 
further work.]

A basic graph pattern P matches on graph G with solution S when S is 
a pattern solution on P to G such that G simply entails (G union 
S(M(P))), where M is a 1:1 mapping on blank nodes which replaces each 
non-persistent blank node in P with a new blank node which does not 
occur in P or G.

(It might be a good idea to define that notion of the M mapping ahead 
of time in an auxiliary definition of its own. We could call it a 
persistent merge or a partial merge, or some such name.)


2.6 last para mentions "simple, conjunctive" but conjunctive hasn't 
been defined yet.

2.7 needs to be rewritten if we go with the persistent-bnode option. 
Even if we don't, I think we should remove "A blank node may match 
any RDF term." since that isn't officially called 'matching' now we 
are talking about entailment.

For complex version B, it might read something like this:

2.7.1
A blank node can occur in a query pattern in two ways. A persistent 
blank node acts like a 'temporary IRI': it is assumed to share the 
same scope as the dataset graph and all previous queries. Persistent 
blank nodes are intended to be used to allow a query to re-use a 
blank node provided as an answer binding from an earlier query, to 
obtain more information about the entity referred to by the blank 
node. A similar functionality may be obtained by the SPARQL server 
supplying an IRI in the answer binding for the query in such a case, 
even though the dataset contains only a blank node; this corresponds 
to 'skolemizing' the existential assertion in the dataset. All other 
blank nodes are referred to as non-persistent blank nodes. 
Non-persistent blank nodes in a query are scoped to the query graph 
pattern, and cannot be identified with bnodes in the dataset or in 
previous queries.

SPARQL servers may refuse to accept queries containing persistent 
blank nodes, and may treat them as ill-formed queries. Servers which 
do accept persistent blank nodes must recognize the co-occurrence of 
persistent blank node identifiers in a query with the same 
identifiers provided as answer bindings in previous queries, and with 
the corresponding bnodes in the dataset. Servers which do not accept 
queries containing persistent blank nodes may substitute new blank 
node IDs for bnodeIDs in answer bindings, so that blank nodes in 
answer bindings in such cases should be treated as scoped locally to 
the query result.

2.8.3  Will need to be extended and slightly rewritten to handle 
persistent bnodes: what it currently says is all about non-persistent 
bnodes. If we don't go that way, it seems OK as it is.

-------

OK, I think that is enough to incorporate entailment into the current 
design. With or without persistent bnodes, the definitions work 
smoothly with other entailments used instead of simple entailment, 
though of course the lemmas don't apply in those cases.

Pat

PS. One other rewording in section 2.7:

"An application or client receiving the results of a query can tell 
that two solutions or two variable bindings differ in blank nodes but 
this information is only scoped to the results as defined in "SPARQL 
Variable Binding Results XML Format" or the CONSTRUCT result form."
//
SIMPLE CASE A:
"Blank nodes in the results of a query are identical to those 
occurring in the dataset graphs, but this information cannot be used 
by an application or client which receives these results, since all 
blank nodes in subsequent queries are treated as being local to that 
query. In effect, this means that information about co-occurrences of 
blank nodes may be treated as scoped to the results as defined in 
"SPARQL Variable Binding Results XML Format" or the CONSTRUCT result 
form."

OR, complex case B with persistent bnodes:

//"Blank nodes in the results of a query are identical to those 
occurring in the dataset graphs, and may be re-used in subsequent 
queries as persistent blank nodes. Servers which accept a query 
containing a persistent blank node must respect the identity of such 
blank node identifiers across multiple queries and answer bindings. 
In contrast, blank nodes which appear in queries as nonpersistent 
blank nodes are treated as being local to that query, and bear no 
relationship to occurrences of the same blank node in other queries 
or their answer bindings."


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Tuesday, 13 December 2005 22:28:08 UTC