Re: putting entailment into SPARQL from Seaborne, Andy on 2005-12-15 (public-rdf-dawg@w3.org from October to December 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 15 Dec 2005 13:35:37 +0000
To: Pat Hayes <phayes@ihmc.us>
CC: "Eric Prud'hommeaux" <eric@w3.org>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <43A17129.9050300@hp.com>
Pat, Bijan, Enrico,

For #owlDisjunction, we need to handle distinguished and nondistinguished 
variables differently.  My understanding of teh design below is that this 
happens when nondistinguished variables are bNodes - and, for SPARQL, any 
named variable is distinguished.

v1.581 (and later) for changes (ongoing)


Pat Hayes wrote:
> Reading through http://www.w3.org/2001/sw/DataAccess/rq23/
> 

<editorial comments done>

> 
> 2.4 Definition: Pattern Solution
> There is a real problem in defining all this for triples first and 
> then for graphs, as the triple-only definitions don't generalize 
> properly. For example, if S1 is a solution for {T1} and S2 for {T2}, 
> they might both be wrong for {T1, T2} if the triples share a bnode. 
> Its not enough for each triple to match, contra to what it says in 
> section 2.5.
> 
> I don't know how to state the conditions for the single-triple case 
> in terms of entailment.
> 
> I suggest we first define simple graph pattern and then go into 
> matching, so that all the matching definitions apply to at least 
> simple graphs, with a triple considered to be a singleton graph 
> pattern.

Agreed - we have to now define matching for basic patterns and not for triple 
patterns.

> 
> If we did this, the definitions would be like this:
> 
> Basic Graph Pattern.
> ... is a set of triple patterns.
> 
> Pattern Solution.
> A variable substitution is a substitution function on a subset of V 
> to RDF-T. A pattern solution on the pattern V to the dataset G is a 
> variable substitution whose domain includes all the variables in V, 
> whose range is a subset of the set of RDF terms occurring in G, and 
> which matches the dataset DS.

?? "on a pattern P"


> (Note, we don't now count bnodes in the range of a substitution, 
> since that aspect of 'matching' will be handled by the simple 
> entailment. Also, notice that the restriction on the range of a 
> solution is essential to prevent bnode-leakage in answers.)
> 
> (Doesn't the notion of matching need to refer to SM and R in some 
> cases, as well as the dataset??)

Just the dataset - this is pattern matching, and the solution modifers are 
applied afterwards.   Result forms are applied after solution modifiers.  Need 
to add something about which graph in the dataset is being matched (that was 
already on the ToDo list).

> 
> 2.5 [simple version A, assuming no told bnodes]

I'm trying for a "told bNodes" solution and keeping this non-told-bnode text 
in reserve.

> Basic Graph Matching.
> 
> A basic graph pattern P matches on graph G with solution S when S is 
> a pattern solution on P to G such that G simply entails S(P).
> 
> (Optional Lemma. S is a pattern solution on b.g.p. P to G just when S 
> can be extended to a mapping S' from (V union {bnodes in P}) to 
> RDF-T's  s.t. S'(P) is a subgraph of G. [Proof, see interpolation 
> lemma. QED.].
> This connects it with the current defs and gives a more directly 
> implementable account of matching.)
> 
> (Optional lemma 2. There is a pattern solution on b.g.p. P to G just 
> when G simply entails P', where P' is the RDF graph obtained from P 
> by replacing each variable by a new blank node not in P or G. [Proof, 
> obvious consequence of first lemma. QED.]
> Which shows that variables are just bnodes with a special kind of 
> label, in a sense, and that in this case bnodes in pattern and bnodes 
> in graph might as well be in different graphs.)
> 
> "For a basic graph pattern to match some dataset, there must be a 
> solution where each of the triple patterns matches the dataset with 
> that solution."

Removed

> 
> Suggest delete the above sentence, it is potentially misleading. Or 
> else add "; but the reverse is not always true."
> 
> "This query contains a basic graph pattern of two triple patterns, 
> each of which must match for the graph pattern to match."
> //
> "This query contains a basic graph pattern of two triple patterns, 
> each of which must match, with the same substitution, for the graph 
> pattern to match."

Done.

> 
> OR:
> 2.5 [complex version B, allowing for persistent bnodes between 
> queries. Let me assume that the notion of "persistent bnodes in a 
> graph pattern" makes sense somehow, exactly how will require some 
> further work.]

Suggestion: we do not discuss "persistence" of bnodes and allow the query to 
contain told bnodes with no assumptions or guarantees across queries.

SELECT *
FROM <foo>
WHERE { ... }

is not going to be guaranteed to work for bNodes across queries.  Reading 
<foo> each time is a reasonable implementation and all the bNodes may get 
relabelled.  Or reading <foo> just uses the document-scoped labels in the file 
and a sequence of queries will work.  Or it may be already read and cached.

> 
> A basic graph pattern P matches on graph G with solution S when S is 
> a pattern solution on P to G such that G simply entails (G union 
> S(M(P))), where M is a 1:1 mapping on blank nodes which replaces each 
> non-persistent blank node in P with a new blank node which does not 
> occur in P or G.
> 
> (It might be a good idea to define that notion of the M mapping ahead 
> of time in an auxiliary definition of its own. We could call it a 
> persistent merge or a partial merge, or some such name.)

Didn't understand the use of the term "merge" for M - I called it a renaming, 
mapping sets of triples to sets of triple patterns.

v1.581

> 
> 
> 2.6 last para mentions "simple, conjunctive" but conjunctive hasn't 
> been defined yet.

Changed to
"""This is a basic graph pattern match, ..."

> 
> 2.7 needs to be rewritten if we go with the persistent-bnode option. 
> Even if we don't, I think we should remove "A blank node may match 
> any RDF term." since that isn't officially called 'matching' now we 
> are talking about entailment.

We need to fix on the terminolgy.  Is "told bnode" universal, is "persistent 
bnode", or "literal" or "constant" Bnode (as in a prgamming language literal 
or constant.

I've picked "told bNode" as a placeholder (not consistently yet) - suggestions 
very welcome - need a pair for told/not-told & persistentent/non-persistent.

> 
> For complex version B, it might read something like this:
> 
> 2.7.1
> A blank node can occur in a query pattern in two ways. A persistent 
> blank node acts like a 'temporary IRI': it is assumed to share the 
> same scope as the dataset graph and all previous queries. Persistent 
> blank nodes are intended to be used to allow a query to re-use a 
> blank node provided as an answer binding from an earlier query, to 
> obtain more information about the entity referred to by the blank 
> node. A similar functionality may be obtained by the SPARQL server 
> supplying an IRI in the answer binding for the query in such a case, 
> even though the dataset contains only a blank node; this corresponds 
> to 'skolemizing' the existential assertion in the dataset. All other 
> blank nodes are referred to as non-persistent blank nodes. 
> Non-persistent blank nodes in a query are scoped to the query graph 
> pattern, and cannot be identified with bnodes in the dataset or in 
> previous queries.
> 
> SPARQL servers may refuse to accept queries containing persistent 
> blank nodes, and may treat them as ill-formed queries. Servers which 
> do accept persistent blank nodes must recognize the co-occurrence of 
> persistent blank node identifiers in a query with the same 
> identifiers provided as answer bindings in previous queries, and with 
> the corresponding bnodes in the dataset. Servers which do not accept 
> queries containing persistent blank nodes may substitute new blank 
> node IDs for bnodeIDs in answer bindings, so that blank nodes in 
> answer bindings in such cases should be treated as scoped locally to 
> the query result.
> 

text included - will align later

> 2.8.3  Will need to be extended and slightly rewritten to handle 
> persistent bnodes: what it currently says is all about non-persistent 
> bnodes. If we don't go that way, it seems OK as it is.
> 

Added 2.8.4 for told bnodes.

> -------
> 
> OK, I think that is enough to incorporate entailment into the current 
> design. With or without persistent bnodes, the definitions work 
> smoothly with other entailments used instead of simple entailment, 
> though of course the lemmas don't apply in those cases.

Does this include #owlDisjunction?

The pattern solution requires all variables to have definite values.

> 
> Pat
> 
> PS. One other rewording in section 2.7:
> 
> "An application or client receiving the results of a query can tell 
> that two solutions or two variable bindings differ in blank nodes but 
> this information is only scoped to the results as defined in "SPARQL 
> Variable Binding Results XML Format" or the CONSTRUCT result form."
> //
> SIMPLE CASE A:
> "Blank nodes in the results of a query are identical to those 
> occurring in the dataset graphs, but this information cannot be used 
> by an application or client which receives these results, since all 
> blank nodes in subsequent queries are treated as being local to that 
> query. In effect, this means that information about co-occurrences of 
> blank nodes may be treated as scoped to the results as defined in 
> "SPARQL Variable Binding Results XML Format" or the CONSTRUCT result 
> form."
> 
> OR, complex case B with persistent bnodes:
> 
> //"Blank nodes in the results of a query are identical to those 
> occurring in the dataset graphs, and may be re-used in subsequent 
> queries as persistent blank nodes. Servers which accept a query 
> containing a persistent blank node must respect the identity of such 
> blank node identifiers across multiple queries and answer bindings. 
> In contrast, blank nodes which appear in queries as nonpersistent 
> blank nodes are treated as being local to that query, and bear no 
> relationship to occurrences of the same blank node in other queries 
> or their answer bindings."
> 


I don't think it as simple as a per-service feature. A query service may accept:

SELECT *
FROM <foo>
WHERE { ... }

and

SELECT * WHERE { ... }

so the first will not work  for persistent blank nodes (they just will not 
match so there will be no results from a basic pattern involving them) and the 
second can.  That is, the nature of the query matters.

The model I have is that servers must respect the identity of blank node 
identifiers regardless - it just may not have any effect because the service 
relabelled all the bNodes (where is that service description spec when you 
need it).



In the XML result format - it scopes bNodes labels to the document.  This 
needs to be relaxed.  Do we need an attribute to declare bNodes labels are 
graph scoped (taken from the graph - still no guarantee that they are 
persistent)?  Or just weaken the text

The text is:
"""
Note: The blank node label I is scoped to the result set XML document and need 
not have any association to the blank node label for that RDF Term in the 
query graph.
"""

I think this needs to be said in the XML result format, not in the SPARQL-Q 
document.  But something like:

"""
Blank nodes in the results of a query may be identical to the those
occurring in the dataset graphs. Service instances which accept a query
containing a persistent blank node will respect the identity of such
blank node identifiers but do not guarantee retaining identity across multiple 
queries (?? some text like "unless otherwise stated"??)
"""

in other words the blank nodes may be preserved - they may not.  "See your
service instance for details."  A SPARQL query with a told bnode is always 
legal as a SPARQL query - the service may generate QueryRequestRefused like 
any other syntactically legal query it does not want to handle.


An attribute to indicate what the bNode labels mean might help: two settings 
"graph", meaning the bNode labels from the graph are exposed, and "results" 
meaning just the result set document.

<results ordered="false" distinct="false" labelscope="graph">

<results ordered="false" distinct="false" labelscope="results">

	Andy
Received on Thursday, 15 December 2005 13:36:53 UTC