Re: putting entailment into SPARQL from Pat Hayes on 2005-12-16 (public-rdf-dawg@w3.org from October to December 2005)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 16 Dec 2005 10:41:48 -0600
To: andy.seaborne@hp.com
Cc: "Eric Prud'hommeaux" <eric@w3.org>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230901bfc899a5ce86@[10.100.0.23]>
Andy, I think we are working on different designs for told/persistent 
bnodes. I'm not sure that the definitions given will work for your 
design, and to be honest I don't understand your design well enough 
yet to offer revisions. See in-line comments below.

Pat


>Pat, Bijan, Enrico,
>
>For #owlDisjunction, we need to handle distinguished and 
>nondistinguished variables differently.  My understanding of teh 
>design below is that this happens when nondistinguished variables 
>are bNodes - and, for SPARQL, any named variable is distinguished.
>
>v1.581 (and later) for changes (ongoing)
>
>
>Pat Hayes wrote:
>>Reading through http://www.w3.org/2001/sw/DataAccess/rq23/
>>
>
><editorial comments done>
>
>>
>>2.4 Definition: Pattern Solution
>>There is a real problem in defining all this for triples first and 
>>then for graphs, as the triple-only definitions don't generalize 
>>properly. For example, if S1 is a solution for {T1} and S2 for 
>>{T2}, they might both be wrong for {T1, T2} if the triples share a 
>>bnode. Its not enough for each triple to match, contra to what it 
>>says in section 2.5.
>>
>>I don't know how to state the conditions for the single-triple case 
>>in terms of entailment.
>>
>>I suggest we first define simple graph pattern and then go into 
>>matching, so that all the matching definitions apply to at least 
>>simple graphs, with a triple considered to be a singleton graph 
>>pattern.
>
>Agreed - we have to now define matching for basic patterns and not 
>for triple patterns.
>
>>
>>If we did this, the definitions would be like this:
>>
>>Basic Graph Pattern.
>>... is a set of triple patterns.
>>
>>Pattern Solution.
>>A variable substitution is a substitution function on a subset of V 
>>to RDF-T. A pattern solution on the pattern V to the dataset G is a 
>>variable substitution whose domain includes all the variables in V, 
>>whose range is a subset of the set of RDF terms occurring in G, and 
>>which matches the dataset DS.
>
>?? "on a pattern P"

Yes, P not V. Sorry.

>
>
>>(Note, we don't now count bnodes in the range of a substitution, 
>>since that aspect of 'matching' will be handled by the simple 
>>entailment. Also, notice that the restriction on the range of a 
>>solution is essential to prevent bnode-leakage in answers.)
>>
>>(Doesn't the notion of matching need to refer to SM and R in some 
>>cases, as well as the dataset??)
>
>Just the dataset - this is pattern matching, and the solution 
>modifers are applied afterwards.

Yes, but this definition refers ahead to other kinds of matching, was my point.

>   Result forms are applied after solution modifiers.  Need to add 
>something about which graph in the dataset is being matched (that 
>was already on the ToDo list).
>
>>
>>2.5 [simple version A, assuming no told bnodes]
>
>I'm trying for a "told bNodes" solution and keeping this 
>non-told-bnode text in reserve.

OK... but...

>
>>Basic Graph Matching.
>>
>>A basic graph pattern P matches on graph G with solution S when S 
>>is a pattern solution on P to G such that G simply entails S(P).
>>
>>(Optional Lemma. S is a pattern solution on b.g.p. P to G just when 
>>S can be extended to a mapping S' from (V union {bnodes in P}) to 
>>RDF-T's  s.t. S'(P) is a subgraph of G. [Proof, see interpolation 
>>lemma. QED.].
>>This connects it with the current defs and gives a more directly 
>>implementable account of matching.)
>>
>>(Optional lemma 2. There is a pattern solution on b.g.p. P to G 
>>just when G simply entails P', where P' is the RDF graph obtained 
>>from P by replacing each variable by a new blank node not in P or 
>>G. [Proof, obvious consequence of first lemma. QED.]
>>Which shows that variables are just bnodes with a special kind of 
>>label, in a sense, and that in this case bnodes in pattern and 
>>bnodes in graph might as well be in different graphs.)
>>
>>"For a basic graph pattern to match some dataset, there must be a 
>>solution where each of the triple patterns matches the dataset with 
>>that solution."
>
>Removed
>
>>
>>Suggest delete the above sentence, it is potentially misleading. Or 
>>else add "; but the reverse is not always true."
>>
>>"This query contains a basic graph pattern of two triple patterns, 
>>each of which must match for the graph pattern to match."
>>//
>>"This query contains a basic graph pattern of two triple patterns, 
>>each of which must match, with the same substitution, for the graph 
>>pattern to match."
>
>Done.
>
>>
>>OR:
>>2.5 [complex version B, allowing for persistent bnodes between 
>>queries. Let me assume that the notion of "persistent bnodes in a 
>>graph pattern" makes sense somehow, exactly how will require some 
>>further work.]
>
>Suggestion: we do not discuss "persistence" of bnodes and allow the 
>query to contain told bnodes with no assumptions or guarantees 
>across queries.

... ?? Wha?? Then there is no point in allowing told bnodes, seems to 
me. The only use case involves re-using a bnode supplied as an answer 
to one query in a later query. Seems to me this option gets us the 
worst of both worlds.

>
>SELECT *
>FROM <foo>
>WHERE { ... }
>
>is not going to be guaranteed to work for bNodes across queries. 
>Reading <foo> each time is a reasonable implementation and all the 
>bNodes may get relabelled.  Or reading <foo> just uses the 
>document-scoped labels in the file and a sequence of queries will 
>work.  Or it may be already read and cached.

I don't follow this, sorry. Im focussing on the definitions, not the 
machinery. But how can a sequence of queries work on a single 
document-scoped bnode if the bnodes aren't persistent across queries?

>>A basic graph pattern P matches on graph G with solution S when S 
>>is a pattern solution on P to G such that G simply entails (G union 
>>S(M(P))), where M is a 1:1 mapping on blank nodes which replaces 
>>each non-persistent blank node in P with a new blank node which 
>>does not occur in P or G.
>>
>>(It might be a good idea to define that notion of the M mapping 
>>ahead of time in an auxiliary definition of its own. We could call 
>>it a persistent merge or a partial merge, or some such name.)
>
>Didn't understand the use of the term "merge" for M - I called it a 
>renaming, mapping sets of triples to sets of triple patterns.

OK, but its a very special kind of re-naming, might be best to use a 
modifier, eg partial renaming.

>
>v1.581
>
>>
>>
>>2.6 last para mentions "simple, conjunctive" but conjunctive hasn't 
>>been defined yet.
>
>Changed to
>"""This is a basic graph pattern match, ..."
>
>>
>>2.7 needs to be rewritten if we go with the persistent-bnode 
>>option. Even if we don't, I think we should remove "A blank node 
>>may match any RDF term." since that isn't officially called 
>>'matching' now we are talking about entailment.
>
>We need to fix on the terminolgy.

Agreed. I modified 'told' to 'persistent' as being more 
self-explanatory, but it was only a suggestion.

>  Is "told bnode" universal, is "persistent bnode", or "literal" or 
>"constant" Bnode (as in a prgamming language literal or constant.
>
>I've picked "told bNode" as a placeholder (not consistently yet) - 
>suggestions very welcome - need a pair for told/not-told & 
>persistentent/non-persistent.
>
>>
>>For complex version B, it might read something like this:
>>
>>2.7.1
>>A blank node can occur in a query pattern in two ways. A persistent 
>>blank node acts like a 'temporary IRI': it is assumed to share the 
>>same scope as the dataset graph and all previous queries. 
>>Persistent blank nodes are intended to be used to allow a query to 
>>re-use a blank node provided as an answer binding from an earlier 
>>query, to obtain more information about the entity referred to by 
>>the blank node. A similar functionality may be obtained by the 
>>SPARQL server supplying an IRI in the answer binding for the query 
>>in such a case, even though the dataset contains only a blank node; 
>>this corresponds to 'skolemizing' the existential assertion in the 
>>dataset. All other blank nodes are referred to as non-persistent 
>>blank nodes. Non-persistent blank nodes in a query are scoped to 
>>the query graph pattern, and cannot be identified with bnodes in 
>>the dataset or in previous queries.
>>
>>SPARQL servers may refuse to accept queries containing persistent 
>>blank nodes, and may treat them as ill-formed queries. Servers 
>>which do accept persistent blank nodes must recognize the 
>>co-occurrence of persistent blank node identifiers in a query with 
>>the same identifiers provided as answer bindings in previous 
>>queries, and with the corresponding bnodes in the dataset. Servers 
>>which do not accept queries containing persistent blank nodes may 
>>substitute new blank node IDs for bnodeIDs in answer bindings, so 
>>that blank nodes in answer bindings in such cases should be treated 
>>as scoped locally to the query result.
>>
>
>text included - will align later
>
>>2.8.3  Will need to be extended and slightly rewritten to handle 
>>persistent bnodes: what it currently says is all about 
>>non-persistent bnodes. If we don't go that way, it seems OK as it 
>>is.
>>
>
>Added 2.8.4 for told bnodes.
>
>>-------
>>
>>OK, I think that is enough to incorporate entailment into the 
>>current design. With or without persistent bnodes, the definitions 
>>work smoothly with other entailments used instead of simple 
>>entailment, though of course the lemmas don't apply in those cases.
>
>Does this include #owlDisjunction?

Well, if you say OWL entailment then it does, yes. But my point was 
that the told-bnode trickery works for all the different entailments: 
it doesn't break when you use a stronger entailment.

>
>The pattern solution requires all variables to have definite values.
>
>>
>>Pat
>>
>>PS. One other rewording in section 2.7:
>>
>>"An application or client receiving the results of a query can tell 
>>that two solutions or two variable bindings differ in blank nodes 
>>but this information is only scoped to the results as defined in 
>>"SPARQL Variable Binding Results XML Format" or the CONSTRUCT 
>>result form."
>>//
>>SIMPLE CASE A:
>>"Blank nodes in the results of a query are identical to those 
>>occurring in the dataset graphs, but this information cannot be 
>>used by an application or client which receives these results, 
>>since all blank nodes in subsequent queries are treated as being 
>>local to that query. In effect, this means that information about 
>>co-occurrences of blank nodes may be treated as scoped to the 
>>results as defined in "SPARQL Variable Binding Results XML Format" 
>>or the CONSTRUCT result form."
>>
>>OR, complex case B with persistent bnodes:
>>
>>//"Blank nodes in the results of a query are identical to those 
>>occurring in the dataset graphs, and may be re-used in subsequent 
>>queries as persistent blank nodes. Servers which accept a query 
>>containing a persistent blank node must respect the identity of 
>>such blank node identifiers across multiple queries and answer 
>>bindings. In contrast, blank nodes which appear in queries as 
>>nonpersistent blank nodes are treated as being local to that query, 
>>and bear no relationship to occurrences of the same blank node in 
>>other queries or their answer bindings."
>>
>
>
>I don't think it as simple as a per-service feature. A query service 
>may accept:
>
>SELECT *
>FROM <foo>
>WHERE { ... }
>
>and
>
>SELECT * WHERE { ... }
>
>so the first will not work  for persistent blank nodes (they just 
>will not match

Why not? I don't follow this.

>so there will be no results from a basic pattern involving them) and 
>the second can.  That is, the nature of the query matters.

I don't understand this, I confess. What is the difference between 
the queries here, and why does it matter to how told bnodes work?

>
>The model I have is that servers must respect the identity of blank 
>node identifiers regardless - it just may not have any effect 
>because the service relabelled all the bNodes

No, my point was that the server has 2 options. It can just refuse to 
handle told bnodes, and reject queries with them in. OR, it can 
accept them: but then it is obliged to treat them correctly and keep 
track of them between queries. If it is willing to accept told bnodes 
in query patterns then its not allowed to rename bnodes between 
queries. It can't accept them but then give wrong answers.

>  (where is that service description spec when you need it).
>
>
>
>In the XML result format - it scopes bNodes labels to the document. 
>This needs to be relaxed.

Yes, I missed that. Do we need to even say what the scope is in the XML?

>  Do we need an attribute to declare bNodes labels are graph scoped 
>(taken from the graph - still no guarantee that they are 
>persistent)?  Or just weaken the text
>
>The text is:
>"""
>Note: The blank node label I is scoped to the result set XML 
>document and need not have any association to the blank node label 
>for that RDF Term in the query graph.
>"""
>
>I think this needs to be said in the XML result format, not in the 
>SPARQL-Q document.  But something like:
>
>"""
>Blank nodes in the results of a query may be identical to the those
>occurring in the dataset graphs. Service instances which accept a query
>containing a persistent blank node will respect the identity of such
>blank node identifiers but do not guarantee retaining identity 
>across multiple queries (?? some text like "unless otherwise 
>stated"??)
>"""
>
>in other words the blank nodes may be preserved - they may not.  "See your
>service instance for details."  A SPARQL query with a told bnode is 
>always legal as a SPARQL query - the service may generate 
>QueryRequestRefused like any other syntactically legal query it does 
>not want to handle.

OK, but it should not be allowed to accept the request and then give 
the wrong answer for it. If it accepts them, it must handle told 
bnode scopes properly.

>An attribute to indicate what the bNode labels mean might help: two 
>settings "graph", meaning the bNode labels from the graph are 
>exposed, and "results" meaning just the result set document.
>
><results ordered="false" distinct="false" labelscope="graph">
>
><results ordered="false" distinct="false" labelscope="results">

Except that should be in the query. Its the query that has the 
initiative to insist that a told bnode has the same scope as the 
document, so is persistent through mutiple queries. The only things 
the service can do is to refuse to play ball with such a query, or 
handle it correctly.

Pat

>
>	Andy


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 16 December 2005 16:41:59 UTC