Re: putting entailment into SPARQL

Pat Hayes wrote:
> Andy, I think we are working on different designs for told/persistent 
> bnodes. I'm not sure that the definitions given will work for your 
> design, and to be honest I don't understand your design well enough 
> yet to offer revisions. See in-line comments below.

Pat,

So let's make sure we are clear - I managed to integrate the non-told bNodes 
version with (what I considered) just editorial changes to get it to fit.

The issue for told bNodes is around persistence.  I assuming that even if we 
have told bNodes, then a conformant SPARQL processor is not going to be 
required to implement this feature.  Your design of generating an error.  That 
works.  A corner case is where some queries might work and other might not at 
the same query service.

The term "persistence" comes with a lot of associations and my immediate 
reaction was "for how long"?  Across server restart?  I'm always trying to 
avoid making the server track client or retain state because it can lead to 
restrictions on scalility as well as require integration with the protocol, 
greatly expanding the protocol.

By 'not discuss "persistence"' I meant getting way from that language to avoid 
needing to deal with the associations, not the concept as applied to bNodes.

I did knowingly change the defintion chnages you suggested but I did start on 
defining the rename function.  I'm not sure where this comes from - the client 
or the server. Is the rename derived from the query syntax by having different 
syntax forms for  told and non-told bNodes?

	Andy

> 
> Pat
> 
> 
> 
>>Pat, Bijan, Enrico,
>>
>>For #owlDisjunction, we need to handle distinguished and 
>>nondistinguished variables differently.  My understanding of teh 
>>design below is that this happens when nondistinguished variables 
>>are bNodes - and, for SPARQL, any named variable is distinguished.
>>
>>v1.581 (and later) for changes (ongoing)
>>
>>
>>Pat Hayes wrote:
>>
>>>Reading through http://www.w3.org/2001/sw/DataAccess/rq23/
>>>
>>
>><editorial comments done>
>>
>>>2.4 Definition: Pattern Solution
>>>There is a real problem in defining all this for triples first and 
>>>then for graphs, as the triple-only definitions don't generalize 
>>>properly. For example, if S1 is a solution for {T1} and S2 for 
>>>{T2}, they might both be wrong for {T1, T2} if the triples share a 
>>>bnode. Its not enough for each triple to match, contra to what it 
>>>says in section 2.5.
>>>
>>>I don't know how to state the conditions for the single-triple case 
>>>in terms of entailment.
>>>
>>>I suggest we first define simple graph pattern and then go into 
>>>matching, so that all the matching definitions apply to at least 
>>>simple graphs, with a triple considered to be a singleton graph 
>>>pattern.
>>
>>Agreed - we have to now define matching for basic patterns and not 
>>for triple patterns.
>>
>>
>>>If we did this, the definitions would be like this:
>>>
>>>Basic Graph Pattern.
>>>... is a set of triple patterns.
>>>
>>>Pattern Solution.
>>>A variable substitution is a substitution function on a subset of V 
>>>to RDF-T. A pattern solution on the pattern V to the dataset G is a 
>>>variable substitution whose domain includes all the variables in V, 
>>>whose range is a subset of the set of RDF terms occurring in G, and 
>>>which matches the dataset DS.
>>
>>?? "on a pattern P"
> 
> 
> Yes, P not V. Sorry.
> 
> 
>>
>>>(Note, we don't now count bnodes in the range of a substitution, 
>>>since that aspect of 'matching' will be handled by the simple 
>>>entailment. Also, notice that the restriction on the range of a 
>>>solution is essential to prevent bnode-leakage in answers.)
>>>
>>>(Doesn't the notion of matching need to refer to SM and R in some 
>>>cases, as well as the dataset??)
>>
>>Just the dataset - this is pattern matching, and the solution 
>>modifers are applied afterwards.
> 
> 
> Yes, but this definition refers ahead to other kinds of matching, was my point.

Noted - there is a forward reference.  When we have stability, I plan do a 
whole-document scan.  Maybe the RDF dataset does need to be defined as an 
initial definition.

> 
> 
>>  Result forms are applied after solution modifiers.  Need to add 
>>something about which graph in the dataset is being matched (that 
>>was already on the ToDo list).
>>
>>
>>>2.5 [simple version A, assuming no told bnodes]
>>
>>I'm trying for a "told bNodes" solution and keeping this 
>>non-told-bnode text in reserve.
> 
> 
> OK... but...

Both versions are in rq23 now.

> 
> 
>>>Basic Graph Matching.
>>>
>>>A basic graph pattern P matches on graph G with solution S when S 
>>>is a pattern solution on P to G such that G simply entails S(P).
>>>
>>>(Optional Lemma. S is a pattern solution on b.g.p. P to G just when 
>>>S can be extended to a mapping S' from (V union {bnodes in P}) to 
>>>RDF-T's  s.t. S'(P) is a subgraph of G. [Proof, see interpolation 
>>>lemma. QED.].
>>>This connects it with the current defs and gives a more directly 
>>>implementable account of matching.)
>>>
>>>(Optional lemma 2. There is a pattern solution on b.g.p. P to G 
>>>just when G simply entails P', where P' is the RDF graph obtained 
>>
>>>from P by replacing each variable by a new blank node not in P or 
>>
>>>G. [Proof, obvious consequence of first lemma. QED.]
>>>Which shows that variables are just bnodes with a special kind of 
>>>label, in a sense, and that in this case bnodes in pattern and 
>>>bnodes in graph might as well be in different graphs.)
>>>
>>>"For a basic graph pattern to match some dataset, there must be a 
>>>solution where each of the triple patterns matches the dataset with 
>>>that solution."
>>
>>Removed
>>
>>
>>>Suggest delete the above sentence, it is potentially misleading. Or 
>>>else add "; but the reverse is not always true."
>>>
>>>"This query contains a basic graph pattern of two triple patterns, 
>>>each of which must match for the graph pattern to match."
>>>//
>>>"This query contains a basic graph pattern of two triple patterns, 
>>>each of which must match, with the same substitution, for the graph 
>>>pattern to match."
>>
>>Done.
>>
>>
>>>OR:
>>>2.5 [complex version B, allowing for persistent bnodes between 
>>>queries. Let me assume that the notion of "persistent bnodes in a 
>>>graph pattern" makes sense somehow, exactly how will require some 
>>>further work.]
>>
>>Suggestion: we do not discuss "persistence" of bnodes and allow the 
>>query to contain told bnodes with no assumptions or guarantees 
>>across queries.
> 
> 
> ... ?? Wha?? Then there is no point in allowing told bnodes, seems to 
> me. The only use case involves re-using a bnode supplied as an answer 
> to one query in a later query. Seems to me this option gets us the 
> worst of both worlds.

My concern is that "persistent" places a specific responsibility on all 
implementations to expose stable bNode labels.

> 
> 
>>SELECT *
>>FROM <foo>
>>WHERE { ... }
>>
>>is not going to be guaranteed to work for bNodes across queries. 
>>Reading <foo> each time is a reasonable implementation and all the 
>>bNodes may get relabelled.  Or reading <foo> just uses the 
>>document-scoped labels in the file and a sequence of queries will 
>>work.  Or it may be already read and cached.
> 
> 
> I don't follow this, sorry. Im focussing on the definitions, not the 
> machinery. But how can a sequence of queries work on a single 
> document-scoped bnode if the bnodes aren't persistent across queries?

Document-scoped is a minimum - it can be achived by global scoping everything 
or from a system that reads things once and remembers them read.

It's the converse I'm looking at - what are the requirments on any SPARQL 
implementation.

One way round this is to have a service either offer persistent bNodes or not 
and not cover a service that does somethimes.

> 
> 
>>>A basic graph pattern P matches on graph G with solution S when S 
>>>is a pattern solution on P to G such that G simply entails (G union 
>>>S(M(P))), where M is a 1:1 mapping on blank nodes which replaces 
>>>each non-persistent blank node in P with a new blank node which 
>>>does not occur in P or G.
>>>
>>>(It might be a good idea to define that notion of the M mapping 
>>>ahead of time in an auxiliary definition of its own. We could call 
>>>it a persistent merge or a partial merge, or some such name.)
>>
>>Didn't understand the use of the term "merge" for M - I called it a 
>>renaming, mapping sets of triples to sets of triple patterns.
> 
> 
> OK, but its a very special kind of re-naming, might be best to use a 
> modifier, eg partial renaming.

Will do that.

> 
> 
>>v1.581
>>
>>
>>>
>>>2.6 last para mentions "simple, conjunctive" but conjunctive hasn't 
>>>been defined yet.
>>
>>Changed to
>>"""This is a basic graph pattern match, ..."
>>
>>
>>>2.7 needs to be rewritten if we go with the persistent-bnode 
>>>option. Even if we don't, I think we should remove "A blank node 
>>>may match any RDF term." since that isn't officially called 
>>>'matching' now we are talking about entailment.
>>
>>We need to fix on the terminolgy.
> 
> 
> Agreed. I modified 'told' to 'persistent' as being more 
> self-explanatory, but it was only a suggestion.
> 
> 
>> Is "told bnode" universal, is "persistent bnode", or "literal" or 
>>"constant" Bnode (as in a prgamming language literal or constant.
>>
>>I've picked "told bNode" as a placeholder (not consistently yet) - 
>>suggestions very welcome - need a pair for told/not-told & 
>>persistentent/non-persistent.
>>
>>
>>>For complex version B, it might read something like this:
>>>
>>>2.7.1
>>>A blank node can occur in a query pattern in two ways. A persistent 
>>>blank node acts like a 'temporary IRI': it is assumed to share the 
>>>same scope as the dataset graph and all previous queries. 
>>>Persistent blank nodes are intended to be used to allow a query to 
>>>re-use a blank node provided as an answer binding from an earlier 
>>>query, to obtain more information about the entity referred to by 
>>>the blank node. A similar functionality may be obtained by the 
>>>SPARQL server supplying an IRI in the answer binding for the query 
>>>in such a case, even though the dataset contains only a blank node; 
>>>this corresponds to 'skolemizing' the existential assertion in the 
>>>dataset. All other blank nodes are referred to as non-persistent 
>>>blank nodes. Non-persistent blank nodes in a query are scoped to 
>>>the query graph pattern, and cannot be identified with bnodes in 
>>>the dataset or in previous queries.
>>>
>>>SPARQL servers may refuse to accept queries containing persistent 
>>>blank nodes, and may treat them as ill-formed queries. Servers 
>>>which do accept persistent blank nodes must recognize the 
>>>co-occurrence of persistent blank node identifiers in a query with 
>>>the same identifiers provided as answer bindings in previous 
>>>queries, and with the corresponding bnodes in the dataset. Servers 
>>>which do not accept queries containing persistent blank nodes may 
>>>substitute new blank node IDs for bnodeIDs in answer bindings, so 
>>>that blank nodes in answer bindings in such cases should be treated 
>>>as scoped locally to the query result.
>>>
>>
>>text included - will align later
>>
>>
>>>2.8.3  Will need to be extended and slightly rewritten to handle 
>>>persistent bnodes: what it currently says is all about 
>>>non-persistent bnodes. If we don't go that way, it seems OK as it 
>>>is.
>>>
>>
>>Added 2.8.4 for told bnodes.
>>
>>
>>>-------
>>>
>>>OK, I think that is enough to incorporate entailment into the 
>>>current design. With or without persistent bnodes, the definitions 
>>>work smoothly with other entailments used instead of simple 
>>>entailment, though of course the lemmas don't apply in those cases.
>>
>>Does this include #owlDisjunction?
> 
> 
> Well, if you say OWL entailment then it does, yes. But my point was 
> that the told-bnode trickery works for all the different entailments: 
> it doesn't break when you use a stronger entailment.
> 
> 
>>The pattern solution requires all variables to have definite values.
>>
>>
>>>Pat
>>>
>>>PS. One other rewording in section 2.7:
>>>
>>>"An application or client receiving the results of a query can tell 
>>>that two solutions or two variable bindings differ in blank nodes 
>>>but this information is only scoped to the results as defined in 
>>>"SPARQL Variable Binding Results XML Format" or the CONSTRUCT 
>>>result form."
>>>//
>>>SIMPLE CASE A:
>>>"Blank nodes in the results of a query are identical to those 
>>>occurring in the dataset graphs, but this information cannot be 
>>>used by an application or client which receives these results, 
>>>since all blank nodes in subsequent queries are treated as being 
>>>local to that query. In effect, this means that information about 
>>>co-occurrences of blank nodes may be treated as scoped to the 
>>>results as defined in "SPARQL Variable Binding Results XML Format" 
>>>or the CONSTRUCT result form."
>>>
>>>OR, complex case B with persistent bnodes:
>>>
>>>//"Blank nodes in the results of a query are identical to those 
>>>occurring in the dataset graphs, and may be re-used in subsequent 
>>>queries as persistent blank nodes. Servers which accept a query 
>>>containing a persistent blank node must respect the identity of 
>>>such blank node identifiers across multiple queries and answer 
>>>bindings. In contrast, blank nodes which appear in queries as 
>>>nonpersistent blank nodes are treated as being local to that query, 
>>>and bear no relationship to occurrences of the same blank node in 
>>>other queries or their answer bindings."
>>>
>>
>>
>>I don't think it as simple as a per-service feature. A query service 
>>may accept:
>>
>>SELECT *
> 
>>FROM <foo>
> 
>>WHERE { ... }
>>
>>and
>>
>>SELECT * WHERE { ... }
>>
>>so the first will not work  for persistent blank nodes (they just 
>>will not match
> 
> 
> Why not? I don't follow this.

The second case has some fixed dataset behind it.

> 
> 
>>so there will be no results from a basic pattern involving them) and 
>>the second can.  That is, the nature of the query matters.
> 
> 
> I don't understand this, I confess. What is the difference between 
> the queries here, and why does it matter to how told bnodes work?
> 
> 
>>The model I have is that servers must respect the identity of blank 
>>node identifiers regardless - it just may not have any effect 
>>because the service relabelled all the bNodes
> 
> 
> No, my point was that the server has 2 options. It can just refuse to 
> handle told bnodes, and reject queries with them in. OR, it can 
> accept them: but then it is obliged to treat them correctly and keep 
> track of them between queries. If it is willing to accept told bnodes 
> in query patterns then its not allowed to rename bnodes between 
> queries. It can't accept them but then give wrong answers.
> 
> 
>> (where is that service description spec when you need it).
>>
>>
>>
>>In the XML result format - it scopes bNodes labels to the document. 
>>This needs to be relaxed.
> 
> 
> Yes, I missed that. Do we need to even say what the scope is in the XML?
> 
> 
>> Do we need an attribute to declare bNodes labels are graph scoped 
>>(taken from the graph - still no guarantee that they are 
>>persistent)?  Or just weaken the text
>>
>>The text is:
>>"""
>>Note: The blank node label I is scoped to the result set XML 
>>document and need not have any association to the blank node label 
>>for that RDF Term in the query graph.
>>"""
>>
>>I think this needs to be said in the XML result format, not in the 
>>SPARQL-Q document.  But something like:
>>
>>"""
>>Blank nodes in the results of a query may be identical to the those
>>occurring in the dataset graphs. Service instances which accept a query
>>containing a persistent blank node will respect the identity of such
>>blank node identifiers but do not guarantee retaining identity 
>>across multiple queries (?? some text like "unless otherwise 
>>stated"??)
>>"""
>>
>>in other words the blank nodes may be preserved - they may not.  "See your
>>service instance for details."  A SPARQL query with a told bnode is 
>>always legal as a SPARQL query - the service may generate 
>>QueryRequestRefused like any other syntactically legal query it does 
>>not want to handle.
> 
> 
> OK, but it should not be allowed to accept the request and then give 
> the wrong answer for it. If it accepts them, it must handle told 
> bnode scopes properly.
> 
> 
>>An attribute to indicate what the bNode labels mean might help: two 
>>settings "graph", meaning the bNode labels from the graph are 
>>exposed, and "results" meaning just the result set document.
>>
>><results ordered="false" distinct="false" labelscope="graph">
>>
>><results ordered="false" distinct="false" labelscope="results">
> 
> 
> Except that should be in the query. Its the query that has the 
> initiative to insist that a told bnode has the same scope as the 
> document, so is persistent through mutiple queries. The only things 
> the service can do is to refuse to play ball with such a query, or 
> handle it correctly.
> 
> Pat
> 
> 
>>	Andy
> 
> 
> 

Received on Friday, 16 December 2005 18:02:44 UTC