Re: putting entailment into SPARQL from Pat Hayes on 2005-12-16 (public-rdf-dawg@w3.org from October to December 2005)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 16 Dec 2005 15:22:24 -0600
To: andy.seaborne@hp.com
Cc: "Eric Prud'hommeaux" <eric@w3.org>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p0623090ebfc8c956fc1d@[10.100.0.23]>
>Pat Hayes wrote:
>>Andy, I think we are working on different designs for 
>>told/persistent bnodes. I'm not sure that the definitions given 
>>will work for your design, and to be honest I don't understand your 
>>design well enough yet to offer revisions. See in-line comments 
>>below.
>
>Pat,
>
>So let's make sure we are clear - I managed to integrate the 
>non-told bNodes version with (what I considered) just editorial 
>changes to get it to fit.

Agreed, that case is straightforward.

>The issue for told bNodes is around persistence.  I assuming that 
>even if we have told bNodes, then a conformant SPARQL processor is 
>not going to be required to implement this feature.  Your design of 
>generating an error.  That works.  A corner case is where some 
>queries might work and other might not at the same query service.

Yah, my design makes it hard to support that mixed case, though a 
server could offer it as a graph-URI-specific service, I guess.

>
>The term "persistence" comes with a lot of associations and my 
>immediate reaction was "for how long"?  Across server restart?

I agree, this is tricky. This is the best argument against allowing 
told bnodes at all, seems to me: if they are supposed to persist for 
ever, they ought to be IRIs, obviously. I would like us to have a 
notion of a 'session' or immediate local sequence of queries, but I 
don't really see how to scope it.

>  I'm always trying to avoid making the server track client or retain 
>state because it can lead to restrictions on scalility as well as 
>require integration with the protocol, greatly expanding the 
>protocol.

Agreed...

>
>By 'not discuss "persistence"' I meant getting way from that 
>language to avoid needing to deal with the associations, not the 
>concept as applied to bNodes.

....but I don't really see how to avoid the issue, unless we allow 
servers to advertise their willingness to treat told-bnodes correctly 
but then to in fact treat them incorrectly, which seems like a worse 
solution for a standard.

>
>I did knowingly change the defintion chnages you suggested but I did 
>start on defining the rename function.  I'm not sure where this 
>comes from - the client or the server.

The server is allowed to rename bnodes in queries unless they are 
told bnodes. So, in effect, the server is permitted to treat bnodes 
in queries as being scoped locally to the query, even if they happen 
to use bnodeIDs in its own graph, unless they are told bnodes. Those, 
it has to treat the bnodeIDs as real. So if they happen to be bnodes 
in its own graph, it has to respect that bnode identity.

>Is the rename derived from the query syntax by having different 
>syntax forms for  told and non-told bNodes?

Seems to me we must have some way to distinguish them in a query, 
yes. I'm not sure how. One way (?) might be to say that ALL bnodes in 
a query are told bnodes, and so to get the effect of a regular bnode 
you use a query variable instead. Then we wouldn't need to make the 
explicit distinction, and wouldn't need to use that M mapping in the 
definition.

Pat


>
>	Andy
>
>>
>>Pat
>>
>>
>>>Pat, Bijan, Enrico,
>>>
>>>For #owlDisjunction, we need to handle distinguished and 
>>>nondistinguished variables differently.  My understanding of teh 
>>>design below is that this happens when nondistinguished variables 
>>>are bNodes - and, for SPARQL, any named variable is distinguished.
>>>
>>>v1.581 (and later) for changes (ongoing)
>>>
>>>
>>>Pat Hayes wrote:
>>>
>>>>Reading through http://www.w3.org/2001/sw/DataAccess/rq23/
>>>>
>>>
>>><editorial comments done>
>>>
>>>>2.4 Definition: Pattern Solution
>>>>There is a real problem in defining all this for triples first 
>>>>and then for graphs, as the triple-only definitions don't 
>>>>generalize properly. For example, if S1 is a solution for {T1} 
>>>>and S2 for {T2}, they might both be wrong for {T1, T2} if the 
>>>>triples share a bnode. Its not enough for each triple to match, 
>>>>contra to what it says in section 2.5.
>>>>
>>>>I don't know how to state the conditions for the single-triple 
>>>>case in terms of entailment.
>>>>
>>>>I suggest we first define simple graph pattern and then go into 
>>>>matching, so that all the matching definitions apply to at least 
>>>>simple graphs, with a triple considered to be a singleton graph 
>>>>pattern.
>>>
>>>Agreed - we have to now define matching for basic patterns and not 
>>>for triple patterns.
>>>
>>>>If we did this, the definitions would be like this:
>>>>
>>>>Basic Graph Pattern.
>>>>... is a set of triple patterns.
>>>>
>>>>Pattern Solution.
>>>>A variable substitution is a substitution function on a subset of 
>>>>V to RDF-T. A pattern solution on the pattern V to the dataset G 
>>>>is a variable substitution whose domain includes all the 
>>>>variables in V, whose range is a subset of the set of RDF terms 
>>>>occurring in G, and which matches the dataset DS.
>>>
>>>?? "on a pattern P"
>>
>>
>>Yes, P not V. Sorry.
>>
>>>
>>>>(Note, we don't now count bnodes in the range of a substitution, 
>>>>since that aspect of 'matching' will be handled by the simple 
>>>>entailment. Also, notice that the restriction on the range of a 
>>>>solution is essential to prevent bnode-leakage in answers.)
>>>>
>>>>(Doesn't the notion of matching need to refer to SM and R in some 
>>>>cases, as well as the dataset??)
>>>
>>>Just the dataset - this is pattern matching, and the solution 
>>>modifers are applied afterwards.
>>
>>
>>Yes, but this definition refers ahead to other kinds of matching, 
>>was my point.
>
>Noted - there is a forward reference.  When we have stability, I 
>plan do a whole-document scan.  Maybe the RDF dataset does need to 
>be defined as an initial definition.
>
>>
>>
>>>  Result forms are applied after solution modifiers.  Need to add 
>>>something about which graph in the dataset is being matched (that 
>>>was already on the ToDo list).
>>>
>>>>2.5 [simple version A, assuming no told bnodes]
>>>
>>>I'm trying for a "told bNodes" solution and keeping this 
>>>non-told-bnode text in reserve.
>>
>>
>>OK... but...
>
>Both versions are in rq23 now.
>
>>
>>
>>>>Basic Graph Matching.
>>>>
>>>>A basic graph pattern P matches on graph G with solution S when S 
>>>>is a pattern solution on P to G such that G simply entails S(P).
>>>>
>>>>(Optional Lemma. S is a pattern solution on b.g.p. P to G just 
>>>>when S can be extended to a mapping S' from (V union {bnodes in 
>>>>P}) to RDF-T's  s.t. S'(P) is a subgraph of G. [Proof, see 
>>>>interpolation lemma. QED.].
>>>>This connects it with the current defs and gives a more directly 
>>>>implementable account of matching.)
>>>>
>>>>(Optional lemma 2. There is a pattern solution on b.g.p. P to G 
>>>>just when G simply entails P', where P' is the RDF graph obtained
>>>
>>>>from P by replacing each variable by a new blank node not in P or
>>>
>>>>G. [Proof, obvious consequence of first lemma. QED.]
>>>>Which shows that variables are just bnodes with a special kind of 
>>>>label, in a sense, and that in this case bnodes in pattern and 
>>>>bnodes in graph might as well be in different graphs.)
>>>>
>>>>"For a basic graph pattern to match some dataset, there must be a 
>>>>solution where each of the triple patterns matches the dataset 
>>>>with that solution."
>>>
>>>Removed
>>>
>>>>Suggest delete the above sentence, it is potentially misleading. 
>>>>Or else add "; but the reverse is not always true."
>>>>
>>>>"This query contains a basic graph pattern of two triple 
>>>>patterns, each of which must match for the graph pattern to 
>>>>match."
>>>>//
>>>>"This query contains a basic graph pattern of two triple 
>>>>patterns, each of which must match, with the same substitution, 
>>>>for the graph pattern to match."
>>>
>>>Done.
>>>
>>>>OR:
>>>>2.5 [complex version B, allowing for persistent bnodes between 
>>>>queries. Let me assume that the notion of "persistent bnodes in a 
>>>>graph pattern" makes sense somehow, exactly how will require some 
>>>>further work.]
>>>
>>>Suggestion: we do not discuss "persistence" of bnodes and allow 
>>>the query to contain told bnodes with no assumptions or guarantees 
>>>across queries.
>>
>>
>>... ?? Wha?? Then there is no point in allowing told bnodes, seems 
>>to me. The only use case involves re-using a bnode supplied as an 
>>answer to one query in a later query. Seems to me this option gets 
>>us the worst of both worlds.
>
>My concern is that "persistent" places a specific responsibility on 
>all implementations to expose stable bNode labels.
>
>>
>>
>>>SELECT *
>>>FROM <foo>
>>>WHERE { ... }
>>>
>>>is not going to be guaranteed to work for bNodes across queries. 
>>>Reading <foo> each time is a reasonable implementation and all the 
>>>bNodes may get relabelled.  Or reading <foo> just uses the 
>>>document-scoped labels in the file and a sequence of queries will 
>>>work.  Or it may be already read and cached.
>>
>>
>>I don't follow this, sorry. Im focussing on the definitions, not 
>>the machinery. But how can a sequence of queries work on a single 
>>document-scoped bnode if the bnodes aren't persistent across 
>>queries?
>
>Document-scoped is a minimum - it can be achived by global scoping 
>everything or from a system that reads things once and remembers 
>them read.
>
>It's the converse I'm looking at - what are the requirments on any 
>SPARQL implementation.
>
>One way round this is to have a service either offer persistent 
>bNodes or not and not cover a service that does somethimes.
>
>>
>>
>>>>A basic graph pattern P matches on graph G with solution S when S 
>>>>is a pattern solution on P to G such that G simply entails (G 
>>>>union S(M(P))), where M is a 1:1 mapping on blank nodes which 
>>>>replaces each non-persistent blank node in P with a new blank 
>>>>node which does not occur in P or G.
>>>>
>>>>(It might be a good idea to define that notion of the M mapping 
>>>>ahead of time in an auxiliary definition of its own. We could 
>>>>call it a persistent merge or a partial merge, or some such name.)
>>>
>>>Didn't understand the use of the term "merge" for M - I called it 
>>>a renaming, mapping sets of triples to sets of triple patterns.
>>
>>
>>OK, but its a very special kind of re-naming, might be best to use 
>>a modifier, eg partial renaming.
>
>Will do that.
>
>>
>>
>>>v1.581
>>>
>>>>
>>>>2.6 last para mentions "simple, conjunctive" but conjunctive 
>>>>hasn't been defined yet.
>>>
>>>Changed to
>>>"""This is a basic graph pattern match, ..."
>>>
>>>>2.7 needs to be rewritten if we go with the persistent-bnode 
>>>>option. Even if we don't, I think we should remove "A blank node 
>>>>may match any RDF term." since that isn't officially called 
>>>>'matching' now we are talking about entailment.
>>>
>>>We need to fix on the terminolgy.
>>
>>
>>Agreed. I modified 'told' to 'persistent' as being more 
>>self-explanatory, but it was only a suggestion.
>>
>>>Is "told bnode" universal, is "persistent bnode", or "literal" or 
>>>"constant" Bnode (as in a prgamming language literal or constant.
>>>
>>>I've picked "told bNode" as a placeholder (not consistently yet) - 
>>>suggestions very welcome - need a pair for told/not-told & 
>>>persistentent/non-persistent.
>>>
>>>>For complex version B, it might read something like this:
>>>>
>>>>2.7.1
>>>>A blank node can occur in a query pattern in two ways. A 
>>>>persistent blank node acts like a 'temporary IRI': it is assumed 
>>>>to share the same scope as the dataset graph and all previous 
>>>>queries. Persistent blank nodes are intended to be used to allow 
>>>>a query to re-use a blank node provided as an answer binding from 
>>>>an earlier query, to obtain more information about the entity 
>>>>referred to by the blank node. A similar functionality may be 
>>>>obtained by the SPARQL server supplying an IRI in the answer 
>>>>binding for the query in such a case, even though the dataset 
>>>>contains only a blank node; this corresponds to 'skolemizing' the 
>>>>existential assertion in the dataset. All other blank nodes are 
>>>>referred to as non-persistent blank nodes. Non-persistent blank 
>>>>nodes in a query are scoped to the query graph pattern, and 
>>>>cannot be identified with bnodes in the dataset or in previous 
>>>>queries.
>>>>
>>>>SPARQL servers may refuse to accept queries containing persistent 
>>>>blank nodes, and may treat them as ill-formed queries. Servers 
>>>>which do accept persistent blank nodes must recognize the 
>>>>co-occurrence of persistent blank node identifiers in a query 
>>>>with the same identifiers provided as answer bindings in previous 
>>>>queries, and with the corresponding bnodes in the dataset. 
>>>>Servers which do not accept queries containing persistent blank 
>>>>nodes may substitute new blank node IDs for bnodeIDs in answer 
>>>>bindings, so that blank nodes in answer bindings in such cases 
>>>>should be treated as scoped locally to the query result.
>>>>
>>>
>>>text included - will align later
>>>
>>>>2.8.3  Will need to be extended and slightly rewritten to handle 
>>>>persistent bnodes: what it currently says is all about 
>>>>non-persistent bnodes. If we don't go that way, it seems OK as it 
>>>>is.
>>>>
>>>
>>>Added 2.8.4 for told bnodes.
>>>
>>>>-------
>>>>
>>>>OK, I think that is enough to incorporate entailment into the 
>>>>current design. With or without persistent bnodes, the 
>>>>definitions work smoothly with other entailments used instead of 
>>>>simple entailment, though of course the lemmas don't apply in 
>>>>those cases.
>>>
>>>Does this include #owlDisjunction?
>>
>>
>>Well, if you say OWL entailment then it does, yes. But my point was 
>>that the told-bnode trickery works for all the different 
>>entailments: it doesn't break when you use a stronger entailment.
>>
>>>The pattern solution requires all variables to have definite values.
>>>
>>>>Pat
>>>>
>>>>PS. One other rewording in section 2.7:
>>>>
>>>>"An application or client receiving the results of a query can 
>>>>tell that two solutions or two variable bindings differ in blank 
>>>>nodes but this information is only scoped to the results as 
>>>>defined in "SPARQL Variable Binding Results XML Format" or the 
>>>>CONSTRUCT result form."
>>>>//
>>>>SIMPLE CASE A:
>>>>"Blank nodes in the results of a query are identical to those 
>>>>occurring in the dataset graphs, but this information cannot be 
>>>>used by an application or client which receives these results, 
>>>>since all blank nodes in subsequent queries are treated as being 
>>>>local to that query. In effect, this means that information about 
>>>>co-occurrences of blank nodes may be treated as scoped to the 
>>>>results as defined in "SPARQL Variable Binding Results XML 
>>>>Format" or the CONSTRUCT result form."
>>>>
>>>>OR, complex case B with persistent bnodes:
>>>>
>>>>//"Blank nodes in the results of a query are identical to those 
>>>>occurring in the dataset graphs, and may be re-used in subsequent 
>>>>queries as persistent blank nodes. Servers which accept a query 
>>>>containing a persistent blank node must respect the identity of 
>>>>such blank node identifiers across multiple queries and answer 
>>>>bindings. In contrast, blank nodes which appear in queries as 
>>>>nonpersistent blank nodes are treated as being local to that 
>>>>query, and bear no relationship to occurrences of the same blank 
>>>>node in other queries or their answer bindings."
>>>>
>>>
>>>
>>>I don't think it as simple as a per-service feature. A query 
>>>service may accept:
>>>
>>>SELECT *
>>
>>>FROM <foo>
>>
>>>WHERE { ... }
>>>
>>>and
>>>
>>>SELECT * WHERE { ... }
>>>
>>>so the first will not work  for persistent blank nodes (they just 
>>>will not match
>>
>>
>>Why not? I don't follow this.
>
>The second case has some fixed dataset behind it.
>
>>
>>
>>>so there will be no results from a basic pattern involving them) 
>>>and the second can.  That is, the nature of the query matters.
>>
>>
>>I don't understand this, I confess. What is the difference between 
>>the queries here, and why does it matter to how told bnodes work?
>>
>>>The model I have is that servers must respect the identity of 
>>>blank node identifiers regardless - it just may not have any 
>>>effect because the service relabelled all the bNodes
>>
>>
>>No, my point was that the server has 2 options. It can just refuse 
>>to handle told bnodes, and reject queries with them in. OR, it can 
>>accept them: but then it is obliged to treat them correctly and 
>>keep track of them between queries. If it is willing to accept told 
>>bnodes in query patterns then its not allowed to rename bnodes 
>>between queries. It can't accept them but then give wrong answers.
>>
>>>(where is that service description spec when you need it).
>>>
>>>
>>>
>>>In the XML result format - it scopes bNodes labels to the 
>>>document. This needs to be relaxed.
>>
>>
>>Yes, I missed that. Do we need to even say what the scope is in the XML?
>>
>>>Do we need an attribute to declare bNodes labels are graph scoped 
>>>(taken from the graph - still no guarantee that they are 
>>>persistent)?  Or just weaken the text
>>>
>>>The text is:
>>>"""
>>>Note: The blank node label I is scoped to the result set XML 
>>>document and need not have any association to the blank node label 
>>>for that RDF Term in the query graph.
>>>"""
>>>
>>>I think this needs to be said in the XML result format, not in the 
>>>SPARQL-Q document.  But something like:
>>>
>>>"""
>>>Blank nodes in the results of a query may be identical to the those
>>>occurring in the dataset graphs. Service instances which accept a query
>>>containing a persistent blank node will respect the identity of such
>>>blank node identifiers but do not guarantee retaining identity 
>>>across multiple queries (?? some text like "unless otherwise 
>>>stated"??)
>>>"""
>>>
>>>in other words the blank nodes may be preserved - they may not.  "See your
>>>service instance for details."  A SPARQL query with a told bnode 
>>>is always legal as a SPARQL query - the service may generate 
>>>QueryRequestRefused like any other syntactically legal query it 
>>>does not want to handle.
>>
>>
>>OK, but it should not be allowed to accept the request and then 
>>give the wrong answer for it. If it accepts them, it must handle 
>>told bnode scopes properly.
>>
>>>An attribute to indicate what the bNode labels mean might help: 
>>>two settings "graph", meaning the bNode labels from the graph are 
>>>exposed, and "results" meaning just the result set document.
>>>
>>><results ordered="false" distinct="false" labelscope="graph">
>>>
>>><results ordered="false" distinct="false" labelscope="results">
>>
>>
>>Except that should be in the query. Its the query that has the 
>>initiative to insist that a told bnode has the same scope as the 
>>document, so is persistent through mutiple queries. The only things 
>>the service can do is to refuse to play ball with such a query, or 
>>handle it correctly.
>>
>>Pat
>>
>>>	Andy


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 16 December 2005 21:22:39 UTC