- From: Pat Hayes <phayes@ihmc.us>
- Date: Mon, 31 Oct 2005 19:51:16 -0600
- To: Enrico Franconi <franconi@inf.unibz.it>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Enrico, greetings.

I have just noticed an issue in the way that your definitions are phrased in http://www.inf.unibz.it/krdb/w3c/sparql/ which I think is a problem and will need to be resolved. I am afraid that I misread this on earlier passes, so I did not notice the problem until now; forgive me.

It concerns the treatment of bnodes in queries. The key point is that an answer S is defined (ignoring details) by the condition

    G entails (G merge Q[S])

rather than

    G entails (G union Q[S])

This means that any bnodes in the query are required to be standardized apart from bnodes in the graph G, making it impossible for a query to refer to bnodes in G whose identifiers have been supplied by a query service as part of an earlier answer, the condition that I (unofficially) referred to as a 'huddle' in an earlier message. However, we have had several direct requests from user community representatives to allow this kind of bnode-sensitive transaction, and there are clearly use cases for a SPARQL protocol which require it, so it is unacceptable to rule it out by definition.

This is a delicate issue, because there are also use cases for which it clearly would NOT be appropriate to allow queries to refer to, or be able to access, bnodes in G. We therefore need a way to phrase the conditions on an answer which can allow both kinds of relationship in a transaction, and this is indeed tricky.

However, your phrasing does suggest a way to proceed. (In fact I (mis)read it this way the first few times I read through it, which is why I was so happy to move forward on this basis.) Suppose we simply re-phrase the definition of an answer in the way suggested above, i.e. as

    G entails (G union Q[S])

Notice, union rather than merge. This means that the bnodes in Q[S] are in the same graph, i.e. logical scope, as those in G, so that if a bnode ID occurs in both G and in Q[S] then they refer to the same bnode.
This provides for bnode-sensitive querying while still using the language of entailment. (See PS below for a side comment here.)

Now, to allow for the other option, and produce a result equivalent (I think) to that which your definitions give, we keep this definition but add the extra condition that the bnodes in Q[S] are disjoint from those in G. This is then an option which the answering service can impose, simply by giving answers to queries which use 'new' bnodes, distinct from those in G.

A conforming answering service now has control, in effect, over the scope of the bnodes in its answers. If it supplies answer bindings to bnode IDs which identify bnodes in its graph G, then further queries which use those bnode IDs must be understood to refer to the same entities as in previous queries. For example, in the case of an example like that in my earlier email:

    ?v ex:p ex:a

against the (lean) graph

    {ex:b ex:p ex:a
     _:1 ex:p ex:a
     _:1 ex:q ex:c
     _:2 ex:q ex:d}

gives answers v/ex:b and v/_:1, say. Now the further query

    _:1 ex:q ?w

gets the single answer w/ex:c, allowing the bnode in the second query to be used to find out 'more about' the entity denoted by the bnode (i.e. which the bnode asserts to exist). Note that w/ex:d is not an answer to this query, since

    {ex:b ex:p ex:a
     _:1 ex:p ex:a
     _:1 ex:q ex:c
     _:2 ex:q ex:d}

does not entail

    {ex:b ex:p ex:a
     _:1 ex:p ex:a
     _:1 ex:q ex:c
     _:2 ex:q ex:d
     _:1 ex:q ex:d}

If however the first answer uses 'new' bnodes, distinct from those in G, then any uses of bnodes in a subsequent query are standardized apart from those in G, so that the union (G union Q[S]) is in fact a merge (G merge Q[S]), and then the subsequent queries behave just as with your definitions.
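The difference between re-using a graph bnode ID and using a 'new' one can be checked mechanically on the example graph. What follows is only a minimal Python sketch, not part of the proposed definitions. It assumes simple entailment, tested by brute-force search for a mapping of bnodes that sends every triple of the conclusion into G; terms are plain strings, and the names `simply_entails` and `answers` are made up for this illustration.

```python
def is_bnode(term):
    return term.startswith("_:")

def simply_entails(g, e):
    """True if some mapping of e's bnodes to terms of g sends every triple of e into g."""
    triples = sorted(e)

    def extend(i, mapping):
        if i == len(triples):
            return True
        for candidate in g:
            new = dict(mapping)
            ok = True
            for p, c in zip(triples[i], candidate):
                if is_bnode(p):
                    # A bnode must be mapped consistently everywhere it occurs.
                    if new.setdefault(p, c) != c:
                        ok = False
                        break
                elif p != c:
                    ok = False
                    break
            if ok and extend(i + 1, new):
                return True
        return False

    return extend(0, {})

def answers(g, pattern, var):
    """Bindings S for var such that G entails (G union Q[S])."""
    terms = {t for triple in g for t in triple}
    found = set()
    for value in terms:
        instantiated = tuple(value if t == var else t for t in pattern)
        if simply_entails(g, g | {instantiated}):
            found.add(value)
    return found

G = frozenset({
    ("ex:b", "ex:p", "ex:a"),
    ("_:1", "ex:p", "ex:a"),
    ("_:1", "ex:q", "ex:c"),
    ("_:2", "ex:q", "ex:d"),
})

# First query: ?v ex:p ex:a
print(sorted(answers(G, ("?v", "ex:p", "ex:a"), "?v")))      # ['_:1', 'ex:b']
# Follow-up re-using the graph's own bnode ID _:1
print(sorted(answers(G, ("_:1", "ex:q", "?w"), "?w")))       # ['ex:c']
# Follow-up using an ID that does not occur in G, so the union is a merge
print(sorted(answers(G, ("_:xxx", "ex:q", "?w"), "?w")))     # ['ex:c', 'ex:d']
```

The definition of an answer is the same in all three calls; only the choice of bnode ID in the query changes the result.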
In the above case, the first query's answers might then be v/ex:b and v/_:xxx, and the query

    _:xxx ex:q ?w

elicits the answers w/ex:c and w/ex:d, as one would expect when the bnode in the query is understood as an existential with that query as its scope: for

    {ex:b ex:p ex:a
     _:1 ex:p ex:a
     _:1 ex:q ex:c
     _:2 ex:q ex:d}

entails both

    {ex:b ex:p ex:a
     _:1 ex:p ex:a
     _:1 ex:q ex:c
     _:2 ex:q ex:d
     _:xxx ex:q ex:c}

and

    {ex:b ex:p ex:a
     _:1 ex:p ex:a
     _:1 ex:q ex:c
     _:2 ex:q ex:d
     _:xxx ex:q ex:d}

But there is now no way to relate these answers to the answers from the previous query: each answer 'scopes' the bnodes in it to that query-answer pair, and they are understood to be distinct from any other bnodes in other query/answer pairs and from bnodes in the graph itself.

Nevertheless, to emphasize, the definition of 'answer' is the same in both cases, the SPARQL definitions (if we choose to follow this definitional route) work identically in both cases, and both involve the same notion of entailment, and allow other entailments to be used naturally in future extensions. The difference is essentially a difference in the naming strategy adopted by the answering service in the two cases. In the first case, it exposes its actual bnodes to public view, allowing their identifiers to be used in subsequent queries; in the second, it hides them and uses different bnode IDs to report an existential result. In both cases, the answer is an assertion of an existential, and the same entailment relations are used; the difference is one of scope. And of course this contrast is orthogonal to the kind of entailment involved: the difference would arise, and the same naming strategy distinction could be used, with RDF or RDFS or OWL or any other entailment.

---

The obvious objection to this suggestion is that it provides no way for the query service to communicate to the client which bnode strategy is in use. There are several ways we might address this that occur to me.
One, the crudest, is to require services to declare their 'bnode style' as part of the service that is identified by the all-encompassing service URI.

Another is to specify a particular SPARQL bnode ID style to be used by services when they supply a bnode binding intended to be re-used, as in the first case above, e.g. say '_:resource1', '_:resource2', etc. (Although it seems that it would be very little extra work to use a URIref in the answer in this case, there are objections to creating permanent URI names for such an elusive and fleeting act of reference.)

Another would be to use the opposite convention and require services to supply a distinctive bnode style - say, a bnode ID beginning '_:XX', or maybe '_:_' - to indicate that the bnode cannot be used in subsequent queries to establish a co-reference, with the understanding that any other bnodes found in an answer binding can be so used. This has the merit of allowing transactions between closely cooperating services and clients, where the service graph is 'visible' to the client, to be performed without doing any special translation or renaming of nodes in graphs.

No doubt other alternatives are possible; and it is also possible to simply punt on this issue and allow conforming engines to be built on either basis, and adopt their own conventions.

Anyway, I will leave this as a comment to you, Enrico, and a request for guidance from the WG as to how best to proceed.

Pat

PS. As an aside, it seems to me that with your definitions as they stand, there is no need to use the rather artificial construction

    G entails (G merge Q[S])

since this will be true, I believe, just when

    G entails Q[S]

Nothing is gained by repeating the graph G in the entailment, unless the 'conclusion' and the original graph might share a bnode, which is ruled out by definition when they are combined by merging.
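The claim in this PS can be spot-checked on the earlier example graph. The Python sketch below is illustrative only: it assumes simple entailment, checked by brute-force search for a bnode mapping, and the helper names `simply_entails` and `merge` as well as the generated `_:m0`-style IDs are invented here. For a handful of candidate conclusions Q it compares G entails Q with G entails (G merge Q), and shows that the union can differ.

```python
from itertools import count

def is_bnode(term):
    return term.startswith("_:")

def simply_entails(g, e):
    """True if some mapping of e's bnodes to terms of g sends every triple of e into g."""
    triples = sorted(e)

    def extend(i, mapping):
        if i == len(triples):
            return True
        for candidate in g:
            new = dict(mapping)
            ok = True
            for p, c in zip(triples[i], candidate):
                if is_bnode(p):
                    if new.setdefault(p, c) != c:
                        ok = False
                        break
                elif p != c:
                    ok = False
                    break
            if ok and extend(i + 1, new):
                return True
        return False

    return extend(0, {})

def merge(g, q):
    """Union of g and q after renaming q's bnodes apart from those of g."""
    used = {t for triple in g for t in triple if is_bnode(t)}
    fresh = (f"_:m{i}" for i in count())
    renaming = {}

    def rename(term):
        if not is_bnode(term):
            return term
        if term not in renaming:
            renaming[term] = next(name for name in fresh if name not in used)
        return renaming[term]

    return g | {tuple(rename(t) for t in triple) for triple in q}

G = frozenset({
    ("ex:b", "ex:p", "ex:a"),
    ("_:1", "ex:p", "ex:a"),
    ("_:1", "ex:q", "ex:c"),
    ("_:2", "ex:q", "ex:d"),
})

candidates = [
    {("_:1", "ex:q", "ex:d")},
    {("_:1", "ex:q", "ex:c")},
    {("_:1", "ex:q", "ex:c"), ("_:1", "ex:q", "ex:d")},
    {("ex:b", "ex:q", "ex:c")},
]

for q in candidates:
    # The first two columns agree for every candidate; the union column may not.
    print(simply_entails(G, q), simply_entails(G, merge(G, q)), simply_entails(G, G | q))
```

Only the first candidate separates the columns: G entails it on its own and when merged, but not when it is unioned with G, because the shared ID `_:1` then has to denote the same node in both.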
So I am not sure why you used this construction (and am left with a lingering feeling that I may have missed, or misunderstood, something important in your document).

--
---------------------------------------------------------------------
IHMC                    (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola               (850)202 4440   fax
FL 32502                (850)291 0667   cell
phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
Received on Tuesday, 1 November 2005 01:51:28 UTC