Re: SPARQL Semantics document from Pat Hayes on 2005-11-01 (public-rdf-dawg@w3.org from October to December 2005)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 31 Oct 2005 19:51:16 -0600
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230912bf8c4d62f33c@[10.100.0.16]>
Enrico, greetings.

I have just noticed an issue in the way that your definitions are 
phrased in http://www.inf.unibz.it/krdb/w3c/sparql/  which I think is 
a problem, and will need to be resolved.

I am afraid that I misread this on earlier passes, and so did not 
notice the problem until now; forgive me.

It concerns the treatment of bnodes in queries.  The key point is 
that an answer S is defined (ignoring details) by the condition

G entails (G merge Q[S])

rather than

G entails (G union Q[S])

This means that any bnodes in the query are required to be 
standardized apart from bnodes in the graph G, making it impossible 
for a query to refer to bnodes in G whose identifiers have been 
supplied by a query service as part of an earlier answer, the 
condition that I (unofficially) referred to as a 'huddle' in an 
earlier message. However, we have had several direct requests from 
user community representatives to allow this kind of bnode-sensitive 
transaction, and there are clearly use cases for a SPARQL protocol 
which require it, so it is unacceptable to rule it out by definition.

This is a delicate issue, because there are also use cases for which 
it clearly would NOT be appropriate to allow queries to refer to, or 
be able to access, bnodes in G. We therefore need a way to phrase the 
conditions on an answer which can allow both kinds of relationship in 
a transaction, and this is indeed tricky. However, your phrasing does 
suggest a way to proceed. (In fact I (mis)read it this way the first 
few times I read through it, which is why I was so happy to move 
forward on this basis.)

Suppose we simply re-phrase the definition of an answer in the way 
suggested above, i.e. as

G entails (G union Q[S])

Notice, union rather than merge. This means that the bnodes in Q[S] 
are in the same graph, i.e. logical scope, as those in G, so that if 
a bnode ID occurs in both G and in Q[S] then they refer to the same 
bnode. This provides for bnode-sensitive querying while still using 
the language of entailment. (See PS below for a side comment here.)

Now, to allow for the other option, and produce a result equivalent 
(I think) to that which your definitions give, we keep this 
definition but add the extra condition that the bnodes in Q[S] are 
disjoint from those in G. This is then an option which the answering 
service can impose, simply by giving answers to queries which use 
'new' bnodes, distinct from those in G. A conforming answering 
service now has control, in effect, over the scope of the bnodes in 
its answers. If it supplies answer bindings to bnode IDs which 
identify bnodes in its graph G, then further queries which use those 
bnode IDs must be understood to refer to the same entities as they 
did in the earlier answers. For example, as in my earlier email, the 
query

?v ex:p ex:a

against the (lean) graph

{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d}

gives answers v/ex:b and v/_:1, say. Now the further query

_:1 ex:q ?w

gets the single answer w/ex:c, allowing the bnode in the second query 
to be used to find out 'more about' the entity denoted by the bnode 
(i.e. which the bnode asserts to exist). Note that w/ex:d is not an 
answer to this query, since

{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d}

does not entail

{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d
_:1 ex:q ex:d }.
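
The check above can be reproduced mechanically. What follows is only a minimal sketch, under several assumptions that are not part of the proposal: graphs are modelled as sets of (subject, predicate, object) string tuples, a term is a bnode exactly when it starts with '_:', entailment is simple entailment decided by brute force (try every mapping of bnodes onto terms of G), and candidate bindings are drawn only from terms that occur in G. The names is_bnode, terms, simply_entails and answers are illustrative.

```python
from itertools import product

def is_bnode(term):
    return term.startswith("_:")

def terms(graph):
    return {t for triple in graph for t in triple}

def simply_entails(g, h):
    # Brute force: try every mapping of h's bnodes onto terms of g and
    # accept if the mapped h is a subgraph of g.
    bnodes = sorted(t for t in terms(h) if is_bnode(t))
    for image in product(sorted(terms(g)), repeat=len(bnodes)):
        m = dict(zip(bnodes, image))
        if all(tuple(m.get(t, t) for t in triple) in g for triple in h):
            return True
    return False

def answers(g, pattern, var):
    # Bindings S (drawn from the terms of g only, an assumption of this
    # sketch) such that g entails (g union Q[S]).
    found = set()
    for binding in terms(g):
        q_s = tuple(binding if t == var else t for t in pattern)
        if simply_entails(g, g | {q_s}):
            found.add(binding)
    return found

G = frozenset({
    ("ex:b", "ex:p", "ex:a"),
    ("_:1", "ex:p", "ex:a"),
    ("_:1", "ex:q", "ex:c"),
    ("_:2", "ex:q", "ex:d"),
})

first = answers(G, ("?v", "ex:p", "ex:a"), "?v")
second = answers(G, ("_:1", "ex:q", "?w"), "?w")
print(sorted(first))
print(sorted(second))
```

Because the union keeps '_:1' in the same scope as the graph, the second call finds only ex:c; ex:d is rejected for the reason given above.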

If however the first answer uses 'new' bnodes, distinct from those in 
G, then any uses of bnodes in a subsequent query are standardized 
apart from those in G, so that the union (G union Q[S]) is in fact a 
merge (G merge Q[S]), and then the subsequent queries behave just as 
with your definitions. So that in the above case the first query 
answers might be v/ex:b and v/_:xxx, and then the query

_:xxx ex:q ?w

elicits the answers w/ex:c and w/ex:d, as one would expect when the 
bnode in the query is understood as an existential with that query as 
its scope: for

{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d}

entails both

{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d
_:xxx ex:q ex:c}

and

{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d
_:xxx ex:q ex:d}

But there is now no way to relate these answers to the answers from 
the previous query: each answer 'scopes' the bnodes in it to that 
query-answer pair, and they are understood to be distinct from any 
other bnodes in other query/answer pairs and from bnodes in the graph 
itself.
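
The relationship between union and merge here can be sketched in code. This is an illustration only, under assumptions of my choosing: graphs are sets of string triples, bnodes are the terms starting with '_:', entailment is simple entailment checked by brute force, and merge standardizes apart by appending a prime mark to any clashing bnode ID. The helper names (is_bnode, terms, simply_entails, merge) are hypothetical.

```python
from itertools import product

def is_bnode(term):
    return term.startswith("_:")

def terms(graph):
    return {t for triple in graph for t in triple}

def simply_entails(g, h):
    # True if some mapping of h's bnodes onto terms of g makes h a subgraph of g.
    bnodes = sorted(t for t in terms(h) if is_bnode(t))
    for image in product(sorted(terms(g)), repeat=len(bnodes)):
        m = dict(zip(bnodes, image))
        if all(tuple(m.get(t, t) for t in triple) in g for triple in h):
            return True
    return False

def merge(g, h):
    # Standardize apart: give every bnode of h a name unused in g, then union.
    taken = terms(g)
    fresh = {}
    for b in sorted(t for t in terms(h) if is_bnode(t)):
        name = b
        while name in taken:
            name += "'"
        fresh[b] = name
        taken = taken | {name}
    return g | {tuple(fresh.get(t, t) for t in triple) for triple in h}

G = frozenset({
    ("ex:b", "ex:p", "ex:a"),
    ("_:1", "ex:p", "ex:a"),
    ("_:1", "ex:q", "ex:c"),
    ("_:2", "ex:q", "ex:d"),
})

shared = {("_:1", "ex:q", "ex:d")}    # reuses a bnode ID that occurs in G
hidden = {("_:xxx", "ex:q", "ex:d")}  # uses a 'new' bnode ID

union_shared = simply_entails(G, G | shared)       # same scope as G
merge_shared = simply_entails(G, merge(G, shared)) # bnode renamed first
union_hidden = simply_entails(G, G | hidden)
same_graph = (G | hidden) == merge(G, hidden)
print(union_shared, merge_shared, union_hidden, same_graph)
```

With the shared ID the union is not entailed but the merge is; with the new ID the union and the merge are the same graph, so the two definitions agree.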

Nevertheless, to emphasize, the definition of 'answer' is the same in 
both cases, the SPARQL definitions (if we choose to follow this 
definitional route) work identically in both cases, and both involve 
the same notion of entailment, and allow other entailments to be used 
naturally in future extensions. The difference is essentially a 
difference in the naming strategy adopted by the answering service in 
the two cases. In the first case, it exposes its actual bnodes to 
public view, allowing their identifiers to be used in subsequent 
queries: in the second, it hides them and uses different bnodeIDs to 
report an existential result. In both cases, the answer is an 
assertion of an existential, and the same entailment relations are 
used; the difference is one of scope. And of course this contrast is 
orthogonal to the kind of entailment involved: the difference would 
arise, and the same naming strategy distinction could be used, with 
RDF or RDFS or OWL or any other entailment.

---

The obvious objection to this suggestion is that it provides no way 
for the query service to communicate to the client which bnode 
strategy is in use. There are several ways we might address this that 
occur to me. One, the crudest, is to require services to declare 
their 'bnode style' as part of the service that is identified by the 
all-encompassing service URI. Another is to specify a particular 
SPARQL bnodeID style to be used by services when they supply a bnode 
binding intended to be re-used, as in the first case above, e.g. say
'_:resource1', '_:resource2', etc. (Although it seems that it would 
be very little extra work to use a URIref in the answer in this case, 
there are objections to creating permanent URI names for such an 
elusive and fleeting act of reference.) Another would be to use the 
opposite convention and require services to supply a distinctive 
bnode style (say, a bnode ID beginning '_:XX', or maybe '_:_') to 
indicate that the bnode cannot be used in subsequent queries to 
establish a co-reference, with the understanding that any other 
bnodes found in an answer binding can be so used. This has the merit 
of allowing transactions between closely cooperating services and 
clients, where the service graph is 'visible' to the client, to be 
performed without doing any special translation or renaming of nodes 
in graphs. No doubt other alternatives are possible; and it is also 
possible to simply punt on this issue and allow conforming engines to 
be built on either basis, and adopt their own conventions.

Anyway, I will leave this as a comment to you, Enrico, and a request 
for guidance from the WG as to how best to proceed.

Pat

PS. As an aside, it seems to me that with your definitions as they 
stand, there is no need to use the rather artificial construction

G entails (G merge Q[S])

since this will be true, I believe, just when

G entails Q[S]

Nothing is gained by repeating the graph G in the entailment, unless 
the 'conclusion' and the original graph might share a bnode; which is 
ruled out by definition when they are combined by merging. So I am 
not sure why you used this construction (and am left with a lingering 
feeling that I may have missed, or misunderstood, something important 
in your document.)
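
The claimed equivalence can at least be spot-checked on the example graph used earlier. The sketch below is not a proof. It assumes simple entailment decided by brute force, graphs as sets of string triples, a merge that renames clashing bnode IDs by appending a prime mark, and candidate bindings drawn only from the terms of G. All helper names are illustrative.

```python
from itertools import product

def is_bnode(term):
    return term.startswith("_:")

def terms(graph):
    return {t for triple in graph for t in triple}

def simply_entails(g, h):
    # True if some mapping of h's bnodes onto terms of g makes h a subgraph of g.
    bnodes = sorted(t for t in terms(h) if is_bnode(t))
    for image in product(sorted(terms(g)), repeat=len(bnodes)):
        m = dict(zip(bnodes, image))
        if all(tuple(m.get(t, t) for t in triple) in g for triple in h):
            return True
    return False

def merge(g, h):
    # Standardize apart: give every bnode of h a name unused in g, then union.
    taken = terms(g)
    fresh = {}
    for b in sorted(t for t in terms(h) if is_bnode(t)):
        name = b
        while name in taken:
            name += "'"
        fresh[b] = name
        taken = taken | {name}
    return g | {tuple(fresh.get(t, t) for t in triple) for triple in h}

G = frozenset({
    ("ex:b", "ex:p", "ex:a"),
    ("_:1", "ex:p", "ex:a"),
    ("_:1", "ex:q", "ex:c"),
    ("_:2", "ex:q", "ex:d"),
})

queries = [(("?v", "ex:p", "ex:a"), "?v"), (("_:xxx", "ex:q", "?w"), "?w")]
checked = 0
mismatches = []
for pattern, var in queries:
    for binding in sorted(terms(G)):
        q_s = {tuple(binding if t == var else t for t in pattern)}
        with_g = simply_entails(G, merge(G, q_s))  # G entails (G merge Q[S])
        without_g = simply_entails(G, q_s)         # G entails Q[S]
        checked += 1
        if with_g != without_g:
            mismatches.append((pattern, binding))
print(checked, mismatches)
```

On this graph the two conditions agree for every candidate binding tried, which is consistent with the remark above but says nothing about entailment regimes other than simple entailment.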

-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Tuesday, 1 November 2005 01:51:28 UTC