- From: Pat Hayes <phayes@ihmc.us>
- Date: Mon, 31 Oct 2005 19:51:16 -0600
- To: Enrico Franconi <franconi@inf.unibz.it>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Enrico, greetings.
I have just noticed an issue in the way that your definitions are
phrased in http://www.inf.unibz.it/krdb/w3c/sparql/ which I think is
a problem, and will need to be resolved.
I am afraid that I misread this on earlier passes and so did not
notice the problem until now; forgive me.
It concerns the treatment of bnodes in queries. The key point is
that an answer S is defined (ignoring details) by the condition
G entails (G merge Q[S])
rather than
G entails (G union Q[S])
This means that any bnodes in the query are required to be
standardized apart from bnodes in the graph G, making it impossible
for a query to refer to bnodes in G whose identifiers have been
supplied by a query service as part of an earlier answer, the
condition that I (unofficially) referred to as a 'huddle' in an
earlier message. However, we have had several direct requests from
user community representatives to allow this kind of bnode-sensitive
transaction, and there are clearly use cases for a SPARQL protocol
which require it, so it is unacceptable to rule it out by definition.
This is a delicate issue, because there are also use cases for which
it clearly would NOT be appropriate to allow queries to refer to, or
be able to access, bnodes in G. We therefore need a way to phrase the
conditions on an answer which can allow both kinds of relationship in
a transaction, and this is indeed tricky. However, your phrasing does
suggest a way to proceed. (In fact I (mis)read it this way the first
few times I read through it, which is why I was so happy to move
forward on this basis.)
Suppose we simply re-phrase the definition of an answer in the way
suggested above, i.e. as
G entails (G union Q[S])
Notice, union rather than merge. This means that the bnodes in Q[S]
are in the same graph, i.e. logical scope, as those in G, so that if
a bnode ID occurs in both G and in Q[S] then they refer to the same
bnode. This provides for bnode-sensitive querying while still using
the language of entailment. (See PS below for a side comment here.)
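To make the union/merge distinction concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a graph is a set of string 3-tuples and that any term beginning '_:' is a bnode ID; the helper names bnodes, union and merge are hypothetical. Union treats an ID shared by both graphs as one bnode, while merge first renames the second graph's bnodes apart.

```python
import itertools

Triple = tuple[str, str, str]  # illustrative encoding: (subject, predicate, object)


def bnodes(graph: set[Triple]) -> set[str]:
    """All bnode IDs (terms starting with '_:') occurring in a graph."""
    return {t for triple in graph for t in triple if t.startswith("_:")}


def union(g: set[Triple], h: set[Triple]) -> set[Triple]:
    """Plain set union: an ID occurring in both graphs names one bnode."""
    return g | h


def merge(g: set[Triple], h: set[Triple]) -> set[Triple]:
    """Standardize h's bnodes apart from g's, then take the union."""
    taken = bnodes(g) | bnodes(h)
    counter = itertools.count()
    renaming = {}
    for b in sorted(bnodes(h)):
        new = f"_:m{next(counter)}"
        while new in taken:  # never reuse an ID already in either graph
            new = f"_:m{next(counter)}"
        renaming[b] = new
    renamed = {tuple(renaming.get(t, t) for t in triple) for triple in h}
    return g | renamed


G = {("_:1", "ex:q", "ex:c")}
Q = {("_:1", "ex:q", "ex:d")}

print(sorted(union(G, Q)))  # one bnode _:1 carries both ex:c and ex:d
print(sorted(merge(G, Q)))  # the query's _:1 has become a separate bnode
```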
Now, to allow for the other option, and produce a result equivalent
(I think) to that which your definitions give, we keep this
definition but add the extra condition that the bnodes in Q[S] are
disjoint from those in G. This is then an option which the answering
service can impose, simply by giving answers to queries which use
'new' bnodes, distinct from those in G. A conforming answering
service now has control, in effect, over the scope of the bnodes in
its answers. If it supplies answer bindings to bnode IDs which
identify bnodes in its graph G, then further queries which use those
bnode IDs must be understood to refer to the same entities as in
previous queries. For example, take a case like the one in my
earlier email: the query
?v ex:p ex:a
against the (lean) graph
{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d}
gives answers v/ex:b and v/_:1, say. Now the further query
_:1 ex:q ?w
gets the single answer w/ex:c, allowing the bnode in the second query
to be used to find out 'more about' the entity denoted by the bnode
(i.e. which the bnode asserts to exist). Note that w/ex:d is not an
answer to this query, since
{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d}
does not entail
{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d
_:1 ex:q ex:d }.
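For this small example the answers can be computed mechanically. The Python sketch below assumes simple entailment only, decided via the interpolation lemma of RDF Semantics (G entails H iff some mapping of H's bnodes to terms of G sends every triple of H into G); it encodes triples as string 3-tuples with bnode IDs beginning '_:' and query variables beginning '?', and it draws candidate bindings from the terms of G. The helper names entails, substitute and answers are hypothetical. Because the answer condition is G entails (G union Q[S]), a bnode ID shared by G and the instantiated query denotes one node.

```python
Triple = tuple[str, str, str]


def is_bnode(t: str) -> bool:
    return t.startswith("_:")


def entails(g: set[Triple], h: set[Triple]) -> bool:
    """Simple entailment via the interpolation lemma: search for a mapping
    of h's bnodes to terms of g under which every triple of h is in g."""
    triples = sorted(h)

    def search(i: int, sigma: dict[str, str]) -> bool:
        if i == len(triples):
            return True
        for target in g:
            extended = dict(sigma)
            ok = True
            for term, value in zip(triples[i], target):
                if is_bnode(term):
                    # bind the bnode, or check it agrees with an earlier binding
                    if extended.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok and search(i + 1, extended):
                return True
        return False

    return search(0, {})


def substitute(pattern: set[Triple], binding: dict[str, str]) -> set[Triple]:
    """Instantiate query variables (terms starting '?') from a binding."""
    return {tuple(binding.get(t, t) for t in triple) for triple in pattern}


def answers(g: set[Triple], pattern: set[Triple], var: str) -> list[str]:
    """Candidate terms S of g for var such that g entails (g union Q[S])."""
    candidates = sorted({t for triple in g for t in triple})
    return [c for c in candidates
            if entails(g, g | substitute(pattern, {var: c}))]


G = {
    ("ex:b", "ex:p", "ex:a"),
    ("_:1", "ex:p", "ex:a"),
    ("_:1", "ex:q", "ex:c"),
    ("_:2", "ex:q", "ex:d"),
}

print(answers(G, {("?v", "ex:p", "ex:a")}, "?v"))  # ['_:1', 'ex:b']
print(answers(G, {("_:1", "ex:q", "?w")}, "?w"))   # ['ex:c']
```

Only ex:c survives the second query because the query's _:1 is G's own _:1, and no node in G has ex:p ex:a, ex:q ex:c and ex:q ex:d together.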
If however the first answer uses 'new' bnodes, distinct from those in
G, then any uses of bnodes in a subsequent query are standardized
apart from those in G, so that the union (G union Q[S]) is in fact a
merge (G merge Q[S]), and then the subsequent queries behave just as
with your definitions. So in the above case the first query
answers might be v/ex:b and v/_:xxx, and then the query
_:xxx ex:q ?w
elicits the answers w/ex:c and w/ex:d, as one would expect when the
bnode in the query is understood as an existential with that query as
its scope: for
{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d}
entails both
{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d
_:xxx ex:q ex:c}
and
{ex:b ex:p ex:a
_:1 ex:p ex:a
_:1 ex:q ex:c
_:2 ex:q ex:d
_:xxx ex:q ex:d}
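The contrast can be reproduced by running one answer definition, G entails (G union Q[S]), over two queries that differ only in the bnode ID they use. This is a minimal Python sketch, assuming simple entailment decided via the interpolation lemma (G entails H iff some mapping of H's bnodes to terms of G sends every triple of H into G), triples encoded as string 3-tuples with bnode IDs beginning '_:' and variables beginning '?', and candidate bindings drawn from the terms of G; the helper names are hypothetical.

```python
Triple = tuple[str, str, str]


def is_bnode(t: str) -> bool:
    return t.startswith("_:")


def entails(g: set[Triple], h: set[Triple]) -> bool:
    """Search for a bnode mapping sending every triple of h into g."""
    triples = sorted(h)

    def search(i: int, sigma: dict[str, str]) -> bool:
        if i == len(triples):
            return True
        for target in g:
            extended = dict(sigma)
            ok = True
            for term, value in zip(triples[i], target):
                if is_bnode(term):
                    if extended.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok and search(i + 1, extended):
                return True
        return False

    return search(0, {})


def answers(g: set[Triple], pattern: set[Triple], var: str) -> list[str]:
    """Candidate terms S of g for var such that g entails (g union Q[S])."""
    candidates = sorted({t for triple in g for t in triple})
    result = []
    for c in candidates:
        instance = {tuple(c if t == var else t for t in tr) for tr in pattern}
        if entails(g, g | instance):
            result.append(c)
    return result


G = {
    ("ex:b", "ex:p", "ex:a"),
    ("_:1", "ex:p", "ex:a"),
    ("_:1", "ex:q", "ex:c"),
    ("_:2", "ex:q", "ex:d"),
}

# Fresh ID: disjoint from G's bnodes, so the union behaves as a merge.
print(answers(G, {("_:xxx", "ex:q", "?w")}, "?w"))  # ['ex:c', 'ex:d']
# Exposed ID: the query's _:1 is G's _:1.
print(answers(G, {("_:1", "ex:q", "?w")}, "?w"))    # ['ex:c']
```

Only the bnode ID differs between the two calls; the answer condition is the same function in both.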
But there is now no way to relate these answers to the answers from
the previous query: each answer 'scopes' the bnodes in it to that
query-answer pair, and they are understood to be distinct from any
other bnodes in other query/answer pairs and from bnodes in the graph
itself.
Nevertheless, to emphasize, the definition of 'answer' is the same in
both cases, the SPARQL definitions (if we choose to follow this
definitional route) work identically in both cases, and both involve
the same notion of entailment, and allow other entailments to be used
naturally in future extensions. The difference is essentially a
difference in the naming strategy adopted by the answering service in
the two cases. In the first case, it exposes its actual bnodes to
public view, allowing their identifiers to be used in subsequent
queries: in the second, it hides them and uses different bnodeIDs to
report an existential result. In both cases, the answer is an
assertion of an existential, and the same entailment relations are
used; the difference is one of scope. And of course this contrast is
orthogonal to the kind of entailment involved: the difference would
arise, and the same naming strategy distinction could be used, with
RDF or RDFS or OWL or any other entailment.
---
The obvious objection to this suggestion is that it provides no way
for the query service to communicate to the client which bnode
strategy is in use. Several ways of addressing this occur to me.
One, the crudest, is to require services to declare
their 'bnode style' as part of the service that is identified by the
all-encompassing service URI. Another is to specify a particular
SPARQL bnodeID style to be used by services when they supply a bnode
binding intended to be re-used, as in the first case above, e.g. say
'_:resource1', '_:resource2', etc. (Although it seems that it would
be very little extra work to use a URIref in the answer in this case,
there are objections to creating permanent URI names for such an
elusive and fleeting act of reference.) Another would be to use the
opposite convention and require services to supply a distinctive
bnode style - say, a bnode ID beginning '_:XX', or maybe '_:_' - to
indicate that the bnode cannot be used in subsequent queries to
establish a co-reference, with the understanding that any other
bnodes found in an answer binding can be so used. This has the merit
of allowing transactions between closely cooperating services and
clients, where the service graph is 'visible' to the client, to be
performed without doing any special translation or renaming of nodes
in graphs. No doubt other alternatives are possible; and it is also
possible to simply punt on this issue and allow conforming engines to
be built on either basis, and adopt their own conventions.
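The third convention in this paragraph (marking non-reusable bnodes with a distinctive prefix) can be sketched on the service and client sides. This is a minimal Python sketch under loudly stated assumptions: the '_:XX' prefix is only the example marker mentioned here, not an agreed convention; a service is assumed never to use that prefix for bnodes it intends to expose; and answer rows are plain dicts from variable names to terms. The names report and reusable are hypothetical.

```python
import itertools

HIDDEN_PREFIX = "_:XX"  # hypothetical marker: "do not use for co-reference"


def report(bindings: list[dict[str, str]], expose: bool) -> list[dict[str, str]]:
    """Return answer rows as a service might send them.

    expose=True: the service's own bnode IDs pass through unchanged, so a
    client may reuse them in later queries to refer to the same nodes.
    expose=False: each graph bnode is replaced by a fresh marked ID, scoping
    it to this one query/answer pair (consistently within the response)."""
    if expose:
        return [dict(row) for row in bindings]
    counter = itertools.count(1)
    renaming: dict[str, str] = {}
    out = []
    for row in bindings:
        new_row = {}
        for var, term in row.items():
            if term.startswith("_:"):
                if term not in renaming:
                    renaming[term] = f"{HIDDEN_PREFIX}{next(counter)}"
                term = renaming[term]
            new_row[var] = term
        out.append(new_row)
    return out


def reusable(term: str) -> bool:
    """Client side: may this bnode ID be used to co-refer in a later query?"""
    return term.startswith("_:") and not term.startswith(HIDDEN_PREFIX)


rows = [{"v": "ex:b"}, {"v": "_:1"}]
print(report(rows, expose=True))   # [{'v': 'ex:b'}, {'v': '_:1'}]
print(report(rows, expose=False))  # [{'v': 'ex:b'}, {'v': '_:XX1'}]
```

The hidden IDs are numbered afresh for each response; since each one is scoped to its own query/answer pair, a later, unrelated response reusing '_:XX1' causes no confusion under this convention.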
Anyway, I will leave this as a comment to you, Enrico, and a request
for guidance from the WG as to how best to proceed.
Pat
PS. As an aside, it seems to me that with your definitions as they
stand, there is no need to use the rather artificial construction
G entails (G merge Q[S])
since this will be true, I believe, just when
G entails Q[S]
Nothing is gained by repeating the graph G in the entailment, unless
the 'conclusion' and the original graph might share a bnode, which is
ruled out by definition when they are combined by merging. So I am
not sure why you used this construction (and am left with a lingering
feeling that I may have missed, or misunderstood, something important
in your document.)
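The claim in this PS can at least be spot-checked on the example graph. This is a minimal Python sketch, assuming simple entailment decided via the interpolation lemma, triples encoded as string 3-tuples and bnode IDs beginning '_:'; the helper names are hypothetical, and checking a handful of cases is not a proof.

```python
import itertools

Triple = tuple[str, str, str]


def is_bnode(t: str) -> bool:
    return t.startswith("_:")


def bnodes(graph: set[Triple]) -> set[str]:
    return {t for triple in graph for t in triple if is_bnode(t)}


def entails(g: set[Triple], h: set[Triple]) -> bool:
    """Search for a bnode mapping sending every triple of h into g."""
    triples = sorted(h)

    def search(i: int, sigma: dict[str, str]) -> bool:
        if i == len(triples):
            return True
        for target in g:
            extended = dict(sigma)
            ok = True
            for term, value in zip(triples[i], target):
                if is_bnode(term):
                    if extended.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok and search(i + 1, extended):
                return True
        return False

    return search(0, {})


def merge(g: set[Triple], h: set[Triple]) -> set[Triple]:
    """Rename h's bnodes apart from g's, then take the union."""
    taken = bnodes(g) | bnodes(h)
    counter = itertools.count()
    renaming = {}
    for b in sorted(bnodes(h)):
        new = f"_:m{next(counter)}"
        while new in taken:
            new = f"_:m{next(counter)}"
        renaming[b] = new
    return g | {tuple(renaming.get(t, t) for t in tr) for tr in h}


G = {
    ("ex:b", "ex:p", "ex:a"),
    ("_:1", "ex:p", "ex:a"),
    ("_:1", "ex:q", "ex:c"),
    ("_:2", "ex:q", "ex:d"),
}
QUERIES = [
    {("_:1", "ex:q", "ex:c")},
    {("_:1", "ex:q", "ex:d")},
    {("_:1", "ex:p", "ex:a"), ("_:1", "ex:q", "ex:d")},
    {("ex:b", "ex:q", "ex:c")},
]

for q in QUERIES:
    # After merging, the extra copy of G adds no constraint on q's bnodes.
    assert entails(G, merge(G, q)) == entails(G, q)

# With union, a shared ID does make a difference:
print(entails(G, {("_:1", "ex:q", "ex:d")}))      # True
print(entails(G, G | {("_:1", "ex:q", "ex:d")}))  # False
```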
--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 cell
phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Tuesday, 1 November 2005 01:51:28 UTC