Re: On told-bnodes in queries from Pat Hayes on 2005-11-04 (public-rdf-dawg@w3.org from October to December 2005)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 4 Nov 2005 14:40:37 -0600
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230904bf916e68e4fc@[10.100.0.9]>

>On 3 Nov 2005, at 04:45, Pat Hayes wrote:
>>>On the other hand, I don't believe that your alternative semantics 
>>>based on "union" works.
>>>Here is the counterexample - the same as before :-)
>>>
>>>GRAPH: :s :p _:b .
>>>
>>>query:  { ?x :p _:a }
>>>
>>>where we mean _:a in the query to be used as a told bnode.
>>>Well, I expect to get the empty set as the answer, but with your 
>>>union semantics I get [?x/:s].
>>
>>But this case is impossible. The only way such a bnode can occur in 
>>a query is if it were provided as a binding to a variable in a 
>>previous query on the same graph. So the bnode must occur in the 
>>graph somewhere. Then creating the union will cause the bnode in 
>>the (subsequent) query to be identified with its previous 
>>occurrence in the graph.
>
>Aha! But how do you know?

How does who know what?

I know that if we use the 'union' definition, any re-use of the same 
bnodeID will identify the same bnode, simply from the basic RDF 
definitions of graph and bnode. But I don't think that is what you 
meant.

The server, by hypothesis, knows the bnodeIDs for the bnodes in G. 
The client, initially, does not, and has no way to access them. They 
are private to G. Call this the normal state of affairs. There is no 
action which will allow a client to change this normal state of 
affairs unless the server permits it. Most servers will not. They 
will maintain this normal state of affairs by treating bnodes in any 
query as locally scoped to that query, in effect keeping the bnodes 
in G and any bnodes in a query standardized apart. If all queries 
were like this, our task here would be trivial. Under these 
circumstances, and assuming we are using simple entailment, a bnode 
in a query acts just like a query variable, but does not request an 
answer binding: it can be viewed as a throw-away query variable. This 
can be phrased in several ways, all equivalent:

G simply entails Q[S]
G simply entails (G union Q[S])
G simply entails (G merge Q[S])
G simply entails (G merge Q)[S]
an instance of Q[S] is a subgraph of G
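
As a rough illustration of the normal case, here is a minimal Python sketch 
in which a bnode in a query matches like a variable but is dropped from the 
answer. The triple encoding (tuples of strings, "?" for variables, "_:" for 
bnodes) and the names is_open and answers are assumptions made for this 
example only; it handles a single triple pattern, not a full query language.

```python
def is_open(term):
    # In the normal case, variables ("?x") and query bnodes ("_:a")
    # both match any term in the graph.
    return term.startswith("?") or term.startswith("_:")

def answers(graph, pattern):
    """Match one triple pattern; report bindings for ?variables only."""
    found = []
    for triple in sorted(graph):
        env = {}
        if all(env.setdefault(p, g) == g if is_open(p) else p == g
               for p, g in zip(pattern, triple)):
            # Bnode bindings are thrown away: they are not part of the answer.
            binding = {k: v for k, v in env.items() if k.startswith("?")}
            if binding not in found:
                found.append(binding)
    return found

graph = {(":s", ":p", "_:b")}
print(answers(graph, ("?x", ":p", "_:a")))  # [{'?x': ':s'}]
```

Here _:a matches _:b even though the names differ, because the query bnode 
is scoped to the query alone.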

However, we also want to allow the possibility of a transaction in 
which the state of affairs is not normal. This can happen only if the 
server permits it and cooperates with the client in a particular way. 
This situation can become non-normal only under one very special 
circumstance: the client poses a query, the server provides one of 
its bnodeIDs as an answer binding, and the client then uses that 
bnodeID in a subsequent query, and the server continues to recognize 
it in this query as the bnodeID of the original bnode in the first 
query. But note that all the decisions here, which make this an 
abnormal state of affairs, are taken by the server, not by the 
client. All the client can do is to pass a query using a bnodeID that 
has been previously supplied to it. If it uses any other bnodeID, 
then the server should interpret that as it would in a normal state 
of affairs, i.e. as locally scoped to the query. (The 'list' examples 
I gave you earlier illustrate why we would want this to be the case.)
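
The division of labour described above might be sketched as follows. This 
is only an illustration under stated assumptions: the Session class, its 
allow_told flag and the solve helper are hypothetical names, and triples 
are again tuples of strings. The server records the bnodeIDs it has handed 
out as answer bindings, and treats a query bnode as fixed only if it is one 
of those; any other bnodeID is scoped to the query.

```python
def solve(graph, patterns, told):
    """Answer bindings for all patterns; bnodes outside `told` act as variables."""
    envs = [{}]
    for pattern in patterns:
        next_envs = []
        for env in envs:
            for triple in sorted(graph):
                new = dict(env)
                ok = True
                for p, g in zip(pattern, triple):
                    is_open = p.startswith("?") or (
                        p.startswith("_:") and p not in told)
                    ok = (new.setdefault(p, g) == g) if is_open else (p == g)
                    if not ok:
                        break
                if ok:
                    next_envs.append(new)
        envs = next_envs
    out = []
    for env in envs:
        ans = {k: v for k, v in env.items() if k.startswith("?")}
        if ans not in out:
            out.append(ans)
    return out

class Session:
    """Hypothetical server-side session; the server decides on told bnodes."""
    def __init__(self, graph, allow_told=False):
        self.graph = set(graph)
        self.allow_told = allow_told
        self.issued = set()  # bnodeIDs already returned as answer bindings

    def query(self, patterns):
        told = frozenset(self.issued) if self.allow_told else frozenset()
        result = solve(self.graph, patterns, told)
        for ans in result:
            self.issued.update(v for v in ans.values() if v.startswith("_:"))
        return result

g = {(":s", ":p", "_:b"), (":t", ":p", "_:c")}
s = Session(g, allow_told=True)
first = s.query([("?x", ":p", "_:b")])   # _:b not yet issued: scoped to the query
given = s.query([(":s", ":p", "?y")])    # server hands out _:b
later = s.query([("?x", ":p", "_:b")])   # _:b now recognized by the server
print(first, given, later)
```

With allow_told left at False the third query returns the same two answers 
as the first, which is the normal state of affairs.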

Of the above alternatives, the only one which allows this as a 
generalization is

G simply entails (G union Q[S])

which becomes equivalent to the others when Q[S] is 'standardized 
apart' from G, i.e. when they share no bnodes; which can therefore be 
considered to be the extra condition which keeps things 'normal' but 
which some servers may choose to relax. Again, note that the decision, 
whether or not to standardize Q[S] apart from G, can only rationally 
be taken by the server, since the client has access only to Q, not to 
G or to S.
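
A small brute-force check may make the difference between the 'union' and 
'merge' formulations concrete. This is a sketch only: simply_entails tests 
whether some mapping of the bnodes of H to terms of G makes H a subgraph of 
G, and merge renames the query's bnodes with a "_q" suffix, which is an 
assumed (and naive) way of standardizing apart. The example graph is 
invented for illustration.

```python
from itertools import product

def is_bnode(term):
    return term.startswith("_:")

def simply_entails(g, h):
    """True if some mapping of h's bnodes to terms of g makes h a subgraph of g."""
    bnodes = sorted({t for triple in h for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    for image in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, image))
        if all(tuple(m.get(t, t) for t in triple) in g for triple in h):
            return True
    return False

def merge(g, q):
    """Union after renaming q's bnodes apart from g's (naive '_q' suffix)."""
    renamed = {tuple(t + "_q" if is_bnode(t) else t for t in triple)
               for triple in q}
    return g | renamed

g = {(":s", ":p", "_:b"), (":t", ":p", "_:c")}
qs = {(":t", ":p", "_:b")}   # Q[S] for Q = { ?x :p _:b } and S = [?x/:t]

print(simply_entails(g, g | qs))       # union: _:b is shared with G -> False
print(simply_entails(g, merge(g, qs))) # merge: _:b is renamed apart -> True
```

Under the union, [?x/:t] is rejected because _:b in the query is identified 
with the _:b already in G; under the merge it is accepted because the query 
bnode is just an existential.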

>How do you bootstrap the process to understand which are the bnodes 
>occurring in a graph? There is no way, because the only way to 
>access the information from a graph is through querying anyway.

Right, exactly. That is why your proposed counterexample is not 
appropriate, since for the client to be able to 'declare' a bnode in 
a query as intended to be in the scope of the graph only makes sense 
if the client has somehow been given access to the names used inside 
that scope: and the only way that can have occurred is by the server 
providing some such names as answer bindings. So for a client to 
declare a bnode as rigid in its first query in a new query session 
would be foolish; it cannot possibly be based on any rational 
expectation, and it would be like shooting at random into the dark. To 
permit this would be a bad design (and nobody has ever requested or 
even suggested this, AFAIK).

>The only solution would be to systematically add as a first step of 
>any chain of queries a query of the type { ?x ?y ?z } to fetch all 
>the bnodes names in the graph, and then you can proceed from that 
>list. This is very artificial and really bad.

Agreed, that is hopeless. But it is not the only alternative; see above.

>Moreover, nobody guarantees that in the case of multiple graphs in a 
>dataset the process of graph merging to build the unified dataset 
>will return the same fresh bnode names twice.

Again, that is up to the server. Indeed, a server whose bnodeIDs are 
liable to change would likely not offer the possibility of 
re-recognizing a bnode from an earlier query, and would only process 
queries in a 'normal state of affairs'. 
As I say, this is likely to be the normal case. Nevertheless, there 
are use cases, typically where the server and client are much 
'closer' than usual, perhaps running on the same platform and with 
very high mutual bandwidth, where the 'abnormal' kind of 
server/client communication is quite feasible and extremely useful: 
so if we can find a way to permit that also, then we will have done a 
better job.

>On the other hand, our approach says:
>
>GRAPH: :s :p _:b .
>
>query 1:  { ?x :p ?{_:a} }
>
>to get the empty answer, as opposed to
>
>query 2:  { ?x :p _:a }
>
>to get the answer [?x/:s].

I know it says that, but I am suggesting that this is not, in fact, 
the behavior we want. It should be impossible for the client to 
compose a query which fixes the reference of a bnode that has not 
been supplied to it by the server in a previous query. Such a query 
makes no practical sense.

Pat


>cheers
>--e.


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 4 November 2005 20:40:47 UTC