Re: agenda: RDF Data Access 06 Dec from Pat Hayes on 2005-12-06 (public-rdf-dawg@w3.org from October to December 2005)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 5 Dec 2005 21:03:54 -0600
To: Eric Prud'hommeaux <eric@w3.org>
Message-Id: <p06230901bfba9a62d5da@[192.168.2.2]>
>1. Convene, take roll, review records and agenda

Regrets for todays telecon: we have government visitors who will be 
taking my full attention all of today.

>
>2. Query Semantics
>
>ACTION: PatH to sketch tests that characterize impact of semantics work
>
>progress?
>

OK, heres my current understanding of the situation.

Definition.

B is an answer binding for a simple query Q on the graph G iff:

(1) every identifier (URIref, literal or bnode) bound to a variable 
by B actually occurs in G, and

(2) G simply entails (G union B(M(Q)) )

where we assume that bnodes in Q are partitioned into disjoint 
categories called told bnodes and plain bnodes, and M is a 1:1 
mapping on plain bnodes  which replaces each plain bnode by a new 
bnode which is not in G. That is, M does a merging kind of bnodeID 
separation on plain bnodes in the query, but it leaves other bnodes - 
told bnodes in the query, and bnodes supplied by the answer binding 
itself - alone.

There are several other ways to define the same effect, eg in terms 
of skolemizing then de-skolemizing, etc., but this is the simplest 
and most direct I can think of.

It is important to say B(M(Q)) not M(B(Q)) as the latter allows M to 
change bnodes introduced by the answer binding itself, which might 
screw things up: observation made by Enrico :-). An alternative way 
of saying it, suggested by Enrico, is to extend the idea of graph 
merging to query patterns as well as graphs, and say that
G simply entails B(G merge Q),
since this amounts to the same as
G simply entails B(G union M(Q))
for a suitable M applied to the bnodes in the query pattern, ie
G simply entails (G union B(M(Q)))
since there are no variables in G for B to make any changes to.
Note, if all bnodes are plain, then we might as well not bother to 
keep the bnodes in the answer binding the same as those in the graph, 
so we could just allow M to change bnodes in bindings as well, and 
then we could just write
G simply entails (G union M(B(Q)))
ie
G simply entails (G merge B(Q))
ie
G simply entails B(Q) )

Intuitively, a told bnode is one which is understood as common to the 
graph and the query. Using a told bnode allows a query to refer to 
bnodes supplied as answer bindings in earlier queries. Plain bnodes 
are standardized apart from bnodes in the graph, so are treated as 
being in a distinct scope even if they happen to use the same bnodeID 
in the query as a bnodeID in the graph.

Note, this told-bnode trick works ONLY if we allow the bnodes in the 
answer bindings to actually be the bnodes in the graph. If there are 
no told bnodes in subsequent queries, the fact that the 'same' bnode 
is used in a later query is harmless and has no traction, since it is 
standardized apart from the graph (by M) when that query is 
considered. But (contrary to the text from section 2.7 that Peter 
objected to:"An application or client receiving the
	results of a query can tell that two solutions or two variable bindings
	differ in blank nodes but this information is only scoped to the
	results ..")
what we have to say is that the unscoping is done at query input 
time, not at answer delivery time.

If all bnodes are plain, then as noted (2) is equivalent to:

(2')  G simply entails B(Q)

We could adopt this option and simply abandon the idea of told bnodes 
in queries, giving a simpler definition. I would prefer to allow the 
told-bnode trick, myself, as it seems clearly useful and has little 
cost; and I am confident that the above definitions are robust, in 
spite of their awkwardness, and will work OK with other kinds of 
entailment. I think some such trick will be essential in practice for 
any query language that has to deal with bnode-like constructions, 
and that if we put it into SPARQL, we will be cited, long after 
SPARQL itself has fallen into disuse, as the pioneers of the 
re-useable local identifier method in query language design.

-------
Referring to the objection and example in 
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Sep/0036

ISSUE 1.
The text cited (From RDF Concepts) defines a notion of 'graph 
equivalence' which amounts to re-naming the bnodeIDs in a graph. 
Graphs which are equivalent in this sense are isomorphic: this is a 
stronger (more restrictive) sense of equivalence than logical 
equivalence (i.e. entails and entailed by). For example, a non-lean 
graph and its lean reduction are logically equivalent but not graph 
equivalent. The discussion seems to confuse these two senses of 
'equivalence', and it is therefore not entirely clear what point is 
being made.

The above definitions will yield 'equivalent' answers from 
'equivalent' graphs in both sense of 'equivalence'. If the graphs are 
graph equivalent then the answers will be identical up to a 1:1 
mapping on the bnodeIDs in the answer bindings. If the graphs are 
logically equivalent then the sets of answers (strictly, the graphs 
formed by taking the unions of all instances of the query under the 
answer bindings) will be logically equivalent; any redundancy in a 
non-lean graph which is relevant to the query may produce a 
corresponding redundancy in the answer bindings, as in the example 
given:

Q: ?x ex:b ex:c .

G1 : ex:a ex:b ex:c .
_:a ex:b ex:c .

answers: ?x/ex:a  ; ?x/_:a

G2: ex:a ex:b ex:c .

answer: ?x/ex:a

The answer sets are not identical because the graphs are not graph 
equivalent, in the sense cited. The graphs obtained by applying the 
answer bindings to the query are (in this case) identical in each 
case to the original graphs, which are logically equivalent, so the 
answers are logically equivalent. All this seems to me to be exactly 
as it should be, and fully in accord with the RDF semantics. Perhaps 
Peter could clarify in what way he feels this is inappropriate.

----

As for the rest of his comments, I largely agree :-) His issue 4 is 
what the told/plain distinction is trying to resolve.

Pat




-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Tuesday, 13 December 2005 14:42:17 UTC