Re: bnodeification (was: Re: SPARQL semantics: open issues for basic query patterns) from Pat Hayes on 2006-01-10 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 10 Jan 2006 12:33:49 -0600
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: Souripriya Das <souripriya.das@oracle.com>, RDF Data Access Working Group <public-rdf-dawg@w3.org>, bparsia@isr.umd.edu
Message-Id: <p06230903bfe99a796436@[10.100.0.5]>
>>Quoting from Pat Hayes's e-mail 
>>http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0051.html 
>>:
>>
>>"Another way to define S(Q) is to imagine that every bnode in Q is 
>>replaced by a distinct query variable (which cannot be SELECTed, 
>>however) and then use this definition. There are cases in which 
>>this definition is slightly tighter than the previous one, 
>>eliminating some redundancy. Also this doesn't need the 
>>ordered-merge idea."
>>
>>This seems to be a simpler way. Has this been discussed already?
>
>As I (dimly) recall, we did at one point consider not allowing 
>bnodes in queries but this idea was rejected as too restrictive for 
>the query writers (imagine taking some RDF and plopping variables 
>where you want answers, but leaving the other bnodes alone, why 
>not?) However, I think if we had realized the tar-pit we were 
>getting into by allowing bnodes scoped to queries, we might have 
>considered it more actively. It sure does make all the definitions 
>easier (and I think conceptually clearer) if we assume that bnodes 
>in queries are treated as simply a species of variable, as this 
>neatly solves all the awkward 'scoping' problems that arise.

Further to that, here's a modified wording to get the same results as 
Enrico's most recent draft, using this idea. The key is in the 
modified definition of binding. Enrico, would you be happy with this 
as an editorial tweak to your text? It gets exactly the same results, 
so the question is orthogonal to dealing with the rest of the algebra.

-----------

A binding on a pattern Q is a map from the variables and bnodes in Q 
to terms. The result B(Q) of applying the binding B to Q is the RDF 
graph obtained by replacing each variable or bnode in Q by its value 
under B.

An answer binding with respect to a scoping graph G' is a binding B such that:
(1)  any bnodeIDs introduced by B occur in G'
(2)  G entails (G' union B(Q)).

The purpose of the scoping graph is to ensure that bnodeIDs in the 
answer set are used in ways that conform to their usage in G' and in 
other answers, i.e. it defines the scope of answer bnodeIDs.

An answer set for G is the set of all answer bindings w.r.t. a 
scoping graph G' which is graph-equivalent to G.

The use of G' rather than G itself is to allow answers to use 
bnodeIDs which are distinct from those in the dataset.

-----------

This keeps all the test cases the same, and defines the same set of 
answers as your text. (We should probably say explicitly that SELECT 
* does not get you the bnode bindings.) And BTW, should we ever want 
to re-open the told-bnodes issue, the difference between a server 
that handles bnodes as told-bnodes and one that doesn't is that the 
first uses a single scoping graph for multiple queries, while the 
second uses a new scoping graph each time. So, our current decision 
regarding told bnodes amounts to saying that servers can use a 
different scoping graph for each answer document, which is really 
just a fancy way of saying that answer bnodes are scoped to the 
answer document.

That really is union in clause 2, not merge, because we want all the 
bnodes to be in a single scope. Note that this now does not require 
the use of a special ordered merge. Im sure you and Sergio can see 
this immediately, but to see why this works, consider a bnode in Q 
which previously was therefore also the same bnode in any 
binding-instance B(Q). Previously we had to standardize this apart 
from the bnodes in G', [unless it was a told bnode, but ignore that] 
and in other instances of Q, but without standardizing apart any 
bnodes that got introduced by B or other bindings, since those we 
want to share a scope with G': hence the rather peculiar details of 
the merging/substitution operation. Now, a standardizing-apart 
mapping, used in a merge, maps bnodes to new bnodes, but since we are 
going to later require that the result is (at least simply) entailed 
by G' and uses vocabulary from G', this invokes another mapping of 
that new bnode to something in G'. So we were going from a bnode in 
one copy of Q to a new bnode not in G' or any other copy of Q (to 
avoid accidental coincidences) to something in G, via entailment. OK, 
so now B can go directly from the bnode in this particular copy of Q 
to that thing in G. And it works in reverse, obviously, because 
whatever B maps this bnode to, we could have got there by a detour 
through a 'new' bnode in the old scheme. For other answers, we have 
other answer bindings which now take another instance of that bnode 
to something else: we don't need to keep them as bnodes and work hard 
to keep all the versions of them standardized apart from one another 
and from G'.

This in effect treats a bnode in query patterns as being a 'blank 
variable', ie a variable which has no variable name and so cannot be 
SELECTed, which automagically keeps the scope of the bnode to be one 
particular instance of the query pattern. It also IMO is intuitively 
clearer to think of a query bnode as being exactly like a variable. 
Note, they are still bnodes in the syntax, so nothing has actually 
changed. It also makes vividly clear that the choice between a bnode 
and a variable that isn't selected is arbitrary when writing queries.)

Lemmas (all kind of trivial).

1. All answer sets are equivalent, i.e. identical up to bnode 
renaming. (So we can reasonably refer to 'the' answer set, using the 
RDF convention regarding graph equivalence)

2.  If 'entails' is simple entailment, then B is an answer binding 
just when B(Q) is a subgraph of G. (So it 'parameterizes' properly in 
Enrico's sense)

3. If 'entails' is simple entailment, then G simply entails B(Q) if 
and only if there is an answer binding B' with B(Q) an instance of 
B'(Q) (ie, if B' subsumes B).

Pat

>Pat
>
>>
>>Thanks,
>>- Souri.
>>
>>
>>Attachment converted: betelguese2:souripriya.das.vcf (TEXT/Hdra) (00226A23)
>
>
>--
>---------------------------------------------------------------------
>IHMC		(850)434 8903 or (650)494 3973   home
>40 South Alcaniz St.	(850)202 4416   office
>Pensacola			(850)202 4440   fax
>FL 32502			(850)291 0667    cell
>phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Tuesday, 10 January 2006 18:34:09 UTC