Re: [TF-ENT] redundant blank nodes from Birte Glimm on 2009-10-09 (public-rdf-dawg@w3.org from October to December 2009)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Fri, 9 Oct 2009 19:14:01 +0100
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <492f2b0b0910091114j2df67cf8o8ed30d4dd61cfdde@mail.gmail.com>
2009/10/9 Seaborne, Andy <andy.seaborne@hp.com>:
> Birte,
>
> Could you explain further?  This could be a way forward but I can't see how the existing text works for this.
>
> The only entailment relationship in the matching step is between G and P(BGP) and does not consider SG.
>
> SG is related to G but after that it is just a graph. The way it is related isn't mentioned and does not appear at the point of entailment between G and P(BGP). In the matching step, only the vocabulary of SG is considered, not the way it is e-equivalence to G.  Maybe we can add that in some way.
>
>        Andy


Ok, here is my take on that:

It could be that I misinterpreted whether the pattern instance mapping
maps to the scoping graph or to the active graph. But I think even if
I got that wrong, we can make the definition work.

I believed that entailment is decided w.r.t. scoping graph and in my
mind I imagined that you could, for example, take the active graph,
rename the blank nodes to get SG, then built a partial RDFS closure of
SG with an algorithm like what ter Horst proposed and find solutions
with simple entailment (systems could use different ways of course). I
came to this because of condition 3 of extensuions for BGP matching:
For any scoping graph SG and answer set {P1 ... Pn} for a basic graph
pattern BGP, and where {BGP1 .... BGPn} is a set of basic graph
patterns all equivalent to BGP, none of which share any blank nodes
with any other or with SG
    SG E-entails (SG union P1(BGP1) union ... union Pn(BGPn))

but since the only difference between the active and the scoping graph
are blank node names and they are disjoint it does not really matter
whether I decide entailment w.r.t. G or SG.
Also, in the description of data sets it says:
If DS is the dataset of a query, pattern solutions are therefore
understood to be not from the active graph of DS itself, but from an
RDF graph, called the scoping graph, which is graph-equivalent to the
active graph of DS but shares no blank nodes with DS or with BGP.

"underderstood" is, however, a bit vague.

All in all, this scoping graph business is a bit confusing to me
since, on the other hand, the scoping graph is said to be a purely
theoretical construct and you could also map the variables to the
active graph and use the scoping graph only to rename blank nodes in
the matches that you got from the active graph. Is that how I have to
read it?

I'll try my example again this time assuming I decide entailment
w.r.t. G and show how C1 and C2 need to be rephrased then.
We assumed the active graph G is:
_:b1 :p :z.
_:b2 :q :y.
We assumed the scoping graph SG is:
_:sg1 :p :z.
_:sg2 :q :y.
and the BGP of the query is
?x :p _:n . (I named [] _:n just to have some ID for it. )
If I do not derive the solutions from SG, but from G it would work as follows:
G entails
_:b1_1 :p :z.       for _:b1_1 allocated to _:b1
_:b1_2 :p :z.       for _:b1_2 allocated to _:b1_1
...
_:b1 :p _:z_1.     for _:z_1 allocated to :z
_:b1_1 :p _:z_1.
...
_:b2_1 :q :y.       for _:b2_1 allocated to _:b2
...
same as when using SG but starting with blank nodes _:b1 and _b2
instead of _:sg1 and _:sg2. We get the following pattern instance
mappings for the BGP:
P0: mu: x->_:b1,  sigma: _:n->:z
P1: mu: x->_:b1_1,  sigma: _:n->:z
P2: mu: x->_:b1_2,  sigma: _:n->:z
...
Pn: mu: x->_:b1,  sigma: _:n->_:z_1
Pn+1: mu: x->_:b1_1,  sigma: _:n->_:z_1
...
but not
mu: x->_:b2,  sigma: _:n->:y because the predicate is :q
mu: x->_:b2_1,  sigma: _:n->:y etc

Since blank nodes in SG are different from the ones in G, we have as
required that
SG RDFS-entailes Sg union P0(BGP0) union P1(BGP1) union ...
although strictly speaking the renaming of blank nodes from BGP to
BGP0, BGP1, ... makes the pattern instance mappings possibly invalid,
but we don't really need sigma any more anyway.
We have the possible solutions mu0:x->_:b1, mu1:x->_:b1_1,
mu2:x->_:b1_2, ... for ?x.
Now what we can say is that since G and SG are graph equivalent (only
differing in their blank nodes), there is a bijection M (one-to-one
mapping) from G to SG witnessing the equivalence with M(_:b1)=:_sg1
and M(_:b2)=:_sg2 and we can rephrase C1 and C2 as follows
A possible solution μ is a solution for BGP from G under RDF
entailment if (C1) for each subject s of a triple ( s, p, o ) in
P(BGP), M(s) is defined and M(s) occurs in the scoping graph and (C2)
if μ(v) is a blank node, then M(μ(v)) is defined and M(μ(v)) occurs in
the scoping graph SG.

Now we would get only mu0:x->_:b1 as an answer since M(_:b1)=_:sg1
occurs in the scoping graph, but mu1:x->_:b1_1 is not an answer since
mu1(x) is a blank node, but M is not defined for mu1(x)=_:b1_1as
required by C2. Similarly for all other solution mappings mu2, ...

Now, however, the blank nodes in the answers are not really from SG
and that is why I thought we do the whole mapping on SG (at least
theoretically), but is that how it is intended? I can of course rename
them with M(blank node), but I cannot read that from the spec.

Birte




>
>> -----Original Message-----
>> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
>> On Behalf Of Birte Glimm
>> Sent: 08 October 2009 12:22
>> To: SPARQL Working Group
>> Subject: Re: [TF-ENT] redundant blank nodes
>>
>> Actually, I don't think any more we have a problem here because (x,
>> _:sg2) is not an entailed triple. I should have seen that earlier and
>> apologise for the confusion. I explain below:
>>
>> We assumed the scoping graph SG is:
>> _:sg1 :p :z.
>> _:sg2 :q :y.
>> and the BGP of the query is
>> ?x :p [] .
>> We assumed that (unfortunately) (x, _:sg2) would be an answer, but it
>> is not. Clearly _:sg2 :p [] is a well-formed RDF triple, but it is NOT
>> RDF(S) entailed by SG or G because you cannot just exchange blank
>> nodes. You can introduce fresh blank nodes, but you have to keep track
>> of which of your fresh blank nodes is allocated to which existing
>> node. You can validly derive:
>> _:sg1_1 :p :z.       for _:sg1_1 allocated to _:sg1
>> _:sg1_2 :p :z.       for _:sg1_2 allocated to _:sg1_1
>> _:sg1 :p _:z_1.     for _:z_1 allocated to :z
>> _:sg1_1 :p _:z_1.
>> etc
>> There is no legal way to get
>> _:sg2 :p :z.
>> This would say that :z has both a :p and a :q predecessor and that
>> there exists a node (_:sg2) that has both a :p and a :p sucessor. This
>> does not follow from the given graph. The given graph just says there
>> exists a node that has :z has :p successor (_:sg1) and there exists a
>> node that as :y as :q successor (_:sg2). They could be the same, but
>> they do not have to be the same and entailment considers only what is
>> true in all models.
>> Thus, you cannot arbitrarily use blank nodes from the scoping graph.
>> You can only use them in consequences if they stand for the same node.
>> You can introduce (arbitrarily many) fresh blank nodes, but you still
>> have to keep track of which fresh node represents which original node
>> (called allocated to in the RDF spec). We will not return this fresh
>> blank nodes in the query answers (could be infinitely many), but they
>> can of course be used in the derivation process.
>>
>> Birte
>>
>>
>> 2009/10/7 Birte Glimm <birte.glimm@comlab.ox.ac.uk>:
>> > Hi all,
>> > I am working my way through the open questions/issues, so the next on
>> > my list is something that Andy mentioned. At the moment, we have the 2
>> > conditions C1 and C2 that restrict the set of possible answers to a
>> > finite set of answers for RDF(S) entailment regimes. What these
>> > conditions don't cover is redundant answers that use different blank
>> > nodes. E.g.,
>> > suppose G is
>> > _:b1 :p :z.
>> > _:b2 :q :y.
>> > SG is:
>> > _:sg1 :p :z.
>> > _:sg2 :q :y.
>> > and the BGP of the query is
>> > ?x :p [] .
>> > We would get the two solutions
>> > (x, _:sg1)
>> > (x, _:sg2)
>> > because both _:sg1 :p [] and _:sg2 :p [] are well-formed RDF triples
>> > that are RDF(S) entailed by G (C1) each subject is in the set of terms
>> > used by the scoping graph and (C2) μ(?x) is a blank node occurring in
>> > SG. This always results in a finite answer sequence, but the more
>> > blank nodes we have origianlly, the more redundant answers we get.
>> >
>> > Now what I would rather have only (x, _:sg1) as an answer. This could
>> > be defined by a notion of derivability I think. E.g.,
>> > Let R be a set of entailment rules for the entailment regime E, then,
>> > for each triple (s, p, o) in P(BGP), there must be a derivation of (s,
>> > p, o) from SG by means of R.
>> > Now I could use, for example, the RDFS entailment rules as suggested
>> > by ter Horst, and I get what I want because _:sg2 :p [] has no
>> > derivation.
>> >
>> > What I am quite unhappy about is the use of "a set of entailment
>> > rules" because different systems might want to use different ways of
>> > deriving consequences and this might be too specific. Any opinions on
>> > that? Any better suggestion?
>> >
>> > Birte
>> >
>> >
>> >
>> >
>> > --
>> > Dr. Birte Glimm, Room 306
>> > Computing Laboratory
>> > Parks Road
>> > Oxford
>> > OX1 3QD
>> > United Kingdom
>> > +44 (0)1865 283529
>> >
>>
>>
>>
>> --
>> Dr. Birte Glimm, Room 306
>> Computing Laboratory
>> Parks Road
>> Oxford
>> OX1 3QD
>> United Kingdom
>> +44 (0)1865 283529
>
>



-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Friday, 9 October 2009 18:14:34 UTC