RE: [TF-ENT] redundant blank nodes from Seaborne, Andy on 2009-10-12 (public-rdf-dawg@w3.org from October to December 2009)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 12 Oct 2009 10:55:06 +0000
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <B6CF1054FDC8B845BF93A6645D19BEA3693F7D3BF6@GVW1118EXC.americas.hpqcorp.net>


> -----Original Message-----
> From: b.glimm@googlemail.com [mailto:b.glimm@googlemail.com] On Behalf Of
> Birte Glimm
> Sent: 09 October 2009 19:14
> To: Seaborne, Andy
> Cc: SPARQL Working Group
> Subject: Re: [TF-ENT] redundant blank nodes
> 
> 2009/10/9 Seaborne, Andy <andy.seaborne@hp.com>:
> > Birte,
> >
> > Could you explain further?  This could be a way forward but I can't see how
> the existing text works for this.
> >
> > The only entailment relationship in the matching step is between G and
> P(BGP) and does not consider SG.
> >
> > SG is related to G but after that it is just a graph. The way it is related
> isn't mentioned and does not appear at the point of entailment between G and
> P(BGP). In the matching step, only the vocabulary of SG is considered, not
> the way it is e-equivalence to G.  Maybe we can add that in some way.
> >
> >        Andy
> 
> 
> Ok, here is my take on that:
> 
> It could be that I misinterpreted whether the pattern instance mapping
> maps to the scoping graph or to the active graph. But I think even if
> I got that wrong, we can make the definition work.
> 
> I believed that entailment is decided w.r.t. scoping graph and in my
> mind I imagined that you could, for example, take the active graph,
> rename the blank nodes to get SG, then built a partial RDFS closure of
> SG with an algorithm like what ter Horst proposed and find solutions
> with simple entailment (systems could use different ways of course). I
> came to this because of condition 3 of extensuions for BGP matching:
> For any scoping graph SG and answer set {P1 ... Pn} for a basic graph
> pattern BGP, and where {BGP1 .... BGPn} is a set of basic graph
> patterns all equivalent to BGP, none of which share any blank nodes
> with any other or with SG
>     SG E-entails (SG union P1(BGP1) union ... union Pn(BGPn))
> 
> but since the only difference between the active and the scoping graph
> are blank node names and they are disjoint it does not really matter
> whether I decide entailment w.r.t. G or SG.

As a way forward, a notional partial RDFS closure is worth exploring.  (I have a reservation because it does not work for OWL-DL - a single approach would be better but maybe we need to give up on that.)

Can we identify which rules are needed for the 'partial'?  it's just not use the simple entailment rules (se1, se2, lg, gl) and do a subgraph test of the logical closure as simple entailment matching happens?

This also means it's clear that the shape of SG is matters, not just it's term set (signature).

> Also, in the description of data sets it says:
> If DS is the dataset of a query, pattern solutions are therefore
> understood to be not from the active graph of DS itself, but from an
> RDF graph, called the scoping graph, which is graph-equivalent to the
> active graph of DS but shares no blank nodes with DS or with BGP.

That is section 12.3.2 so it does not really take entailment into account - 12.6 is an extension point and the rest of the doc is for non-extension.  Hence "graph-equivalent" and to text on "document scope conventions for blank node identifiers."

> 
> "underderstood" is, however, a bit vague.

[I didn't write this text]

I think in it comes from a different heritage/community.

> 
> All in all, this scoping graph business is a bit confusing to me
> since, on the other hand, the scoping graph is said to be a purely
> theoretical construct and you could also map the variables to the
> active graph and use the scoping graph only to rename blank nodes in
> the matches that you got from the active graph. Is that how I have to
> read it?

Or even make G = SG for simple entailment which is what systems actually do.

> 
> I'll try my example again this time assuming I decide entailment
> w.r.t. G and show how C1 and C2 need to be rephrased then.
> We assumed the active graph G is:
> _:b1 :p :z.
> _:b2 :q :y.
> We assumed the scoping graph SG is:
> _:sg1 :p :z.
> _:sg2 :q :y.
> and the BGP of the query is
> ?x :p _:n . (I named [] _:n just to have some ID for it. )

> If I do not derive the solutions from SG, but from G it would work as
> follows:
> G entails
> _:b1_1 :p :z.       for _:b1_1 allocated to _:b1

I don't see anything to say there is only one bNode allocated for _:b1. 

> _:b1_2 :p :z.       for _:b1_2 allocated to _:b1_1

which makes the casual chain _:b1 -> _:b1_1 -> _:b1_2 -> 
undefined.

> ...
> _:b1 :p _:z_1.     for _:z_1 allocated to :z
> _:b1_1 :p _:z_1.
> ...
> _:b2_1 :q :y.       for _:b2_1 allocated to _:b2
> ...
> same as when using SG but starting with blank nodes _:b1 and _b2
> instead of _:sg1 and _:sg2. We get the following pattern instance
> mappings for the BGP:
> P0: mu: x->_:b1,  sigma: _:n->:z
> P1: mu: x->_:b1_1,  sigma: _:n->:z
> P2: mu: x->_:b1_2,  sigma: _:n->:z
> ...
> Pn: mu: x->_:b1,  sigma: _:n->_:z_1
> Pn+1: mu: x->_:b1_1,  sigma: _:n->_:z_1
> ...
> but not
> mu: x->_:b2,  sigma: _:n->:y because the predicate is :q
> mu: x->_:b2_1,  sigma: _:n->:y etc
> 
> Since blank nodes in SG are different from the ones in G, we have as
> required that
> SG RDFS-entailes Sg union P0(BGP0) union P1(BGP1) union ...
> although strictly speaking the renaming of blank nodes from BGP to
> BGP0, BGP1, ... makes the pattern instance mappings possibly invalid,
> but we don't really need sigma any more anyway.
> We have the possible solutions mu0:x->_:b1, mu1:x->_:b1_1,
> mu2:x->_:b1_2, ... for ?x.
> Now what we can say is that since G and SG are graph equivalent (only
> differing in their blank nodes), there is a bijection M (one-to-one
> mapping) from G to SG witnessing the equivalence with M(_:b1)=:_sg1
> and M(_:b2)=:_sg2 and we can rephrase C1 and C2 as follows
> A possible solution μ is a solution for BGP from G under RDF
> entailment if (C1) for each subject s of a triple ( s, p, o ) in
> P(BGP), M(s) is defined and M(s) occurs in the scoping graph and (C2)
> if μ(v) is a blank node, then M(μ(v)) is defined and M(μ(v)) occurs in
> the scoping graph SG.

So we restrict SG to be e-entailed by G (sec 12.6)
and also M(G) is a subgraph of SG for some fixed-per-query M

[M(G) being the function induced by M applying M to all blank nodes of G.]

> Now we would get only mu0:x->_:b1 as an answer since M(_:b1)=_:sg1
> occurs in the scoping graph, but mu1:x->_:b1_1 is not an answer since
> mu1(x) is a blank node, but M is not defined for mu1(x)=_:b1_1as
> required by C2. Similarly for all other solution mappings mu2, ...
> 
> Now, however, the blank nodes in the answers are not really from SG
> and that is why I thought we do the whole mapping on SG (at least
> theoretically), but is that how it is intended? I can of course rename
> them with M(blank node), but I cannot read that from the spec.

This is where the use of subgraph helps, as does G=SG for simple entailment which is also where the limiting infinite possibilities comes from.

----

I think there is a solution in here but I'm getting to the point where I need to see the proposed design in one place, then work through cases.

 Andy

> 
> Birte
> 
> 
> 
> 
> >
> >> -----Original Message-----
> >> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-
> request@w3.org]
> >> On Behalf Of Birte Glimm
> >> Sent: 08 October 2009 12:22
> >> To: SPARQL Working Group
> >> Subject: Re: [TF-ENT] redundant blank nodes
> >>
> >> Actually, I don't think any more we have a problem here because (x,
> >> _:sg2) is not an entailed triple. I should have seen that earlier and
> >> apologise for the confusion. I explain below:
> >>
> >> We assumed the scoping graph SG is:
> >> _:sg1 :p :z.
> >> _:sg2 :q :y.
> >> and the BGP of the query is
> >> ?x :p [] .
> >> We assumed that (unfortunately) (x, _:sg2) would be an answer, but it
> >> is not. Clearly _:sg2 :p [] is a well-formed RDF triple, but it is NOT
> >> RDF(S) entailed by SG or G because you cannot just exchange blank
> >> nodes. You can introduce fresh blank nodes, but you have to keep track
> >> of which of your fresh blank nodes is allocated to which existing
> >> node. You can validly derive:
> >> _:sg1_1 :p :z.       for _:sg1_1 allocated to _:sg1
> >> _:sg1_2 :p :z.       for _:sg1_2 allocated to _:sg1_1
> >> _:sg1 :p _:z_1.     for _:z_1 allocated to :z
> >> _:sg1_1 :p _:z_1.
> >> etc
> >> There is no legal way to get
> >> _:sg2 :p :z.
> >> This would say that :z has both a :p and a :q predecessor and that
> >> there exists a node (_:sg2) that has both a :p and a :p sucessor. This
> >> does not follow from the given graph. The given graph just says there
> >> exists a node that has :z has :p successor (_:sg1) and there exists a
> >> node that as :y as :q successor (_:sg2). They could be the same, but
> >> they do not have to be the same and entailment considers only what is
> >> true in all models.
> >> Thus, you cannot arbitrarily use blank nodes from the scoping graph.
> >> You can only use them in consequences if they stand for the same node.
> >> You can introduce (arbitrarily many) fresh blank nodes, but you still
> >> have to keep track of which fresh node represents which original node
> >> (called allocated to in the RDF spec). We will not return this fresh
> >> blank nodes in the query answers (could be infinitely many), but they
> >> can of course be used in the derivation process.
> >>
> >> Birte
> >>
> >>
> >> 2009/10/7 Birte Glimm <birte.glimm@comlab.ox.ac.uk>:
> >> > Hi all,
> >> > I am working my way through the open questions/issues, so the next on
> >> > my list is something that Andy mentioned. At the moment, we have the 2
> >> > conditions C1 and C2 that restrict the set of possible answers to a
> >> > finite set of answers for RDF(S) entailment regimes. What these
> >> > conditions don't cover is redundant answers that use different blank
> >> > nodes. E.g.,
> >> > suppose G is
> >> > _:b1 :p :z.
> >> > _:b2 :q :y.
> >> > SG is:
> >> > _:sg1 :p :z.
> >> > _:sg2 :q :y.
> >> > and the BGP of the query is
> >> > ?x :p [] .
> >> > We would get the two solutions
> >> > (x, _:sg1)
> >> > (x, _:sg2)
> >> > because both _:sg1 :p [] and _:sg2 :p [] are well-formed RDF triples
> >> > that are RDF(S) entailed by G (C1) each subject is in the set of terms
> >> > used by the scoping graph and (C2) μ(?x) is a blank node occurring in
> >> > SG. This always results in a finite answer sequence, but the more
> >> > blank nodes we have origianlly, the more redundant answers we get.
> >> >
> >> > Now what I would rather have only (x, _:sg1) as an answer. This could
> >> > be defined by a notion of derivability I think. E.g.,
> >> > Let R be a set of entailment rules for the entailment regime E, then,
> >> > for each triple (s, p, o) in P(BGP), there must be a derivation of (s,
> >> > p, o) from SG by means of R.
> >> > Now I could use, for example, the RDFS entailment rules as suggested
> >> > by ter Horst, and I get what I want because _:sg2 :p [] has no
> >> > derivation.
> >> >
> >> > What I am quite unhappy about is the use of "a set of entailment
> >> > rules" because different systems might want to use different ways of
> >> > deriving consequences and this might be too specific. Any opinions on
> >> > that? Any better suggestion?
> >> >
> >> > Birte
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Dr. Birte Glimm, Room 306
> >> > Computing Laboratory
> >> > Parks Road
> >> > Oxford
> >> > OX1 3QD
> >> > United Kingdom
> >> > +44 (0)1865 283529
> >> >
> >>
> >>
> >>
> >> --
> >> Dr. Birte Glimm, Room 306
> >> Computing Laboratory
> >> Parks Road
> >> Oxford
> >> OX1 3QD
> >> United Kingdom
> >> +44 (0)1865 283529
> >
> >
> 
> 
> 
> --
> Dr. Birte Glimm, Room 306
> Computing Laboratory
> Parks Road
> Oxford
> OX1 3QD
> United Kingdom
> +44 (0)1865 283529
Received on Monday, 12 October 2009 10:55:55 UTC