Re: Scope of blank nodes in SPARQL? from Eric Prud'hommeaux on 2011-10-19 (public-rdf-wg@w3.org from October 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 19 Oct 2011 17:29:41 -0400
To: Alex Hall <alexhall@revelytix.com>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-ID: <20111019212938.GF13670@w3.org>
* Alex Hall <alexhall@revelytix.com> [2011-10-19 10:35-0400]
> On Wed, Oct 19, 2011 at 9:20 AM, Eric Prud'hommeaux <eric@w3.org> wrote:
> 
> > * Andy Seaborne <andy.seaborne@epimorphics.com> [2011-10-18 19:20+0100]
> > >
> > >
> > > On 18/10/11 16:20, Alex Hall wrote:
> > > ...
> > > >Now, a follow-up question:
> > > >
> > > >Given a store containing two graphs with the following statements:
> > > >
> > > ><g1> = { _:s <p1> "foo". }
> > > ><g2> = { _:s <p2> "bar". }
> > > >
> > > >Assume that _:s here denotes the same blank node shared between the
> > > >graphs (e.g. was inserted into one graph using an INSERT operation as
> > > >illustrated above).  This is a common situation in the case where <g2>
> > > >is an inference graph that holds entailed statements computed by
> > > >applying forward-chaining rules to <g1>.  How can I query the union of
> > > >those two graphs in a way that a variable can match the blank node in
> > > >both graphs?
> > > >
> > > >In other words, I'd like to do say something like:
> > > >
> > > >SELECT ?o1 ?o2
> > > >FROM <g1>
> > > >FROM <g2>
> > > >WHERE { ?s <p1> ?o1 . ?s <p2> ?o2 }
> > >
> > > >and find a single solution, { ?o1="foo", ?o2="bar" }.  I suspect that
> > > >many (most?) stores will give the result that I'm looking for in this
> > > >situation -- I know Mulgara will.  But strictly speaking, the default
> > > >graph for this query is found by taking the merge of all graphs
> > > >mentioned in a FROM clause, which implies renaming of shared blank
> > > >nodes.  In this case, I want the union of those graphs, not the merge;
> > > >is there any way of getting that without relying on store-specific
> > > >implementation details?
> >
> > How about
> >
> >  SELECT ?o1 ?o2
> >  WHERE { GRAPH <g1> { ?s <p1> ?o1 }
> >          GRAPH <g2> { ?s <p2> ?o2 } }
> >
> >
> Sure, that would work in this particular case.  But it requires the person
> writing the query to have some extra knowledge about which relations appear
> in which graph.  If I have <g2> as an inference graph holding statements
> entailed from some base facts in <g1>, then for the purposes of querying I
> don't know or particularly care whether a given fact is entailed or was
> asserted as a base fact.  It should all behave as a single logical graph.

ooo, here lies darkness and fear.

Another other cost of bnodes could be that it could step on the
completeness of tableau algorithms. The DAWG (previous SPARQL WG)
spend a long time dealing with issues around SPARQL queries over OWL
models where the query pattern is satisfied in every model, but the
individuals aren't (known to be) the same in every model.

The issue manifested in SPARQL's case in that general expressivity of
  { ?s <p1> ?o1 } { ?s <p2> ?o2 }
exceeded the expressivity of OWL DL tools. The solution in SPARQL's
case was to use bnodes in the graph pattern to serve as a class of
variables which one was not then allowed to examine (say with SELECT
or CONSTRUCT). Pellet was able to answer queries where bnodes were
used in graph patterns where the terms to which they would bind were
not consistent between models. Thus the pattern { _:s <p1> ?o1 } could
have more solutions than would { ?s <p1> ?o1 }.

The other caveat was that these "variables" could not be used between
basic graph patterns, so { _:s <p1> ?o1 } { _:s <p2> ?o2 } could not
be used to ask a DL engine if there were solutions where the
individuals satisfying the first pattern intersected with the
individuals satisfying the second pattern.

http://www.w3.org/mid/20060716171342.GA8900@w3.org shows an example of
two models satisfying a query with different individuals assigned to a
graph pattern. Look about half way down for "implied by your little
house example".

Not sure how this will map to shared bnodes, but this might give you some ideas.


> > ? Currently, there is no official way to populate <g1>, <g2> as
> > described above, but if the RDF WG decided it were so, the SPARQL
> > query would work out of the box. Of course, the cost is fairly high in
> > that this makes all bnodes "told bnodes" via an exhaustive search for
> > bnodes common to multiple graphs.
> >
> 
> There's no way to serialize that dataset in a way that preserves the
> sameness of the bnode shared between those graphs using any of the existing
> standards.  But there is an official way with SPARQL 1.1 to populate those
> graphs: load an RDF/XML or Turtle file into one graph, and use an INSERT
> operation to copy some bnodes from that graph into another graph.

Wow, I assumed there was a rule against that (which implementations
wouldn't have any incentive to enforce).


> I'm sensitive to the implementation concerns of making bnode labels
> document-scoped in multi-graph syntaxes, but given that it's possible for
> graphs in a SPARQL dataset to share bnodes, it would be nice to have a
> serialization format for the dataset that preserves the sameness of those
> bnodes.
> 
> -Alex
> 
> 
> 
> >
> > > >I imagine that there are historical reasons why merge is specified here
> > > >and not union, but it would be really nice if stores had license to do a
> > > >union in the case where they have specific knowledge that a blank node
> > > >identifier shared between the graphs does in fact denote a common
> > resource.
> > > >
> > > >-Alex
> > >
> > > Yes, it would be nice.  All the stores I know much about will
> > > maintain the sameness in the same situation.  SPARQL does define
> > > FROM-FROM as an RDF merge though, which keeps bNodes apart, but it's
> > > working at the level of simple entailment.
> > >
> > > Normally, the bNodes will have different internal identifiers just
> > > by being read in so something (some knowledge) made them the same.
> > > I don't know of a store that uses the same internal id in different
> > > graphs for different bNodes at the same time but it's quite possible
> > > there is one and it's not wrong (maybe keep each graph on disk in
> > > RDF/XML format).
> > >
> > > Once <g1> and <g2> are known to contain the same bNode (whatever
> > > that might mean) then I think we're in the territory of additional
> > > "specific knowledge", which is outside RDF simple entailment; RDF
> > > only talks about one graph anyway.  It's like doing smushing on the
> > > data or equating by inverse functional property - a level of
> > > entailment (a rather low level even if more than simple entailment)
> > > that provides more conclusions from the data.
> > >
> > >       Andy
> > >
> >
> > --
> > -ericP
> >
> >

-- 
-ericP
Received on Wednesday, 19 October 2011 21:30:12 UTC