Re: Re: Scope of blank nodes in SPARQL? from Alex Hall on 2011-10-19 (public-rdf-wg@w3.org from October 2011)

From: Alex Hall <alexhall@revelytix.com>
Date: Wed, 19 Oct 2011 10:35:34 -0400
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-ID: <CAFq2biwn7BFdqjuNpwexxf4Cs=Y3BUh9zz60UV36fzqXHoS4Wg@mail.gmail.com>
On Wed, Oct 19, 2011 at 9:20 AM, Eric Prud'hommeaux <eric@w3.org> wrote:

> * Andy Seaborne <andy.seaborne@epimorphics.com> [2011-10-18 19:20+0100]
> >
> >
> > On 18/10/11 16:20, Alex Hall wrote:
> > ...
> > >Now, a follow-up question:
> > >
> > >Given a store containing two graphs with the following statements:
> > >
> > ><g1> = { _:s <p1> "foo". }
> > ><g2> = { _:s <p2> "bar". }
> > >
> > >Assume that _:s here denotes the same blank node shared between the
> > >graphs (e.g. was inserted into one graph using an INSERT operation as
> > >illustrated above).  This is a common situation in the case where <g2>
> > >is an inference graph that holds entailed statements computed by
> > >applying forward-chaining rules to <g1>.  How can I query the union of
> > >those two graphs in a way that a variable can match the blank node in
> > >both graphs?
> > >
> > >In other words, I'd like to say something like:
> > >
> > >SELECT ?o1 ?o2
> > >FROM <g1>
> > >FROM <g2>
> > >WHERE { ?s <p1> ?o1 . ?s <p2> ?o2 }
> >
> > >and find a single solution, { ?o1="foo", ?o2="bar" }.  I suspect that
> > >many (most?) stores will give the result that I'm looking for in this
> > >situation -- I know Mulgara will.  But strictly speaking, the default
> > >graph for this query is found by taking the merge of all graphs
> > >mentioned in a FROM clause, which implies renaming of shared blank
> > >nodes.  In this case, I want the union of those graphs, not the merge;
> > >is there any way of getting that without relying on store-specific
> > >implementation details?
>
> How about
>
>  SELECT ?o1 ?o2
>  WHERE { GRAPH <g1> { ?s <p1> ?o1 }
>          GRAPH <g2> { ?s <p2> ?o2 } }
>
>
Sure, that would work in this particular case.  But it requires the person
writing the query to have some extra knowledge about which relations appear
in which graph.  If I have <g2> as an inference graph holding statements
entailed from some base facts in <g1>, then for the purposes of querying I
don't know or particularly care whether a given fact is entailed or was
asserted as a base fact.  It should all behave as a single logical graph.
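
One way to sketch a query that avoids tying each relation to a specific
graph is to give every triple pattern its own GRAPH clause with a graph
variable.  This is only an illustration: the variable names ?ga and ?gb are
made up here, and it still depends on the store keeping the shared blank
node identical across the named graphs.

```sparql
# Sketch only: ?ga and ?gb are illustrative variable names.
# Each pattern may match in either named graph, so the query author
# does not need to know which graph holds <p1> or <p2>.
SELECT ?o1 ?o2
WHERE { GRAPH ?ga { ?s <p1> ?o1 }
        GRAPH ?gb { ?s <p2> ?o2 }
        FILTER (?ga IN (<g1>, <g2>) && ?gb IN (<g1>, <g2>)) }
```

This gets unwieldy quickly, since every triple pattern needs its own GRAPH
wrapper, so it is a workaround rather than a substitute for a real union.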


> ? Currently, there is no official way to populate <g1>, <g2> as
> described above, but if the RDF WG decided it were so, the SPARQL
> query would work out of the box. Of course, the cost is fairly high in
> that this makes all bnodes "told bnodes" via an exhaustive search for
> bnodes common to multiple graphs.
>

There's no way to serialize that dataset in a way that preserves the
sameness of the bnode shared between those graphs using any of the existing
standards.  But there is an official way with SPARQL 1.1 to populate those
graphs: load an RDF/XML or Turtle file into one graph, and use an INSERT
operation to copy some bnodes from that graph into another graph.
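
A rough sketch of that sequence, assuming a hypothetical file location and
reusing the graph and predicate names from the example above:

```sparql
# Sketch only: <http://example.org/data.ttl> is a made-up location for a
# Turtle file containing  _:s <p1> "foo" .
LOAD <http://example.org/data.ttl> INTO GRAPH <g1> ;

# ?s is bound to the blank node already in <g1>, so the triple inserted
# into <g2> refers to that same node rather than a fresh one.
INSERT { GRAPH <g2> { ?s <p2> "bar" } }
WHERE  { GRAPH <g1> { ?s <p1> "foo" } }
```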

I'm sensitive to the implementation concerns of making bnode labels
document-scoped in multi-graph syntaxes, but given that it's possible for
graphs in a SPARQL dataset to share bnodes, it would be nice to have a
serialization format for the dataset that preserves the sameness of those
bnodes.

-Alex



>
> > >I imagine that there are historical reasons why merge is specified here
> > >and not union, but it would be really nice if stores had license to do a
> > >union in the case where they have specific knowledge that a blank node
> > >identifier shared between the graphs does in fact denote a common
> > >resource.
> > >
> > >-Alex
> >
> > Yes, it would be nice.  All the stores I know much about will
> > maintain the sameness in the same situation.  SPARQL does define
> > FROM-FROM as an RDF merge though, which keeps bNodes apart, but it's
> > working at the level of simple entailment.
> >
> > Normally, the bNodes will have different internal identifiers just
> > by being read in, so something (some knowledge) made them the same.
> > I don't know of a store that uses the same internal id in different
> > graphs for different bNodes at the same time, but it's quite possible
> > there is one and it's not wrong (maybe keep each graph on disk in
> > RDF/XML format).
> >
> > Once <g1> and <g2> are known to contain the same bNode (whatever
> > that might mean) then I think we're in the territory of additional
> > "specific knowledge", which is outside RDF simple entailment; RDF
> > only talks about one graph anyway.  It's like doing smushing on the
> > data or equating by inverse functional property - a level of
> > entailment (a rather low level even if more than simple entailment)
> > that provides more conclusions from the data.
> >
> >       Andy
> >
>
> --
> -ericP
>
>
Received on Wednesday, 19 October 2011 14:36:10 UTC