Re: Scope of blank nodes in SPARQL? from Andy Seaborne on 2011-10-18 (public-rdf-wg@w3.org from October 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 18 Oct 2011 19:20:04 +0100
To: public-rdf-wg@w3.org
Message-ID: <4E9DC354.7060009@epimorphics.com>

On 18/10/11 16:20, Alex Hall wrote:
...
> Now, a follow-up question:
>
> Given a store containing two graphs with the following statements:
>
> <g1> = { _:s <p1> "foo". }
> <g2> = { _:s <p2> "bar". }
>
> Assume that _:s here denotes the same blank node shared between the
> graphs (e.g. was inserted into one graph using an INSERT operation as
> illustrated above).  This is a common situation in the case where <g2>
> is an inference graph that holds entailed statements computed by
> applying forward-chaining rules to <g1>.  How can I query the union of
> those two graphs in a way that a variable can match the blank node in
> both graphs?
>
> In other words, I'd like to say something like:
>
> SELECT ?o1 ?o2
> FROM <g1>
> FROM <g2>
> WHERE { ?s <p1> ?o1 . ?s <p2> ?o2 }
>
> and find a single solution, { ?o1="foo", ?o2="bar" }.  I suspect that
> many (most?) stores will give the result that I'm looking for in this
> situation -- I know Mulgara will.  But strictly speaking, the default
> graph for this query is found by taking the merge of all graphs
> mentioned in a FROM clause, which implies renaming of shared blank
> nodes.  In this case, I want the union of those graphs, not the merge;
> is there any way of getting that without relying on store-specific
> implementation details?
>
> I imagine that there are historical reasons why merge is specified here
> and not union, but it would be really nice if stores had license to do a
> union in the case where they have specific knowledge that a blank node
> identifier shared between the graphs does in fact denote a common resource.
>
> -Alex

Yes, it would be nice.  All the stores I know much about will keep the
bNode the same in this situation.  SPARQL does define multiple FROM
clauses as an RDF merge, though, which keeps bNodes apart, but that
definition works at the level of simple entailment.
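
To make the merge/union difference concrete, here is a minimal sketch in
plain Python (no RDF library).  The tuple representation of triples, the
"_:" prefix test for bNodes, and all function names are assumptions made
for this illustration only; they are not how any particular store works.

```python
# Sketch: RDF merge vs. plain union of two graphs that share a bNode label.
# Triples are (subject, predicate, object) tuples; strings starting with
# "_:" stand for blank nodes. This representation is illustrative only.

def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")

def rename_bnodes(graph, suffix):
    """Return a copy of graph whose bNode labels are tagged with suffix."""
    def rename(term):
        return f"{term}_{suffix}" if is_bnode(term) else term
    return {(rename(s), p, rename(o)) for (s, p, o) in graph}

def rdf_merge(*graphs):
    """Merge: standardize bNodes apart per graph, then take the set union."""
    merged = set()
    for i, g in enumerate(graphs):
        merged |= rename_bnodes(g, f"g{i + 1}")
    return merged

def union(*graphs):
    """Union: keep bNode labels exactly as they are."""
    return set().union(*graphs)

def query(graph):
    """Evaluate { ?s <p1> ?o1 . ?s <p2> ?o2 }, returning (?o1, ?o2) pairs."""
    return sorted(
        (o1, o2)
        for (s1, p1, o1) in graph if p1 == "p1"
        for (s2, p2, o2) in graph if p2 == "p2" and s1 == s2
    )

g1 = {("_:s", "p1", "foo")}
g2 = {("_:s", "p2", "bar")}

merge_results = query(rdf_merge(g1, g2))
union_results = query(union(g1, g2))
print(merge_results)  # []
print(union_results)  # [('foo', 'bar')]
```

Under the merge, the two occurrences of _:s become distinct nodes, so
the join on ?s finds nothing; under the union, the shared label is kept
and the single solution appears.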

Normally, the bNodes will have different internal identifiers just by
being read in, so something (some knowledge) must have made them the
same.  I don't know of a store that uses the same internal id in
different graphs for different bNodes at the same time, but it's quite
possible there is one, and that would not be wrong (for example, a store
that keeps each graph on disk in RDF/XML format).

Once <g1> and <g2> are known to contain the same bNode (whatever that
might mean), then I think we're in the territory of additional "specific
knowledge", which is outside RDF simple entailment; RDF only talks about
one graph anyway.  It's like doing smushing on the data or equating by
inverse functional property - a level of entailment (a rather low level,
even if more than simple entailment) that provides more conclusions from
the data.
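
As a rough illustration of equating by inverse functional property,
here is a small plain-Python sketch.  The property name "mbox", the
example address, and the smush() helper are hypothetical and chosen only
for this example; the sketch also handles just one property and does not
compute a full transitive closure of equalities.

```python
# Sketch: "smushing" nodes that share a value for an inverse functional
# property (IFP). Triples are (subject, predicate, object) tuples.
# The "mbox" property and the data below are hypothetical.

def smush(triples, ifp):
    """Rewrite subjects sharing an IFP value to one canonical subject."""
    canonical = {}  # IFP value -> first subject seen with that value
    replace = {}    # subject -> canonical subject
    for s, p, o in sorted(triples):  # sorted for a deterministic choice
        if p == ifp:
            replace[s] = canonical.setdefault(o, s)
    return {(replace.get(s, s), p, replace.get(o, o)) for s, p, o in triples}

# Two bNodes with different labels, as after an RDF merge.
data = {
    ("_:a", "mbox", "mailto:x@example.org"),
    ("_:a", "p1", "foo"),
    ("_:b", "mbox", "mailto:x@example.org"),
    ("_:b", "p2", "bar"),
}

smushed = smush(data, "mbox")
print(sorted(smushed))
```

After smushing, both the p1 and p2 statements hang off the same node, so
a query joining on the subject would find the "foo"/"bar" solution even
though the input kept the two bNodes apart.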

 Andy

Received on Tuesday, 18 October 2011 18:20:37 UTC