Re: Scope of blank nodes in SPARQL? from Alex Hall on 2011-10-19 (public-rdf-wg@w3.org from October 2011)

From: Alex Hall <alexhall@revelytix.com>
Date: Wed, 19 Oct 2011 18:38:28 -0400
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-ID: <CAFq2bixj_qciVYRb-tN5Fke9Ka114rz67Nwi8pBUN_XLZ=-w4g@mail.gmail.com>
On Wed, Oct 19, 2011 at 5:29 PM, Eric Prud'hommeaux <eric@w3.org> wrote:

> <snip/>
> > Sure, that would work in this particular case.  But it requires the person
> > writing the query to have some extra knowledge about which relations appear
> > in which graph.  If I have <g2> as an inference graph holding statements
> > entailed from some base facts in <g1>, then for the purposes of querying I
> > don't know or particularly care whether a given fact is entailed or was
> > asserted as a base fact.  It should all behave as a single logical graph.
>
> ooo, here lies darkness and fear.
>
> Another cost of bnodes is that they could step on the
> completeness of tableau algorithms. The DAWG (the previous SPARQL WG)
> spent a long time dealing with issues around SPARQL queries over OWL
> models where the query pattern is satisfied in every model, but the
> individuals aren't (known to be) the same in every model.
>
> The issue manifested in SPARQL's case in that the general expressivity of
>  { ?s <p1> ?o1 } { ?s <p2> ?o2 }
> exceeded the expressivity of OWL DL tools. The solution in SPARQL's
> case was to use bnodes in the graph pattern to serve as a class of
> variables which one was not then allowed to examine (say with SELECT
> or CONSTRUCT). Pellet was able to answer queries where bnodes were
> used in graph patterns where the terms to which they would bind were
> not consistent between models. Thus the pattern { _:s <p1> ?o1 } could
> have more solutions than would { ?s <p1> ?o1 }.
>

I understand what you're saying here -- the presence of blank nodes in
SPARQL BGPs never made sense to me until now, so thanks for the
explanation :-)

The use case that I'm talking about is specifically one where we're applying
forward-chaining inference rules using Horn clauses, which can be
implemented using SPARQL INSERT operations.  Example:

INSERT { GRAPH <inf-graph> { ?x rdf:type ?y } }
WHERE { GRAPH <base-graph> { ?x ?p ?o . ?p rdfs:domain ?y } }

There's no uncertainty or disjunction here -- if ?x matches a blank node in
the base graph, then I *know* that the resource denoted by that blank node
in the inference graph is exactly the same resource that is denoted by that
blank node in the base graph.  Many applications depend on being able to
match query patterns across both graphs in this way.
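
As a rough illustration of that rule (a sketch only, not how any particular
store implements it), the INSERT above can be modelled in plain Python.  The
BNode class, the dict-of-sets dataset, and the ex: names are all assumptions
made up for this example; blank nodes are compared by object identity, so
"the same bnode" means the same Python object.

```python
class BNode:
    """Hypothetical blank node; identity (not label) decides sameness."""
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return "_:" + self.label


RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def apply_domain_rule(dataset, base, inf):
    """Mimic: INSERT { GRAPH inf { ?x rdf:type ?y } }
              WHERE  { GRAPH base { ?x ?p ?o . ?p rdfs:domain ?y } }"""
    base_triples = dataset[base]
    # All (?p, ?y) pairs declared via rdfs:domain in the base graph.
    domains = [(s, o) for (s, p, o) in base_triples if p == RDFS_DOMAIN]
    # Join each base triple with the domain declaration for its predicate.
    inferred = {
        (x, RDF_TYPE, y)
        for (x, p, _o) in base_triples
        for (prop, y) in domains
        if p == prop
    }
    dataset.setdefault(inf, set()).update(inferred)
    return inferred


# Assumed sample data: one blank node that "knows" someone.
b = BNode("b0")
dataset = {
    "base-graph": {
        (b, "ex:knows", "ex:alice"),
        ("ex:knows", RDFS_DOMAIN, "ex:Person"),
    }
}
apply_domain_rule(dataset, "base-graph", "inf-graph")
subject = next(iter(dataset["inf-graph"]))[0]
```

In this model the subject of the inferred triple is the very same node object
that appears in the base graph, which is what lets a later pattern match
across both graphs.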


>
> The other caveat was that these "variables" could not be used between
> basic graph patterns, so { _:s <p1> ?o1 } { _:s <p2> ?o2 } could not
> be used to ask a DL engine if there were solutions where the
> individuals satisfying the first pattern intersected with the
> individuals satisfying the second pattern.
>

Now it's my turn to be surprised -- I always assumed that those bnodes in
the query pattern were just rewritten as autogenerated variables scoped to
the whole query.  I learn something new each time I revisit the SPARQL
spec...


>
> http://www.w3.org/mid/20060716171342.GA8900@w3.org shows an example of
> two models satisfying a query with different individuals assigned to a
> graph pattern. Look about half way down for "implied by your little
> house example".
>
> Not sure how this will map to shared bnodes, but this might give you some
> ideas.
>

I think the issues are orthogonal.  The ability to share bnodes between
graphs in the dataset doesn't mean that all applications have to use that
ability.  Presumably, a tableau reasoner wanting to materialize the entailed
graph from your example would mint new bnodes to do so, not use a variable
matched to something already in the graph.  As far as the presence of bnodes
in a query pattern, I don't see how those would be impacted by bnodes shared
between graphs any more than by URIs shared between graphs.


>
>
> > > ? Currently, there is no official way to populate <g1>, <g2> as
> > > described above, but if the RDF WG decided it were so, the SPARQL
> > > query would work out of the box. Of course, the cost is fairly high in
> > > that this makes all bnodes "told bnodes" via an exhaustive search for
> > > bnodes common to multiple graphs.
> > >
> >
> > There's no way to serialize that dataset in a way that preserves the
> > sameness of the bnode shared between those graphs using any of the existing
> > standards.  But there is an official way with SPARQL 1.1 to populate those
> > graphs: load an RDF/XML or Turtle file into one graph, and use an INSERT
> > operation to copy some bnodes from that graph into another graph.
>
> Wow, I assumed there was a rule against that (which implementations
> wouldn't have any incentive to enforce).
>
>
Yeah, the SPARQL Update spec as it appears in Last Call says that it's the
same bnode that gets inserted.  It would have to be when writing back to the
same graph; the spec is silent on whether it's the same bnode when writing to
a different graph, but in the absence of any rule against it, I'd say an
implementation is free to re-use the same internal node ID.
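
To make concrete why it matters whether the copied node keeps its identity,
here is a small sketch contrasting a plain union of two graphs with an RDF
merge, which standardizes blank nodes apart first.  Everything here (the
BNode class, the helper names, the ex: terms) is hypothetical and only meant
to illustrate the distinction, not to describe any store's behaviour.

```python
class BNode:
    """Hypothetical blank node; identity (not label) decides sameness."""
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return "_:" + self.label


def union(g1, g2):
    """Plain set union: a bnode present in both graphs stays one node."""
    return g1 | g2


def rename_apart(graph, suffix):
    """Copy a graph, replacing each bnode with a fresh one."""
    fresh = {}

    def swap(term):
        if isinstance(term, BNode):
            if term not in fresh:
                fresh[term] = BNode(term.label + suffix)
            return fresh[term]
        return term

    return {(swap(s), p, swap(o)) for (s, p, o) in graph}


def merge(g1, g2):
    """RDF merge: standardize bnodes apart, then take the union."""
    return rename_apart(g1, "_g1") | rename_apart(g2, "_g2")


def join_subjects(graph, p1, p2):
    """Subjects ?s for which { ?s p1 ?o1 . ?s p2 ?o2 } matches."""
    s1 = {s for (s, p, _o) in graph if p == p1}
    s2 = {s for (s, p, _o) in graph if p == p2}
    return s1 & s2


b = BNode("b0")
g1 = {(b, "ex:p1", "ex:a")}
g2 = {(b, "ex:p2", "ex:b")}  # same node object, as if copied by an INSERT

union_hits = join_subjects(union(g1, g2), "ex:p1", "ex:p2")
merge_hits = join_subjects(merge(g1, g2), "ex:p1", "ex:p2")
```

Under the union the pattern still joins on the shared node; under the merge
the two occurrences become distinct nodes and the join finds nothing.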

-Alex



>
> > I'm sensitive to the implementation concerns of making bnode labels
> > document-scoped in multi-graph syntaxes, but given that it's possible for
> > graphs in a SPARQL dataset to share bnodes, it would be nice to have a
> > serialization format for the dataset that preserves the sameness of those
> > bnodes.
> >
> > -Alex
> >
> >
> >
> > >
> > > > > >I imagine that there are historical reasons why merge is specified here
> > > > > >and not union, but it would be really nice if stores had license to do a
> > > > > >union in the case where they have specific knowledge that a blank node
> > > > > >identifier shared between the graphs does in fact denote a common
> > > > > >resource.
> > > > >
> > > > >-Alex
> > > >
> > > > Yes, it would be nice.  All the stores I know much about will
> > > > maintain the sameness in the same situation.  SPARQL does define
> > > > FROM-FROM as an RDF merge though, which keeps bNodes apart, but it's
> > > > working at the level of simple entailment.
> > > >
> > > > Normally, the bNodes will have different internal identifiers just
> > > > by being read in so something (some knowledge) made them the same.
> > > > I don't know of a store that uses the same internal id in different
> > > > graphs for different bNodes at the same time but it's quite possible
> > > > there is one and it's not wrong (maybe keep each graph on disk in
> > > > RDF/XML format).
> > > >
> > > > Once <g1> and <g2> are known to contain the same bNode (whatever
> > > > that might mean) then I think we're in the territory of additional
> > > > "specific knowledge", which is outside RDF simple entailment; RDF
> > > > only talks about one graph anyway.  It's like doing smushing on the
> > > > data or equating by inverse functional property - a level of
> > > > entailment (a rather low level even if more than simple entailment)
> > > > that provides more conclusions from the data.
> > > >
> > > >       Andy
> > > >
> > >
> > > --
> > > -ericP
> > >
> > >
>
> --
> -ericP
>
Received on Wednesday, 19 October 2011 22:38:58 UTC