Re: Scope of blank nodes in SPARQL? from Andy Seaborne on 2011-10-21 (public-rdf-wg@w3.org from October 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 21 Oct 2011 10:10:37 +0100
To: public-rdf-wg@w3.org
Message-ID: <4EA1370D.7010109@epimorphics.com>
On 19/10/11 23:38, Alex Hall wrote:
> On Wed, Oct 19, 2011 at 5:29 PM, Eric Prud'hommeaux <eric@w3.org
> <mailto:eric@w3.org>> wrote:
>
>     <snip/>
>      > Sure, that would work in this particular case.  But it requires the
>      > person writing the query to have some extra knowledge about which
>      > relations appear in which graph.  If I have <g2> as an inference graph
>      > holding statements entailed from some base facts in <g1>, then for the
>      > purposes of querying I don't know or particularly care whether a given
>      > fact is entailed or was asserted as a base fact.  It should all behave
>      > as a single logical graph.
>
>     ooo, here lies darkness and fear.
>
>     Another cost of bnodes could be that it could step on the
>     completeness of tableau algorithms. The DAWG (previous SPARQL WG)
>     spent a long time dealing with issues around SPARQL queries over OWL
>     models where the query pattern is satisfied in every model, but the
>     individuals aren't (known to be) the same in every model.
>
>     The issue manifested in SPARQL's case in that general expressivity of
>       { ?s <p1> ?o1 } { ?s <p2> ?o2 }
>     exceeded the expressivity of OWL DL tools. The solution in SPARQL's
>     case was to use bnodes in the graph pattern to serve as a class of
>     variables which one was not then allowed to examine (say with SELECT
>     or CONSTRUCT). Pellet was able to answer queries where bnodes were
>     used in graph patterns where the terms to which they would bind were
>     not consistent between models. Thus the pattern { _:s <p1> ?o1 } could
>     have more solutions than would { ?s <p1> ?o1 }.

Last I heard, non-distinguished variables haven't taken off in OWL
entailment querying.  Maybe an expert could comment on the current state
of the art.

http://www.w3.org/TR/sparql11-entailment/ does not mention them.

> I understand what you're saying here -- the presence of blank nodes in
> SPARQL BGPs never made sense to me up until now, so thanks for the
> explanation :-)
>
> The use case that I'm talking about is specifically one where we're
> applying forward-chaining inference rules using Horn clauses, which can
> be implemented using SPARQL INSERT operations.  Example:
>
> INSERT { GRAPH <inf-graph> { ?x rdf:type ?y } }
> WHERE { GRAPH <base-graph> { ?x ?p ?o . ?p rdfs:domain ?y } }
>
> There's no uncertainty or disjunction here -- if ?x matches a blank node
> in the base graph, then I *know* that the resource denoted by that blank
> node in the inference graph is exactly the same resource that is denoted
> by that blank node in the base graph.  Many applications depend on being
> able to match query patterns across both graphs in this way.

+1

Looking in the base graph is quite common.
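
A minimal sketch of why the shared identity matters, in plain Python
rather than SPARQL. It assumes a dataset is just a dict from graph name
to a set of triples, and that a blank node is an object compared by
identity. The names BNode, apply_domain_rule, names_of_typed and the ex:
terms are made up for illustration; this is not how any particular store
works.

```python
class BNode:
    """A blank node: no global name, equal only to itself."""

    def __repr__(self):
        return f"_:b{id(self):x}"


RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def apply_domain_rule(dataset, base, inf):
    """Forward-chain rdfs:domain from dataset[base] into dataset[inf].

    Mirrors the INSERT above. Whatever node is bound to ?x is inserted
    as-is, so a blank node keeps its identity in the inference graph.
    """
    base_triples = dataset[base]
    domains = {(s, o) for (s, p, o) in base_triples if p == RDFS_DOMAIN}
    inferred = {
        (x, RDF_TYPE, y)
        for (x, p, _o) in base_triples
        for (prop, y) in domains
        if p == prop
    }
    dataset.setdefault(inf, set()).update(inferred)
    return inferred


def names_of_typed(dataset, base, inf, cls):
    """Join across graphs: ex:name (in base) of nodes typed cls (in inf)."""
    typed = {s for (s, p, o) in dataset[inf] if p == RDF_TYPE and o == cls}
    return {o for (s, p, o) in dataset[base] if p == "ex:name" and s in typed}


person = BNode()
dataset = {
    "base-graph": {
        (person, "ex:worksFor", "ex:acme"),
        (person, "ex:name", "Alice"),
        ("ex:worksFor", RDFS_DOMAIN, "ex:Employee"),
    }
}
apply_domain_rule(dataset, "base-graph", "inf-graph")
result = names_of_typed(dataset, "base-graph", "inf-graph", "ex:Employee")
```

The join in names_of_typed only finds anything because the same node
object sits in both graphs. If the inferred triple used a fresh BNode()
instead, the result would be empty.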

>
>
>     The other caveat was that these "variables" could not be used between
>     basic graph patterns, so { _:s <p1> ?o1 } { _:s <p2> ?o2 } could not
>     be used to ask a DL engine if there were solutions where the
>     individuals satisfying the first pattern intersected with the
>     individuals satisfying the second pattern.
>
>
> Now it's my turn to be surprised -- I always assumed that those bnodes
> in the query pattern were just rewritten as autogenerated variables
> scoped to the whole query.  I learn something new each time I revisit
> the SPARQL spec...

That can be done if the entailment does not require non-distinguished 
variables (no anon disjunction).
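
A rough sketch of that rewrite, assuming simple entailment and a basic
graph pattern held as a list of string triples. The function name
bnodes_to_variables and the "?_anon" prefix are invented here, not taken
from any spec or engine.

```python
import itertools


def bnodes_to_variables(bgp):
    """Rewrite blank node labels in one basic graph pattern to variables.

    bgp is a list of (s, p, o) string tuples; "_:x" is a blank node
    label and "?x" is a variable. The same label maps to the same
    variable within the pattern, and generated names avoid variables
    the pattern already uses.
    """
    used = {t for triple in bgp for t in triple if t.startswith("?")}
    counter = itertools.count()
    mapping = {}

    def fresh():
        # Skip any candidate name that collides with an existing variable.
        while True:
            name = f"?_anon{next(counter)}"
            if name not in used:
                used.add(name)
                return name

    def rewrite(term):
        if not term.startswith("_:"):
            return term
        if term not in mapping:
            mapping[term] = fresh()
        return mapping[term]

    rewritten = [tuple(rewrite(t) for t in triple) for triple in bgp]
    return rewritten, set(mapping.values())


pattern = [("_:s", "<p1>", "?o1"), ("_:s", "<p2>", "?o2")]
rewritten, hidden = bnodes_to_variables(pattern)
```

The variables returned in hidden would have to be projected away before
results are returned, since the query author never named them. Under an
entailment regime that needs non-distinguished variables this rewrite is
not enough, because it forces every match to bind to a term.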

>     http://www.w3.org/mid/20060716171342.GA8900@w3.org shows an example of
>     two models satisfying a query with different individuals assigned to a
>     graph pattern. Look about half way down for "implied by your little
>     house example".
>
>     Not sure how this will map to shared bnodes, but this might give you
>     some ideas.
>
>
> I think the issues are orthogonal.  The ability to share bnodes between
> graphs in the dataset doesn't mean that all applications have to use
> that ability.

I think this is the key point.  It's a significant use case.  Like many
features, it can be misused.

[Aside: bNode label maps are a very real problem even for single
graphs. Skolemization is a partial solution.]
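
A sketch of what skolemizing could look like, assuming triples are
Python tuples and blank nodes are objects compared by identity. The
skolemize function, the BNode class and the http://example.org/genid/
base are all made up for illustration.

```python
import uuid


class BNode:
    """A blank node: no global name, equal only to itself."""


def skolemize(triples, mapping, base="http://example.org/genid/"):
    """Replace each blank node with a freshly minted IRI.

    mapping records BNode -> IRI. Passing the same mapping for several
    graphs gives a blank node shared between them the same IRI in each.
    """

    def sk(term):
        if isinstance(term, BNode):
            if term not in mapping:
                mapping[term] = f"<{base}{uuid.uuid4().hex}>"
            return mapping[term]
        return term

    return {tuple(sk(t) for t in triple) for triple in triples}


shared = BNode()
g1 = {(shared, "ex:p", "ex:o")}
g2 = {(shared, "rdf:type", "ex:C")}

mapping = {}
s1 = skolemize(g1, mapping)
s2 = skolemize(g2, mapping)
```

Once the node has an IRI, each graph can be written out separately and
the sameness survives without any label map. It is only partial because
a reader has to know those IRIs stand for blank nodes in order to turn
them back.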

	Andy

> Presumably, a tableau reasoner wanting to materialize the
> entailed graph from your example would mint new bnodes to do so, not use
> a variable matched to something already in the graph.  As far as the
> presence of bnodes in a query pattern, I don't see how those would be
> impacted by bnodes shared between graphs any more than by URIs shared
> between graphs.
>
>
>
>      > > ? Currently, there is no official way to populate <g1>, <g2> as
>      > > described above, but if the RDF WG decided it were so, the SPARQL
>      > > query would work out of the box. Of course, the cost is fairly high
>      > > in that this makes all bnodes "told bnodes" via an exhaustive search
>      > > for bnodes common to multiple graphs.
>      > >
>      >
>      > There's no way to serialize that dataset in a way that preserves the
>      > sameness of the bnode shared between those graphs using any of the
>      > existing standards.  But there is an official way with SPARQL 1.1 to
>      > populate those graphs: load an RDF/XML or Turtle file into one graph,
>      > and use an INSERT operation to copy some bnodes from that graph into
>      > another graph.
>
>     Wow, I assumed there was a rule against that (which implementations
>     wouldn't have any incentive to enforce).
>
>
> Yeah, the SPARQL Update spec as it appears in Last Call says that it's the
> same bnode that gets inserted.  It would have to be when writing back to
> the same graph; the spec is quiet on whether it's the same bnode when
> writing to a different graph, but in the absence of any rule against it,
> I'd say an implementation is free to re-use the same internal node ID.
>
> -Alex
>
>
>      > I'm sensitive to the implementation concerns of making bnode labels
>      > document-scoped in multi-graph syntaxes, but given that it's possible
>      > for graphs in a SPARQL dataset to share bnodes, it would be nice to
>      > have a serialization format for the dataset that preserves the
>      > sameness of those bnodes.
>      >
>      > -Alex
>      >
>      >
>      >
>      > >
>      > > > >I imagine that there are historical reasons why merge is specified
>      > > > >here and not union, but it would be really nice if stores had
>      > > > >license to do a union in the case where they have specific
>      > > > >knowledge that a blank node identifier shared between the graphs
>      > > > >does in fact denote a common resource.
>      > > > >
>      > > > >-Alex
>      > > >
>      > > > Yes, it would be nice.  All the stores I know much about will
>      > > > maintain the sameness in the same situation.  SPARQL does define
>      > > > FROM-FROM as an RDF merge though, which keeps bNodes apart, but
>      > > > it's working at the level of simple entailment.
>      > > >
>      > > > Normally, the bNodes will have different internal identifiers just
>      > > > by being read in so something (some knowledge) made them the same.
>      > > > I don't know of a store that uses the same internal id in
>      > > > different graphs for different bNodes at the same time but it's
>      > > > quite possible there is one and it's not wrong (maybe keep each
>      > > > graph on disk in RDF/XML format).
>      > > >
>      > > > Once <g1> and <g2> are known to contain the same bNode (whatever
>      > > > that might mean) then I think we're in the territory of additional
>      > > > "specific knowledge", which is outside RDF simple entailment; RDF
>      > > > only talks about one graph anyway.  It's like doing smushing on
>      > > > the data or equating by inverse functional property - a level of
>      > > > entailment (a rather low level even if more than simple entailment)
>      > > > that provides more conclusions from the data.
>      > > >
>      > > >       Andy
>      > > >
>      > >
>      > > --
>      > > -ericP
>      > >
>      > >
>
>     --
>     -ericP
>
>
Received on Friday, 21 October 2011 09:11:17 UTC