- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Fri, 21 Oct 2011 10:10:37 +0100
- To: public-rdf-wg@w3.org
On 19/10/11 23:38, Alex Hall wrote:
> On Wed, Oct 19, 2011 at 5:29 PM, Eric Prud'hommeaux <eric@w3.org> wrote:
>
> <snip/>
> > Sure, that would work in this particular case. But it requires the person
> > writing the query to have some extra knowledge about which relations appear
> > in which graph. If I have <g2> as an inference graph holding statements
> > entailed from some base facts in <g1>, then for the purposes of querying I
> > don't know or particularly care whether a given fact is entailed or was
> > asserted as a base fact. It should all behave as a single logical graph.
>
> ooo, here lies darkness and fear.
>
> Another cost of bnodes could be that it could step on the
> completeness of tableau algorithms. The DAWG (previous SPARQL WG)
> spent a long time dealing with issues around SPARQL queries over OWL
> models where the query pattern is satisfied in every model, but the
> individuals aren't (known to be) the same in every model.
>
> The issue manifested in SPARQL's case in that general expressivity of
> { ?s <p1> ?o1 } { ?s <p2> ?o2 }
> exceeded the expressivity of OWL DL tools. The solution in SPARQL's
> case was to use bnodes in the graph pattern to serve as a class of
> variables which one was not then allowed to examine (say with SELECT
> or CONSTRUCT). Pellet was able to answer queries where bnodes were
> used in graph patterns where the terms to which they would bind were
> not consistent between models. Thus the pattern { _:s <p1> ?o1 } could
> have more solutions than would { ?s <p1> ?o1 }.
Last I heard, non-distinguished variables haven't taken off in OWL
entailment querying. Maybe an expert could comment on the current state
of the art.
http://www.w3.org/TR/sparql11-entailment/ does not mention them.
> I understand what you're saying here -- the presence of blank nodes in
> SPARQL BGP's never made sense to me up until now, so thanks for the
> explanation :-)
>
> The use case that I'm talking about is specifically one where we're
> applying forward-chaining inference rules using Horn clauses, which can
> be implemented using SPARQL INSERT operations. Example:
>
> INSERT { GRAPH <inf-graph> { ?x rdf:type ?y } }
> WHERE { GRAPH <base-graph> { ?x ?p ?o . ?p rdfs:domain ?y } }
>
> There's no uncertainty or disjunction here -- if ?x matches a blank node
> in the base graph, then I *know* that the resource denoted by that blank
> node in the inference graph is exactly the same resource that is denoted
> by that blank node in the base graph. Many applications depend on being
> able to match query patterns across both graphs in this way.
+1
Looking in the base graph is quite common.
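
To make the quoted rdfs:domain rule concrete, here is a minimal sketch in plain Python rather than a real SPARQL engine. The BNode class, the apply_domain_rule helper and the ex: terms are all made up for illustration; the only point is that the blank node written to the inference graph is the same node object that was matched in the base graph, so a pattern spanning both graphs still joins on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """A blank node, identified only by a store-internal id (hypothetical)."""
    node_id: int


RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def apply_domain_rule(base):
    """Mimic: INSERT { ?x rdf:type ?y } WHERE { ?x ?p ?o . ?p rdfs:domain ?y }."""
    domains = {(s, o) for (s, p, o) in base if p == RDFS_DOMAIN}
    return {
        (x, RDF_TYPE, y)
        for (x, p, _o) in base
        for (prop, y) in domains
        if p == prop
    }


b = BNode(1)
base_graph = {
    (b, "ex:worksFor", "ex:acme"),
    ("ex:worksFor", RDFS_DOMAIN, "ex:Employee"),
}
inf_graph = apply_domain_rule(base_graph)

# Treat base + inference graph as one logical graph (a union, not a merge):
# the shared blank node joins the asserted and the inferred triple.
union = base_graph | inf_graph
matches = [
    (s, o)
    for (s, p, o) in union
    if p == "ex:worksFor" and (s, RDF_TYPE, "ex:Employee") in union
]
```

Under an RDF merge the two occurrences of the blank node would be kept apart and the join in `matches` would come back empty, which is the behaviour being objected to earlier in the thread.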
>
>
> The other caveat was that these "variables" could not be used between
> basic graph patterns, so { _:s <p1> ?o1 } { _:s <p2> ?o2 } could not
> be used to ask a DL engine if there were solutions where the
> individuals satisfying the first pattern intersected with the
> individuals satisfying the second pattern.
>
>
> Now it's my turn to be surprised -- I always assumed that those bnodes
> in the query pattern were just rewritten as autogenerated variables
> scoped to the whole query. I learn something new each time I revisit
> the SPARQL spec...
That can be done if the entailment does not require non-distinguished
variables (no anon disjunction).
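
A rough sketch of that rewriting, assuming simple entailment only. The function name and the ?_anonN naming scheme are invented for the example and are not taken from the SPARQL spec:

```python
def bnodes_to_variables(pattern):
    """Replace each blank node label (_:x) in a list of triple patterns
    with a fresh variable. The same label always maps to the same variable."""
    mapping = {}

    def rewrite(term):
        if term.startswith("_:"):
            # First sighting of a label allocates the next ?_anonN name.
            return mapping.setdefault(term, f"?_anon{len(mapping)}")
        return term

    return [tuple(rewrite(t) for t in triple) for triple in pattern]


pattern = [("_:s", "<p1>", "?o1"), ("_:s", "<p2>", "?o2")]
rewritten = bnodes_to_variables(pattern)
```

This is exactly the step that stops being valid once the entailment regime needs non-distinguished variables, because an ordinary variable must bind to a named term.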
> http://www.w3.org/mid/20060716171342.GA8900@w3.org shows an example of
> two models satisfying a query with different individuals assigned to a
> graph pattern. Look about half way down for "implied by your little
> house example".
>
> Not sure how this will map to shared bnodes, but this might give you
> some ideas.
>
>
> I think the issues are orthogonal. The ability to share bnodes between
> graphs in the dataset doesn't mean that all applications have to use
> that ability.
I think this is the key point. It's a significant use case. Like many
features, it can be misused.
[Aside on the problems of bNode label maps, which is a very real problem
even for single graphs. Skolemize is a partial solution.]
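
For the Skolemize remark, a minimal sketch of the idea. It assumes the store already knows that the label _:b0 names the same node in both graphs, and the base IRI and the genid path are assumptions chosen for the example:

```python
def skolemize(dataset, base="http://example.org/.well-known/genid/"):
    """Replace blank node labels with IRIs, using ONE label map for the
    whole dataset so a node shared between graphs stays shared."""
    label_map = {}

    def sk(term):
        if term.startswith("_:"):
            return label_map.setdefault(term, f"<{base}{term[2:]}>")
        return term

    return {
        name: {tuple(sk(t) for t in triple) for triple in graph}
        for name, graph in dataset.items()
    }


dataset = {
    "<g1>": {("_:b0", "<p1>", '"a"')},
    "<g2>": {("_:b0", "<p2>", '"b"')},
}
skolemized = skolemize(dataset)
g1_subject = next(iter(skolemized["<g1>"]))[0]
g2_subject = next(iter(skolemized["<g2>"]))[0]
```

It is only a partial solution because the nodes are no longer blank once written out: a consumer sees ordinary IRIs unless it knows the convention and converts them back.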
Andy
> Presumably, a tableau reasoner wanting to materialize the
> entailed graph from your example would mint new bnodes to do so, not use
> a variable matched to something already in the graph. As far as the
> presence of bnodes in a query pattern, I don't see how those would be
> impacted by bnodes shared between graphs any more than by URIs shared
> between graphs.
>
>
>
> > > ? Currently, there is no official way to populate <g1>, <g2> as
> > > described above, but if the RDF WG decided it were so, the SPARQL
> > > query would work out of the box. Of course, the cost is fairly high in
> > > that this makes all bnodes "told bnodes" via an exhaustive search for
> > > bnodes common to multiple graphs.
> > >
> >
> > There's no way to serialize that dataset in a way that preserves the
> > sameness of the bnode shared between those graphs using any of the existing
> > standards. But there is an official way with SPARQL 1.1 to populate those
> > graphs: load an RDF/XML or Turtle file into one graph, and use an INSERT
> > operation to copy some bnodes from that graph into another graph.
>
> Wow, I assumed there was a rule against that (which implementations
> wouldn't have any incentive to enforce).
>
>
> Yeah, the SPARQL Update spec as appears in last call says that it's the
> same bnode that gets inserted. It would have to be when writing back to
> the same graph; the spec is quiet on whether it's the same bnode when
> writing to a different graph, but in the absence of any rule against it,
> I'd say an implementation is free to re-use the same internal node ID.
>
> -Alex
>
>
> > I'm sensitive to the implementation concerns of making bnode labels
> > document-scoped in multi-graph syntaxes, but given that it's possible for
> > graphs in a SPARQL dataset to share bnodes, it would be nice to have a
> > serialization format for the dataset that preserves the sameness of those
> > bnodes.
> >
> > -Alex
> >
> >
> >
> > >
> > > > >I imagine that there are historical reasons why merge is specified here
> > > > >and not union, but it would be really nice if stores had license to do a
> > > > >union in the case where they have specific knowledge that a blank node
> > > > >identifier shared between the graphs does in fact denote a common
> > > > >resource.
> > > > >
> > > > >-Alex
> > > >
> > > > Yes, it would be nice. All the stores I know much about will
> > > > maintain the sameness in the same situation. SPARQL does define
> > > > FROM-FROM as an RDF merge though, which keeps bNodes apart, but it's
> > > > working at the level of simple entailment.
> > > >
> > > > Normally, the bNodes will have different internal identifiers just
> > > > by being read in so something (some knowledge) made them the same.
> > > > I don't know of a store that uses the same internal id in different
> > > > graphs for different bNodes at the same time but it's quite possible
> > > > there is one and it's not wrong (maybe keep each graph on disk in
> > > > RDF/XML format).
> > > >
> > > > Once <g1> and <g2> are known to contain the same bNode (whatever
> > > > that might mean) then I think we're in the territory of additional
> > > > "specific knowledge", which is outside RDF simple entailment; RDF
> > > > only talks about one graph anyway. It's like doing smushing on the
> > > > data or equating by inverse functional property - a level of
> > > > entailment (a rather low level even if more than simple entailment)
> > > > that provides more conclusions from the data.
> > > >
> > > > Andy
> > > >
> > >
> > > --
> > > -ericP
> > >
> > >
>
> --
> -ericP
>
>
Received on Friday, 21 October 2011 09:11:17 UTC