- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Fri, 21 Oct 2011 10:10:37 +0100
- To: public-rdf-wg@w3.org
On 19/10/11 23:38, Alex Hall wrote:
> On Wed, Oct 19, 2011 at 5:29 PM, Eric Prud'hommeaux <eric@w3.org
> <mailto:eric@w3.org>> wrote:
>
>     <snip/>
>
>     > Sure, that would work in this particular case. But it requires the
>     > person writing the query to have some extra knowledge about which
>     > relations appear in which graph. If I have <g2> as an inference
>     > graph holding statements entailed from some base facts in <g1>,
>     > then for the purposes of querying I don't know or particularly
>     > care whether a given fact is entailed or was asserted as a base
>     > fact. It should all behave as a single logical graph.
>
>     ooo, here lies darkness and fear.
>
>     Another cost of bnodes could be that it could step on the
>     completeness of tableau algorithms. The DAWG (previous SPARQL WG)
>     spent a long time dealing with issues around SPARQL queries over OWL
>     models where the query pattern is satisfied in every model, but the
>     individuals aren't (known to be) the same in every model.
>
>     The issue manifested in SPARQL's case in that general expressivity of
>       { ?s <p1> ?o1 } { ?s <p2> ?o2 }
>     exceeded the expressivity of OWL DL tools. The solution in SPARQL's
>     case was to use bnodes in the graph pattern to serve as a class of
>     variables which one was not then allowed to examine (say with SELECT
>     or CONSTRUCT). Pellet was able to answer queries where bnodes were
>     used in graph patterns where the terms to which they would bind were
>     not consistent between models. Thus the pattern { _:s <p1> ?o1 } could
>     have more solutions than would { ?s <p1> ?o1 }.

Last I heard, non-distinguished variables haven't taken off in OWL
entailment querying. Maybe an expert could comment on the current state
of the art. http://www.w3.org/TR/sparql11-entailment/ does not mention
them.

> I understand what you're saying here -- the presence of blank nodes in
> SPARQL BGP's never made sense to me up until now, so thanks for the
> explanation :-)
>
> The use case that I'm talking about is specifically one where we're
> applying forward-chaining inference rules using Horn clauses, which can
> be implemented using SPARQL INSERT operations. Example:
>
> INSERT { GRAPH <inf-graph> { ?x rdf:type ?y } }
> WHERE { GRAPH <base-graph> { ?x ?p ?o . ?p rdfs:domain ?y } }
>
> There's no uncertainty or disjunction here -- if ?x matches a blank node
> in the base graph, then I *know* that the resource denoted by that blank
> node in the inference graph is exactly the same resource that is denoted
> by that blank node in the base graph. Many applications depend on being
> able to match query patterns across both graphs in this way.

+1

Looking in the base graph is quite common.

>     The other caveat was that these "variables" could not be used between
>     basic graph patterns, so { _:s <p1> ?o1 } { _:s <p2> ?o2 } could not
>     be used to ask a DL engine if there were solutions where the
>     individuals satisfying the first pattern intersected with the
>     individuals satisfying the second pattern.
>
> Now it's my turn to be surprised -- I always assumed that those bnodes
> in the query pattern were just rewritten as autogenerated variables
> scoped to the whole query. I learn something new each time I revisit
> the SPARQL spec...

That can be done if the entailment does not require non-distinguished
variables (no anon disjunction).

>     http://www.w3.org/mid/20060716171342.GA8900@w3.org shows an example of
>     two models satisfying a query with different individuals assigned to a
>     graph pattern. Look about half way down for "implied by your little
>     house example".
>
>     Not sure how this will map to shared bnodes, but this might give you
>     some ideas.
>
> I think the issues are orthogonal.
> The ability to share bnodes between graphs in the dataset doesn't mean
> that all applications have to use that ability.

I think this is the key point. It's a significant use case. Like many
features, it can be misused.

[Aside on the problems of bNode label maps, which is a very real problem
even for single graphs. Skolemize is a partial solution.]

	Andy

> Presumably, a tableau reasoner wanting to materialize the
> entailed graph from your example would mint new bnodes to do so, not use
> a variable matched to something already in the graph. As far as the
> presence of bnodes in a query pattern, I don't see how those would be
> impacted by bnodes shared between graphs any more than by URIs shared
> between graphs.
>
>     > > ? Currently, there is no official way to populate <g1>, <g2> as
>     > > described above, but if the RDF WG decided it were so, the SPARQL
>     > > query would work out of the box. Of course, the cost is fairly
>     > > high in that this makes all bnodes "told bnodes" via an exhaustive
>     > > search for bnodes common to multiple graphs.
>     >
>     > There's no way to serialize that dataset in a way that preserves the
>     > sameness of the bnode shared between those graphs using any of the
>     > existing standards. But there is an official way with SPARQL 1.1 to
>     > populate those graphs: load an RDF/XML or Turtle file into one
>     > graph, and use an INSERT operation to copy some bnodes from that
>     > graph into another graph.
>
>     Wow, I assumed there was a rule against that (which implementations
>     wouldn't have any incentive to enforce).
>
> Yeah, the SPARQL Update spec as appears in last call says that it's the
> same bnode that gets inserted. It would have to be when writing back to
> the same graph; the spec is quiet on whether it's the same bnode when
> writing to a different graph, but in the absence of any rule against it,
> I'd say an implementation is free to re-use the same internal node ID.
>
> -Alex
>
>     > I'm sensitive to the implementation concerns of making bnode labels
>     > document-scoped in multi-graph syntaxes, but given that it's
>     > possible for graphs in a SPARQL dataset to share bnodes, it would be
>     > nice to have a serialization format for the dataset that preserves
>     > the sameness of those bnodes.
>     >
>     > -Alex
>     >
>     > > > >I imagine that there are historical reasons why merge is
>     > > > >specified here and not union, but it would be really nice if
>     > > > >stores had license to do a union in the case where they have
>     > > > >specific knowledge that a blank node identifier shared between
>     > > > >the graphs does in fact denote a common resource.
>     > > > >
>     > > > >-Alex
>     > > >
>     > > > Yes, it would be nice. All the stores I know much about will
>     > > > maintain the sameness in the same situation. SPARQL does define
>     > > > FROM-FROM as an RDF merge though, which keeps bNodes apart, but
>     > > > it's working at the level of simple entailment.
>     > > >
>     > > > Normally, the bNodes will have different internal identifiers
>     > > > just by being read in so something (some knowledge) made them
>     > > > the same. I don't know of a store that uses the same internal id
>     > > > in different graphs for different bNodes at the same time but
>     > > > it's quite possible there is one and it's not wrong (maybe keep
>     > > > each graph on disk in RDF/XML format).
>     > > >
>     > > > Once <g1> and <g2> are known to contain the same bNode (whatever
>     > > > that might mean) then I think we're in the territory of
>     > > > additional "specific knowledge", which is outside RDF simple
>     > > > entailment; RDF only talks about one graph anyway. It's like
>     > > > doing smushing on the data or equating by inverse functional
>     > > > property - a level of entailment (a rather low level even if
>     > > > more than simple entailment) that provides more conclusions from
>     > > > the data.
>     > > >
>     > > > Andy
>     > >
>     > > --
>     > > -ericP
>
>     --
>     -ericP
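
The merge-versus-union point in this thread can be illustrated with a small sketch. It is not taken from any store or from the SPARQL specs: the `BNode` class, the helper names, and the `ex:` terms are all hypothetical, graphs are modelled as plain Python sets of triples, and blank-node identity is modelled as Python object identity. The rule is the rdfs:domain INSERT quoted above; `merge` standardizes blank nodes apart, `union` does not.

```python
class BNode:
    """Hypothetical blank node; identity is the Python object, not a label."""

RDF_TYPE, RDFS_DOMAIN = "rdf:type", "rdfs:domain"

def infer_domain_types(base):
    """(?x ?p ?o), (?p rdfs:domain ?y)  =>  (?x rdf:type ?y)."""
    domains = [(s, o) for (s, p, o) in base if p == RDFS_DOMAIN]
    return {(x, RDF_TYPE, y)
            for (x, p, _) in base
            for (dp, y) in domains if p == dp}

def union(g1, g2):
    """Set union: a bnode present in both graphs stays a single node."""
    return g1 | g2

def _standardize_apart(graph):
    """Replace every bnode in one graph with a fresh bnode."""
    fresh = {}
    def rename(term):
        if isinstance(term, BNode):
            if term not in fresh:
                fresh[term] = BNode()
            return fresh[term]
        return term
    return {(rename(s), p, rename(o)) for (s, p, o) in graph}

def merge(g1, g2):
    """RDF merge: bnodes from the two graphs are kept apart."""
    return _standardize_apart(g1) | _standardize_apart(g2)

def subjects_with_both(graph, p1, p2):
    """Subjects ?s matching { ?s p1 ?o1 . ?s p2 ?o2 }."""
    has_p1 = {s for (s, p, _) in graph if p == p1}
    has_p2 = {s for (s, p, _) in graph if p == p2}
    return has_p1 & has_p2

b = BNode()
base_graph = {(b, "ex:name", '"Alice"'), ("ex:name", RDFS_DOMAIN, "ex:Person")}
inf_graph = infer_domain_types(base_graph)  # inferred triples reuse the same bnode

shared = subjects_with_both(union(base_graph, inf_graph), "ex:name", RDF_TYPE)
apart = subjects_with_both(merge(base_graph, inf_graph), "ex:name", RDF_TYPE)
print(len(shared), len(apart))
```

Under these assumptions the cross-graph pattern finds the blank node in the union but finds nothing in the merge, which is the behaviour difference being debated for FROM-FROM datasets.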
Received on Friday, 21 October 2011 09:11:17 UTC