- From: Fred Zemke <fred.zemke@oracle.com>
- Date: Mon, 13 Nov 2006 14:30:57 -0800
- CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
I believe my message http://lists.w3.org/Archives/Public/public-rdf-dawg/2006OctDec/0135.html is supportive of Andy's option 2 below. However, I think that Andy is misquoting Perez et.al "Semantics of SPARQL". As I read that paper, it says that a BGP with a blank node identifier can have a solution with cardinality greater than 1. That is, Perez et.al. treat blank nodes as variables, as in Andy's option 1. As Andy suggests, I think that we want to provide a growth path so that SPARQL can be used with general entailment regimes for BGP matching. My understanding of entailment is that the cardinality of a solution to a BGP must be 1. However, I want the entailment experts on the committee to weigh in on this issue. Fred Seaborne, Andy wrote: > > We need to fix the definition of BGP matching because the current one > has a bug where the graph is redundant. > > http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0085.html > > This is a summary of approaches, gathering together material from all > the discussions I know about into (roughly) similar terminology. > > > == Structure > > Overall, treat BGP matching as a parameter of SPARQL. rq24 need only > define it for simple entailment. Any other entailment regime can > provide it's own form of BGP matching appropriate to it's entailment > characteristics. The only requirement is that the solutions > (mappings) generated bindings for every named variable in the BGP. > > We could impose on any entailment regime that G |= u(P) as a necessary > but not sufficient condition (simple entailment). > > > == Simple Entailment > > For simple entailment, the options are: > > 1/ Define: > > w(P) is a subset of G > > for a mapping w of pattern P on graph G to terms of G. P does not > share any blank nodes with G. The mapping w applies to name variables > and to blank nodes in the BGP; blank nodes behave very much as named > variables except there is no binding for them; alternatively, they are > projected away. Cardinality is one only if the blank nodes are > projected and also a distinct applied to the pattern matching. > Fortunately, if the whole query is DISTINCT, only the query DISTINCT > is needed. Otherwise BGP matching has duplicates. > > > 2/ Similar to 1 except have two mappings, one for blank nodes b(), one > for named variables u() (see Jorge's "Semantics of SPARQL" sec 5.1). > Now the cardinality of the mapping is one always. > > u(b(P)) is a subset of G > > > 3/ Retain the use of an entailment relationship and skolemize the > blank nodes in the graph. > > P does not share any blank nodes with G. Write f as a skolemization > over the bnodes in term(G), mapping any occurrences of a bnode in G to > a unique URIs. Define matching by > > f(G) |= f(u(P)) where |= is simple entailment. > > f is the same on both sides. u does not map bnodes in P. f is fixed > for the BGP match (fixing to be the same for all the query for the > same graph gives the same results because it's range does not leak out > of the BGP match). > > A blank node in P behaves as an existential variable. This is like > the current definition except that issues to do with bnodes in the > scoping set are removed from the application of |= by the > skolemization. In other words, blank nodes in the graph are "turned > off" for the purposes of simple entailment in BGP matching. > > > == Notes > > A/ 2 and 3 give the same results. b() maps blank nodes to terms of G > which are then matched by identity. f() produces URIs which are > matched by identity and neither side of |= involves bnodes. > > B/ The difference between 1 and 2/3 is in whether blank nodes cause > duplicates in BGP matching. Having distinct(var(w(P))) makes 1=2=3 > where var makes it clear that w(P) is reduced to the named variables > (i.e. set of u(P)) > > C/ It's not clear to me at the moment that (3) provides much of an > upwards compatibility path to other entailment regimes unless the > skolemization process is acceptable to the other entailment regime, > and I don't know if it is acceptable, universally or just widely. If > it's not a path of any compatibility, then it's just a way of writing > 2 using the word "entailment" - b() captures the simple entailment > nature in 2. > > D/ It's not clear to me what the LC1 position is on blank nodes in > BGPs. It does not consider cardinality so it was a non-issue. > > > == Background > > http://planetmath.org/encyclopedia/Skolemization > > http://www.w3.org/TR/rdf-mt/#prf > [[ > Skolemization is a syntactic transformation routinely used in > automatic inference systems in which existential variables are > replaced by 'new' functions - function names not used elsewhere - > applied to any enclosing universal variables. In RDF, Skolemization > amounts to replacing every blank node in a graph by a 'new' name, i.e. > a URI reference which is guaranteed to not occur anywhere else. In > effect, it gives 'arbitrary' names to the anonymous entities whose > existence was asserted by the use of blank nodes: the arbitrariness of > the names ensures that nothing can be inferred that would not follow > from the bare assertion of existence represented by the blank node. > (Using a literal would not do. Literals are never 'new' in the > required sense.) > > To be precise, a Skolemization of E (with respect to V) is a ground > instance of E with respect to V with a 1:1 instance mapping that maps > each blank node in G to a URI reference that does not appear in G (so > the Skolem vocabulary V must be disjoint from the vocabulary of E) > ]] > > Andy > >
Received on Monday, 13 November 2006 22:33:33 UTC