- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Tue, 14 Nov 2006 13:53:51 +0000
- To: Fred Zemke <fred.zemke@oracle.com>
- CC: public-rdf-dawg@w3.org
Fred Zemke wrote: > I believe my message > http://lists.w3.org/Archives/Public/public-rdf-dawg/2006OctDec/0135.html > is supportive of Andy's option 2 below. That's good to know. 2 or 3 seemed to match that message. > However, I think that Andy is misquoting Perez et.al "Semantics of SPARQL". > As I read that paper, it says that a BGP with a blank node identifier > can have a solution with cardinality greater than 1. That is, Perez et.al. > treat blank nodes as variables, as in Andy's option 1. Your right - I jumped to making theta / b() non-counting i.e. a pure existential reading with card(u) = 1, not card(u) = card(b) as in your message. > As Andy suggests, I think that we want to provide a growth path > so that SPARQL can be used with general entailment regimes for BGP matching. > My understanding of entailment is that the cardinality of a solution > to a BGP must be 1. However, I want the entailment experts on the > committee to weigh in on this issue. I'd like the entailment experts opinions as well. As far as I can see, if + we don't have a strong upgrade path, using only the necessary but not sufficient condition on other entailment regimes, + we use option 2 (explicit bnode mapping) then we can choose as to whether blank nodes in queries can reveal a count of not. It is as if they also reveal the reasons for their existential assertion given the data available. Implementation-wise it might be easier to have them treated that way and it accords with the (slack?) usage in queries as unnamed variables. In other words, if we choose 2 as the basis of BGP matching for simple entailment, we could choose any of: 1/ bnodes in queries mean card(u) = 1 2/ bnodes in queries mean card(u) = card(b) 3/ Undefined: but presumably between 1 and card(b). Andy > > Fred > > Seaborne, Andy wrote: > >> We need to fix the definition of BGP matching because the current one >> has a bug where the graph is redundant. >> >> http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0085.html >> >> This is a summary of approaches, gathering together material from all >> the discussions I know about into (roughly) similar terminology. >> >> >> == Structure >> >> Overall, treat BGP matching as a parameter of SPARQL. rq24 need only >> define it for simple entailment. Any other entailment regime can >> provide it's own form of BGP matching appropriate to it's entailment >> characteristics. The only requirement is that the solutions >> (mappings) generated bindings for every named variable in the BGP. >> >> We could impose on any entailment regime that G |= u(P) as a necessary >> but not sufficient condition (simple entailment). >> >> >> == Simple Entailment >> >> For simple entailment, the options are: >> >> 1/ Define: >> >> w(P) is a subset of G >> >> for a mapping w of pattern P on graph G to terms of G. P does not >> share any blank nodes with G. The mapping w applies to name variables >> and to blank nodes in the BGP; blank nodes behave very much as named >> variables except there is no binding for them; alternatively, they are >> projected away. Cardinality is one only if the blank nodes are >> projected and also a distinct applied to the pattern matching. >> Fortunately, if the whole query is DISTINCT, only the query DISTINCT >> is needed. Otherwise BGP matching has duplicates. >> >> >> 2/ Similar to 1 except have two mappings, one for blank nodes b(), one >> for named variables u() (see Jorge's "Semantics of SPARQL" sec 5.1). >> Now the cardinality of the mapping is one always. >> >> u(b(P)) is a subset of G >> >> >> 3/ Retain the use of an entailment relationship and skolemize the >> blank nodes in the graph. >> >> P does not share any blank nodes with G. Write f as a skolemization >> over the bnodes in term(G), mapping any occurrences of a bnode in G to >> a unique URIs. Define matching by >> >> f(G) |= f(u(P)) where |= is simple entailment. >> >> f is the same on both sides. u does not map bnodes in P. f is fixed >> for the BGP match (fixing to be the same for all the query for the >> same graph gives the same results because it's range does not leak out >> of the BGP match). >> >> A blank node in P behaves as an existential variable. This is like >> the current definition except that issues to do with bnodes in the >> scoping set are removed from the application of |= by the >> skolemization. In other words, blank nodes in the graph are "turned >> off" for the purposes of simple entailment in BGP matching. >> >> >> == Notes >> >> A/ 2 and 3 give the same results. b() maps blank nodes to terms of G >> which are then matched by identity. f() produces URIs which are >> matched by identity and neither side of |= involves bnodes. >> >> B/ The difference between 1 and 2/3 is in whether blank nodes cause >> duplicates in BGP matching. Having distinct(var(w(P))) makes 1=2=3 >> where var makes it clear that w(P) is reduced to the named variables >> (i.e. set of u(P)) >> >> C/ It's not clear to me at the moment that (3) provides much of an >> upwards compatibility path to other entailment regimes unless the >> skolemization process is acceptable to the other entailment regime, >> and I don't know if it is acceptable, universally or just widely. If >> it's not a path of any compatibility, then it's just a way of writing >> 2 using the word "entailment" - b() captures the simple entailment >> nature in 2. >> >> D/ It's not clear to me what the LC1 position is on blank nodes in >> BGPs. It does not consider cardinality so it was a non-issue. >> >> >> == Background >> >> http://planetmath.org/encyclopedia/Skolemization >> >> http://www.w3.org/TR/rdf-mt/#prf >> [[ >> Skolemization is a syntactic transformation routinely used in >> automatic inference systems in which existential variables are >> replaced by 'new' functions - function names not used elsewhere - >> applied to any enclosing universal variables. In RDF, Skolemization >> amounts to replacing every blank node in a graph by a 'new' name, i.e. >> a URI reference which is guaranteed to not occur anywhere else. In >> effect, it gives 'arbitrary' names to the anonymous entities whose >> existence was asserted by the use of blank nodes: the arbitrariness of >> the names ensures that nothing can be inferred that would not follow >> from the bare assertion of existence represented by the blank node. >> (Using a literal would not do. Literals are never 'new' in the >> required sense.) >> >> To be precise, a Skolemization of E (with respect to V) is a ground >> instance of E with respect to V with a 1:1 instance mapping that maps >> each blank node in G to a URI reference that does not appear in G (so >> the Skolem vocabulary V must be disjoint from the vocabulary of E) >> ]] >> >> Andy >> >> > >
Received on Tuesday, 14 November 2006 13:54:13 UTC