- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Mon, 13 Nov 2006 18:25:48 +0000
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
We need to fix the definition of BGP matching because the current one has a bug where the graph is redundant. http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0085.html This is a summary of approaches, gathering together material from all the discussions I know about into (roughly) similar terminology. == Structure Overall, treat BGP matching as a parameter of SPARQL. rq24 need only define it for simple entailment. Any other entailment regime can provide it's own form of BGP matching appropriate to it's entailment characteristics. The only requirement is that the solutions (mappings) generated bindings for every named variable in the BGP. We could impose on any entailment regime that G |= u(P) as a necessary but not sufficient condition (simple entailment). == Simple Entailment For simple entailment, the options are: 1/ Define: w(P) is a subset of G for a mapping w of pattern P on graph G to terms of G. P does not share any blank nodes with G. The mapping w applies to name variables and to blank nodes in the BGP; blank nodes behave very much as named variables except there is no binding for them; alternatively, they are projected away. Cardinality is one only if the blank nodes are projected and also a distinct applied to the pattern matching. Fortunately, if the whole query is DISTINCT, only the query DISTINCT is needed. Otherwise BGP matching has duplicates. 2/ Similar to 1 except have two mappings, one for blank nodes b(), one for named variables u() (see Jorge's "Semantics of SPARQL" sec 5.1). Now the cardinality of the mapping is one always. u(b(P)) is a subset of G 3/ Retain the use of an entailment relationship and skolemize the blank nodes in the graph. P does not share any blank nodes with G. Write f as a skolemization over the bnodes in term(G), mapping any occurrences of a bnode in G to a unique URIs. Define matching by f(G) |= f(u(P)) where |= is simple entailment. f is the same on both sides. u does not map bnodes in P. f is fixed for the BGP match (fixing to be the same for all the query for the same graph gives the same results because it's range does not leak out of the BGP match). A blank node in P behaves as an existential variable. This is like the current definition except that issues to do with bnodes in the scoping set are removed from the application of |= by the skolemization. In other words, blank nodes in the graph are "turned off" for the purposes of simple entailment in BGP matching. == Notes A/ 2 and 3 give the same results. b() maps blank nodes to terms of G which are then matched by identity. f() produces URIs which are matched by identity and neither side of |= involves bnodes. B/ The difference between 1 and 2/3 is in whether blank nodes cause duplicates in BGP matching. Having distinct(var(w(P))) makes 1=2=3 where var makes it clear that w(P) is reduced to the named variables (i.e. set of u(P)) C/ It's not clear to me at the moment that (3) provides much of an upwards compatibility path to other entailment regimes unless the skolemization process is acceptable to the other entailment regime, and I don't know if it is acceptable, universally or just widely. If it's not a path of any compatibility, then it's just a way of writing 2 using the word "entailment" - b() captures the simple entailment nature in 2. D/ It's not clear to me what the LC1 position is on blank nodes in BGPs. It does not consider cardinality so it was a non-issue. == Background http://planetmath.org/encyclopedia/Skolemization http://www.w3.org/TR/rdf-mt/#prf [[ Skolemization is a syntactic transformation routinely used in automatic inference systems in which existential variables are replaced by 'new' functions - function names not used elsewhere - applied to any enclosing universal variables. In RDF, Skolemization amounts to replacing every blank node in a graph by a 'new' name, i.e. a URI reference which is guaranteed to not occur anywhere else. In effect, it gives 'arbitrary' names to the anonymous entities whose existence was asserted by the use of blank nodes: the arbitrariness of the names ensures that nothing can be inferred that would not follow from the bare assertion of existence represented by the blank node. (Using a literal would not do. Literals are never 'new' in the required sense.) To be precise, a Skolemization of E (with respect to V) is a ground instance of E with respect to V with a 1:1 instance mapping that maps each blank node in G to a URI reference that does not appear in G (so the Skolem vocabulary V must be disjoint from the vocabulary of E) ]] Andy
Received on Monday, 13 November 2006 18:26:14 UTC