Re: Fixing BGP matching from Fred Zemke on 2006-11-13 (public-rdf-dawg@w3.org from October to December 2006)

From: Fred Zemke <fred.zemke@oracle.com>
Date: Mon, 13 Nov 2006 14:30:57 -0800
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <4558F221.1070607@oracle.com>
I believe my message
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006OctDec/0135.html
is supportive of Andy's option 2 below.
However, I think that Andy is misquoting Perez et.al "Semantics of SPARQL".
As I read that paper, it says that a BGP with a blank node identifier
can have a solution with cardinality greater than 1.  That is, Perez et.al.
treat blank nodes as variables, as in Andy's option 1. 

As Andy suggests, I think that we want to provide a growth path
so that SPARQL can be used with general entailment regimes for BGP matching.
My understanding of entailment is that the cardinality of a solution
to a BGP must be 1.  However, I want the entailment experts on the
committee to weigh in on this issue.

Fred

Seaborne, Andy wrote:

>
> We need to fix the definition of BGP matching because the current one 
> has a bug where the graph is redundant.
>
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0085.html
>
> This is a summary of approaches, gathering together material from all 
> the discussions I know about into (roughly) similar terminology.
>
>
> == Structure
>
> Overall, treat BGP matching as a parameter of SPARQL.  rq24 need only 
> define it for simple entailment.  Any other entailment regime can 
> provide it's own form of BGP matching appropriate to it's entailment 
> characteristics.  The only requirement is that the solutions 
> (mappings) generated bindings for every named variable in the BGP.
>
> We could impose on any entailment regime that G |= u(P) as a necessary 
> but not sufficient condition (simple entailment).
>
>
> == Simple Entailment
>
> For simple entailment, the options are:
>
> 1/ Define:
>
>   w(P) is a subset of G
>
> for a mapping w of pattern P on graph G to terms of G.  P does not 
> share any blank nodes with G.  The mapping w applies to name variables 
> and to blank nodes in the BGP; blank nodes behave very much as named 
> variables except there is no binding for them; alternatively, they are 
> projected away.  Cardinality is one only if the blank nodes are 
> projected and also a distinct applied to the pattern matching.  
> Fortunately, if the whole query is DISTINCT, only the query DISTINCT 
> is needed.  Otherwise BGP matching has duplicates.
>
>
> 2/ Similar to 1 except have two mappings, one for blank nodes b(), one 
> for named variables u() (see Jorge's "Semantics of SPARQL" sec 5.1).  
> Now the cardinality of the mapping is one always.
>
>   u(b(P)) is a subset of G
>
>
> 3/ Retain the use of an entailment relationship and skolemize the 
> blank nodes in the graph.
>
> P does not share any blank nodes with G.  Write f as a skolemization 
> over the bnodes in term(G), mapping any occurrences of a bnode in G to 
> a unique URIs. Define matching by
>
>   f(G) |= f(u(P))  where |=  is simple entailment.
>
> f is the same on both sides.  u does not map bnodes in P.  f is fixed 
> for the BGP match (fixing to be the same for all the query for the 
> same graph gives the same results because it's range does not leak out 
> of the BGP match).
>
> A blank node in P behaves as an existential variable.  This is like 
> the current definition except that issues to do with bnodes in the 
> scoping set are removed from the application of |= by the 
> skolemization.  In other words, blank nodes in the graph are "turned 
> off" for the purposes of simple entailment in BGP matching.
>
>
> == Notes
>
> A/ 2 and 3 give the same results.  b() maps blank nodes to terms of G 
> which are then matched by identity.  f() produces URIs which are 
> matched by identity and neither side of |= involves bnodes.
>
> B/ The difference between 1 and 2/3 is in whether blank nodes cause 
> duplicates in BGP matching.  Having distinct(var(w(P))) makes 1=2=3 
> where var makes it clear that w(P) is reduced to the named variables 
> (i.e. set of u(P))
>
> C/ It's not clear to me at the moment that (3) provides much of an 
> upwards compatibility path to other entailment regimes unless the 
> skolemization process is acceptable to the other entailment regime, 
> and I don't know if it is acceptable, universally or just widely.  If 
> it's not a path of any compatibility, then it's just a way of writing 
> 2 using the word "entailment" - b() captures the simple entailment 
> nature in 2.
>
> D/ It's not clear to me what the LC1 position is on blank nodes in 
> BGPs.  It does not consider cardinality so it was a non-issue.
>
>
> == Background
>
> http://planetmath.org/encyclopedia/Skolemization
>
> http://www.w3.org/TR/rdf-mt/#prf
> [[
> Skolemization is a syntactic transformation routinely used in 
> automatic inference systems in which existential variables are 
> replaced by 'new' functions - function names not used elsewhere - 
> applied to any enclosing universal variables. In RDF, Skolemization 
> amounts to replacing every blank node in a graph by a 'new' name, i.e. 
> a URI reference which is guaranteed to not occur anywhere else. In 
> effect, it gives 'arbitrary' names to the anonymous entities whose 
> existence was asserted by the use of blank nodes: the arbitrariness of 
> the names ensures that nothing can be inferred that would not follow 
> from the bare assertion of existence represented by the blank node. 
> (Using a literal would not do. Literals are never 'new' in the 
> required sense.)
>
> To be precise, a Skolemization of E (with respect to V) is a ground 
> instance of E with respect to V with a 1:1 instance mapping that maps 
> each blank node in G to a URI reference that does not appear in G (so 
> the Skolem vocabulary V must be disjoint from the vocabulary of E)
> ]]
>
>     Andy
>
>
Received on Monday, 13 November 2006 22:33:33 UTC