Re: Fixing BGP matching from Seaborne, Andy on 2006-11-14 (public-rdf-dawg@w3.org from October to December 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 14 Nov 2006 13:53:51 +0000
To: Fred Zemke <fred.zemke@oracle.com>
CC: public-rdf-dawg@w3.org
Message-ID: <4559CA6F.7010905@hp.com>
Fred Zemke wrote:
> I believe my message
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2006OctDec/0135.html
> is supportive of Andy's option 2 below.

That's good to know. 2 or 3 seemed to match that message.

> However, I think that Andy is misquoting Perez et.al "Semantics of SPARQL".
> As I read that paper, it says that a BGP with a blank node identifier
> can have a solution with cardinality greater than 1.  That is, Perez et.al.
> treat blank nodes as variables, as in Andy's option 1. 

Your right - I jumped to making theta / b() non-counting i.e. a pure 
existential reading with card(u) = 1, not card(u) = card(b) as in your message.

> As Andy suggests, I think that we want to provide a growth path
> so that SPARQL can be used with general entailment regimes for BGP matching.
> My understanding of entailment is that the cardinality of a solution
> to a BGP must be 1.  However, I want the entailment experts on the
> committee to weigh in on this issue.

I'd like the entailment experts opinions as well.

As far as I can see, if
+ we don't have a strong upgrade path, using only the necessary but not 
sufficient condition on other entailment regimes,
+ we use option 2 (explicit bnode mapping)

then we can choose as to whether blank nodes in queries can reveal a count of 
not.  It is as if they also reveal the reasons for their existential assertion 
given the data available.  Implementation-wise it might be easier to have them 
treated that way and it accords with the (slack?) usage in queries as unnamed 
variables.

In other words, if we choose 2 as the basis of BGP matching for simple 
entailment, we could choose any of:

1/ bnodes in queries mean card(u) = 1
2/ bnodes in queries mean card(u) = card(b)
3/ Undefined: but presumably between 1 and card(b).

 Andy

> 
> Fred
> 
> Seaborne, Andy wrote:
> 
>> We need to fix the definition of BGP matching because the current one 
>> has a bug where the graph is redundant.
>>
>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0085.html
>>
>> This is a summary of approaches, gathering together material from all 
>> the discussions I know about into (roughly) similar terminology.
>>
>>
>> == Structure
>>
>> Overall, treat BGP matching as a parameter of SPARQL.  rq24 need only 
>> define it for simple entailment.  Any other entailment regime can 
>> provide it's own form of BGP matching appropriate to it's entailment 
>> characteristics.  The only requirement is that the solutions 
>> (mappings) generated bindings for every named variable in the BGP.
>>
>> We could impose on any entailment regime that G |= u(P) as a necessary 
>> but not sufficient condition (simple entailment).
>>
>>
>> == Simple Entailment
>>
>> For simple entailment, the options are:
>>
>> 1/ Define:
>>
>>   w(P) is a subset of G
>>
>> for a mapping w of pattern P on graph G to terms of G.  P does not 
>> share any blank nodes with G.  The mapping w applies to name variables 
>> and to blank nodes in the BGP; blank nodes behave very much as named 
>> variables except there is no binding for them; alternatively, they are 
>> projected away.  Cardinality is one only if the blank nodes are 
>> projected and also a distinct applied to the pattern matching.  
>> Fortunately, if the whole query is DISTINCT, only the query DISTINCT 
>> is needed.  Otherwise BGP matching has duplicates.
>>
>>
>> 2/ Similar to 1 except have two mappings, one for blank nodes b(), one 
>> for named variables u() (see Jorge's "Semantics of SPARQL" sec 5.1).  
>> Now the cardinality of the mapping is one always.
>>
>>   u(b(P)) is a subset of G
>>
>>
>> 3/ Retain the use of an entailment relationship and skolemize the 
>> blank nodes in the graph.
>>
>> P does not share any blank nodes with G.  Write f as a skolemization 
>> over the bnodes in term(G), mapping any occurrences of a bnode in G to 
>> a unique URIs. Define matching by
>>
>>   f(G) |= f(u(P))  where |=  is simple entailment.
>>
>> f is the same on both sides.  u does not map bnodes in P.  f is fixed 
>> for the BGP match (fixing to be the same for all the query for the 
>> same graph gives the same results because it's range does not leak out 
>> of the BGP match).
>>
>> A blank node in P behaves as an existential variable.  This is like 
>> the current definition except that issues to do with bnodes in the 
>> scoping set are removed from the application of |= by the 
>> skolemization.  In other words, blank nodes in the graph are "turned 
>> off" for the purposes of simple entailment in BGP matching.
>>
>>
>> == Notes
>>
>> A/ 2 and 3 give the same results.  b() maps blank nodes to terms of G 
>> which are then matched by identity.  f() produces URIs which are 
>> matched by identity and neither side of |= involves bnodes.
>>
>> B/ The difference between 1 and 2/3 is in whether blank nodes cause 
>> duplicates in BGP matching.  Having distinct(var(w(P))) makes 1=2=3 
>> where var makes it clear that w(P) is reduced to the named variables 
>> (i.e. set of u(P))
>>
>> C/ It's not clear to me at the moment that (3) provides much of an 
>> upwards compatibility path to other entailment regimes unless the 
>> skolemization process is acceptable to the other entailment regime, 
>> and I don't know if it is acceptable, universally or just widely.  If 
>> it's not a path of any compatibility, then it's just a way of writing 
>> 2 using the word "entailment" - b() captures the simple entailment 
>> nature in 2.
>>
>> D/ It's not clear to me what the LC1 position is on blank nodes in 
>> BGPs.  It does not consider cardinality so it was a non-issue.
>>
>>
>> == Background
>>
>> http://planetmath.org/encyclopedia/Skolemization
>>
>> http://www.w3.org/TR/rdf-mt/#prf
>> [[
>> Skolemization is a syntactic transformation routinely used in 
>> automatic inference systems in which existential variables are 
>> replaced by 'new' functions - function names not used elsewhere - 
>> applied to any enclosing universal variables. In RDF, Skolemization 
>> amounts to replacing every blank node in a graph by a 'new' name, i.e. 
>> a URI reference which is guaranteed to not occur anywhere else. In 
>> effect, it gives 'arbitrary' names to the anonymous entities whose 
>> existence was asserted by the use of blank nodes: the arbitrariness of 
>> the names ensures that nothing can be inferred that would not follow 
>> from the bare assertion of existence represented by the blank node. 
>> (Using a literal would not do. Literals are never 'new' in the 
>> required sense.)
>>
>> To be precise, a Skolemization of E (with respect to V) is a ground 
>> instance of E with respect to V with a 1:1 instance mapping that maps 
>> each blank node in G to a URI reference that does not appear in G (so 
>> the Skolem vocabulary V must be disjoint from the vocabulary of E)
>> ]]
>>
>>     Andy
>>
>>
> 
>
Received on Tuesday, 14 November 2006 13:54:13 UTC