Re: bnodeification

On 11 Jan 2006, at 23:32, Pat Hayes wrote:

>> Hi Pat,
>>
>> Pat Hayes wrote:
>>
>>>  -----------
>>>
>>>  A binding on a pattern Q is a map from the variables and bnodes  
>>> in Q to
>>>  terms. The result B(Q) of applying the binding B to Q is the RDF  
>>> graph
>>>  obtained by replacing each variable or bnode in Q by its value  
>>> under B.
>>>
>>>  An answer binding with respect to a scoping graph G' is a  
>>> binding B such
>>>  that:
>>  > (1)  any bnodeIDs introduced by B occur in G'
>>  > (2)  G entails (G' union B(Q)).
>>
>> It works for RDF(S), but I'm uneasy with the idea of extending  
>> mappings
>> with bnode names. In this way all the the variables in the query  
>> became
>> distinguished variables and no more true existentials.
>
> Logically, there is no distinction between variables and  
> existentials in a query (i.e. on the RHS of a Gentzen sequent),  
> surely? Variables are just existentials that we are asking to see  
> bindings for.

The distinction is on the scope of distinguished variables w.r.t.  
existential variables. In  KR systems (or incomplete databases)  
distinguished variables range over the known constants (objects) in  
the language. The reason is that they're the only element for which  
you can actually prove that they have satisfy the requested  
properties across all the possible models.

With RDF(S) and SPARQL we're going into a middle way, by taking into  
account the bnodes. As you remember we were contrary to allow them in  
query answers, since in that way we're moving into uncharted  
territories. However, as it turned out we can handle them in a fairly  
satisfactory way by considering them a sort of skolem constants with  
some juggling. In the  traditional KR (including DLs) framework I'd  
say that they're sort of 'individual' without the unique name  
assumption (there are some subtleties, but we can live with them). In  
this way we can handle them in the OWL setting as well.

Going back with the entailment business, I was to quick in embracing  
the G' idea. The problem is that bnode renaming (that's what G' is  
all about) breaks the SPARQ algebra.

Pat definition is:

"""
An answer binding with respect to a scoping graph G' is a binding B  
such that:
(1)  any bnodeIDs introduced by B occur in G'
(2)  G entails (G' union B(Q)).
"""

which means that you don't guarantee that bnode names in B are  
exactly the ones in G, but only that there is an one to one mapping  
between those names and the one in G (renaming is allowed).

Unfortunately, this works only for basic graph patterns. When we  
combine the results with the SPARQL operators we obtain unexpected  
results. Consider the dataset:

G = {
_:a foaf:nick "Bijan"
_:a :email "badguy@moxie.com"
}

and the query

SELECT * { {?x :email ?e } UNION {?x foaf:nick ?n } }

to answer the query we should take the union of answers to P1={? 
x :email ?e} and to P2={?x :email ?e}. By allowing arbitrary renaming  
of bnodes we can obtain

B1 = [?x/_:b12, ?e/"badguy@moxie.com"]

because G' entails (G' union B1(P1)), where G' is

G' = {
_:b12 foaf:nick "Bijan"
_:b12 :email "badguy@moxie.com"
}

and

B2 = [?x/_:b36, ?n/"Bijan"]

because G'' entails (G'' union B1(P1)), where G'' is

G'' = {
_:b36 foaf:nick "Bijan"
_:b36 :email "badguy@moxie.com"
}

Therefore, the solution to the overall query is { B1 } union { B2 }:

{ [?x/_:b12, ?e/"badguy@moxie.com"], [?x/_:b36, ?n/"Bijan"] }

Bottom line is that we have to deal with the same name across  
"evaluation" of distinct BGP, and this means having a single graph G.

--sergio

>
>> It doesn't make any difference with RDF and RDF(S), but with OWL for
>> example you don't get a true conjunctive query language.
>>
>> E.g. given the KB [:a rdf:type (Some :r :c)], the query { ?x :r _:b }
>> has no answers.
>
> It seems to me that SELECT ?x {?x :r ?b} should have an answer just  
> when SELECT ?x {?x :r _:b} does, since we allow bnodes as binding  
> values to variables. So this query should have an answer, which  
> would simply utilize a binding of _:b to another bnodeID  
> (conceptually, in the 'OWL-RDF comprehension closure' of the KB).
>
> This is exactly why I am worried about imposing condition (1) on  
> OWL entailment, which I tried to explain in the telecon. Seems to  
> me that this strict condition (1) (which isn't, to repeat,  
> logically necessary, but is there only to keep redundancy to a  
> minimum) is not going to be reasonable past RDFS, since OWL/RDF has  
> built-in semantic conditions stated as comprehension principles  
> which effectively assert the existence of an infinite amount of OWL  
> stuff which is not mentioned explicitly in, but logically required  
> to exist by, an OWL/RDF graph. These include such things as the  
> lists used to encode OWL syntax, and also in this case, the :r  
> value from :a that must exist in order that the 'Some' be  
> satisfied; and then a bnode representing this known-to-exist thing  
> is a legitimate binding for the variable/bnode in the query.   
> Certainly any engine which is capable of inferring that _:b exists  
> would surely be capable of generating an appropriate bnodeID for  
> that thing and returning that as a binding, if required. But of  
> course, we do not require it for query bnodes since they cannot be  
> selected.
>
> Once we have semantic conditions in the language which guarantee  
> that things exist which are not mentioned by name in a KB, we will  
> need to relax condition (1) to allow these 'necessary' existents to  
> be referred to in answer bindings. The case seems to me to be  
> exactly similar to the case of rdf: and rdfs: IRIs in axiomatic  
> triples in RDF and RDFS, and even more similar to function symbols  
> in FOL, where answers bindings can be terms in the Herbrand  
> universe generated from skolem functions. If one forbids functions,  
> the same condition becomes a requirement like this on the  
> generation of existential 'names' in relational argument positions.  
> All this amounts to in practice is that we allow KBs to have a  
> 'stock' of spare bnodeIDs they can use as answer bindings to denote  
> things known to exist, and that are considered to occur in the  
> dataset.
>
> I agree this whole topic is potentially debateable, but it seems to  
> me that we can (must?) leave any debates about exactly how to  
> handle OWL queries to a later date. I have no confidence that ANY  
> of our proposed definitions will work EXACTLY as stated for OWL  
> queries without some changes, if only minor ones. OWL querying is  
> still an open issue. In the meantime, let us try to keep the  
> definitions as simple and clear and robust as we can. I am still  
> very nervous of relying on the 'ordered merge' trick to keep bnode  
> scopes aligned properly: it is too much like an algorithmic  
> specification to be robust enough to seem comfortable.
>
>> Moreover, the algebra definition gets more complicate since you  
>> need to
>> project over the true variables all the times.
>
> True, but this can be easily handled by a suitable definition of  
> 'variable binding' or some such which contains the projection, if  
> required.
>
> Pat
>
> -- 
> ---------------------------------------------------------------------
> IHMC		(850)434 8903 or (650)494 3973   home
> 40 South Alcaniz St.	(850)202 4416   office
> Pensacola			(850)202 4440   fax
> FL 32502			(850)291 0667    cell
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>

Sergio Tessaris, Assistant Professor        Faculty of Computer Science
e-mail: tessaris@inf.unibz.it          Free University of Bozen-Bolzano
http://www.inf.unibz.it/~tessaris                   Piazza Domenicani 3
tel: +39 0471 016 125 (fax 009)                    39100 Bolzano, Italy

Received on Thursday, 12 January 2006 14:17:50 UTC