finite approximation of the minimal Herbrand model for a RIF Core/BLD ruleset. (was: [TF-ENT] Agenda 24th Feb teleconf) from Axel Polleres on 2010-02-24 (public-rdf-dawg@w3.org from January to March 2010)

From: Axel Polleres <axel.polleres@deri.org>
Date: Wed, 24 Feb 2010 10:25:55 +0000
To: Sandro Hawke <sandro@w3.org>, Birte Glimm <birte.glimm@comlab.ox.ac.uk>, SPARQL Working Group <public-rdf-dawg@w3.org>
Cc: Axel Polleres <axel.polleres@deri.org>
Message-Id: <F0FC6DEE-35AB-4323-A47D-183B081121EF@deri.org>

Dear all,

in order to keep answers finite in SPARQL, we need to define a canonical, finite approximation of all ground consequences of a RIF ruleset.

In order to explain my plan I make some simplifications, as follows: 

* RIF Core = Datalog with built-in functions, with the typical safety condition for Datalog, i.e. variables in the rule head must be bound in the rule body. Note however, that due to the presences of built-in functions this safety condition does not guarantee finiteness of the Herbrand model.
with the following restricted subsets:
 - strongly safe RIF Core: built-ins may only be used in an "acyclic fashion" that is to avoid an infinite minimal Herbrand model

* RIF BLD = Horn rules (with buil-in functions): in addition to RIF Core, RIF BLD does not restrict safety and allows arbitrary use of (logical) function symbols.

* RIF-RDF combination: A RIF RDF combination is, again roughly spoken, the combination of a RIF ruleset with an RDF graph, where for this combination entailment of RIF formulae as well as for RDF graphs is defined. We are probably only interested in the latter... the entailed RDF Graphs for a RIF-RDF combination of ruleset R and grapg G are essentially all graphs G' for which there's a homomorphism from the "RDF-image" of the minimal Herbrand model of (R+tr(G))to G'.
By RDF-image, I mean here mapping all entailed slotted terms S[P->O] back to RDF triples and by tr(G), I mean translating an RDF graph G to a set of RIF facts s.t. any triple  S P O. is represented as S[P->O] (where blank nodes are translated to local symbols in RIF). 


As well known, the semantics of Datalog can be defined via the immediate consequence operator, that is:

Let R be a ruleset with rules of the form 
 Ante => Con
in Datalog (Ante would be a conjunction of atoms, but in RIF it is a general RIF condition that can also have discjunction) and Con is an atomic formula ... safety condition applies for RIF Core as sketched above.

Then, given a set of facts (ground atoms) X and a rule r the immediate consequence operator 
T_r(X) is defined as:

T_r(X) = { mu(Con) |there exists a mapping \mu from the variables in r to ground terms such that mu(Ante) \wedge X is satisfied }

Accordingly, T_R(X) = union_{r in R} T_r(X)

The closure of an RDF graph G w.r.t R can then just be defined as the minimal fixpoint of T_R starting with tr(G),that is let X_0=tr(G) and X_{i+i} = X_i union T_R(X_i).

For strongly safe R there is a minimal n such that T_{n+1} = T_n so this minimal fixpoint can be finitely computed.
However, this is not the case for general RIF Corte or for RIF BLD.

Now here, the idea would be to restrict the monotonic T_R operator in a way, such that not infinite new "values" are computed... I could imagine some kind of restriction by "levels" of computation, or "depth" of function symbols (be they built-in or logical functions).

Let the Herbrand universe HU be just the ground terms occurring in tr{G} union R. then the Herbrand base of level 1 HU is the set of atomic formulae constructible from predicate symbols appearing in tr{G} union R and terms in HU.

We could then define a limited immediate consequence operator that allows only the application of functions/function symbols at a certain depth. For depth 1 that would be: 

T^1_r(X) = { mu(Con) |there exists a mapping \mu from the variables in r to ground terms in HU_1 such that mu(Ante) \wedge X is satisfied }

Obviously, there is always a least fixpoint of T^1 even for BLD and unrestricted RIF Core, and I thnk it is uniquely determined, but we allow only very limited, that is non-recursive computations of new values or introduction of function symbols.

HU_2, HU_3 etc. could be defined similarly, I believe. So one could gradually extend this notion to any constant c > 0.

I haven't thought through all the details, and am almost sure that the likes of Jos and Birte have comments on this, but that'd be my suggested starting point.

Axel

Received on Wednesday, 24 February 2010 10:26:31 UTC