RIF Extensible Design                                   Draft 2006-04-23
~~~~~~~~~~~~~~~~~~~~~

Harold Boley (NRC), Michael Kifer (Stony Brook University),
Jeff Pan (University of Aberdeen), Gerd Wagner (BTU Cottbus),
Alex Kozlenkov (Betfair), Jos de Bruijn (DERI),
Mike Dean (BBN Technologies), Giorgos Stamou (NTUA)



Summary

This proposal starts with a language of conditions that can be
used for positive rule bodies and then extends it with negative
literals, builtins, and constructs to query OWL and RDF (via SPARQL).
The rationale for focusing on this sublanguage is that it can be
shared among five different sublanguages of interest to RIF.
We then discuss various extensions of this sublanguage and
propose an interoperation/interchange evaluation.



A. RIF Condition Language

This proposal develops a set of fundamental concepts shared by the rule
languages of interest to the RIF WG as outlined in the Design Roadmap
(http://lists.w3.org/Archives/Public/public-rif-wg/2006Feb/0255.html).
Here we focus on the one part that is shared by Logic Programming rules,
Production (Condition-Action) rules, Reactive (Event-Condition-Action)
rules, Normative rules (Integrity Constraints), and queries. We call
this part 'Conditions' and the proposed sublanguage 'RIF Condition
Language' (as a working title).

The RIF Condition Language is common ground for specifying syntactic
fragments in several different dialects of RIF:

- Rule bodies in a declarative logic programming dialect (LP)
- Rule bodies in a first-order dialect (FO)
- Conditions in the bodies in production rules dialects (PR)
- The non-action part of the rule bodies in reactive rules dialects (RR)
- Integrity constraints (IC)
- Queries in LP dialects (QY)

Note that for rules this sublanguage is intended to be used only in the
bodies, not their heads. The various RIF dialects diverge in the way
they specify rule heads and other parts of their rules.  We believe that
by focusing on the condition part of the rule bodies we can achieve
maximum syntactic (and some semantic) reuse among RIF dialects.

Since different semantics are possible for syntactically identical rule
sets, we propose that the semantics of a RIF rule set be specified by a
predefined attribute. We will elaborate on this proposal separately, but
the idea is to organize the most important known semantic and syntactic
features into a taxonomy (which could be extended to accommodate new
semantics/syntax), and the values of the aforementioned attribute would
come from that taxonomy.


A.1 Basis: Positive Conditions

The basis of the language is formed by what can appear in the bodies of
Horn-like rules with equality -- conjunctions and disjunctions of atomic
formulas and equations (such rules reduce to pure Horn). We later extend
our proposal to include builtins.  As mentioned in the introduction, the
motivation is that this sublanguage can be shared among the bodies of
the rules expressed in the following RIF dialects:

- FO (first order)
- LP (logic programming)
- PR (production rules)
- RR (reactive rules)

This sublanguage can also be used to uniformly express:

- IC (integrity constraints)
- QY (queries)


SYNTAX
------

Essential BNF for human-readable syntax:

  Data     ::= value
  Ind      ::= object
  Var      ::= '?' name
  TERM     ::= Data | Ind | Var | Expr
  Expr     ::= Fun '(' TERM* ')'
  Atom     ::= Rel '(' TERM* ')' | TERM '=' TERM
  LITFORM  ::= Atom
  QUANTIF  ::= 'Exists' Var+ '(' CONDIT ')'
  CONJ     ::= 'And' '(' CONDIT* ')'
  DISJ     ::= 'Or' '(' CONDIT* ')'
  CONDIT   ::= LITFORM | QUANTIF | CONJ | DISJ

Here LITFORM stands for Literal Formula and anticipates the introduction
of negated atoms later on. QUANTIF stands for Quantified Formula, which
for Horn-like conditions can only be 'Exists' Formulas (Var+ variables
should occur free in the scoped CONDIT, so 'Exists' can quantify them;
free variables are discussed below). More explicitly than in logic
programming, CONJ expresses formula conjunctions, and DISJ expresses
disjunctions. Finally, CONDIT combines everything and defines RIF
conditions, which can later be extended beyond LITFORM, QUANTIF, CONJ,
or DISJ.

We assume that all constants (Data and Ind) belong to two different
logical sorts: the sort of values and the sort of objects. This means
that every constant carries with it the designation of the sort to which
it belongs (indicating whether the constant is a value or an object).
Values (Data) can also optionally carry a designation of their datatype
(e.g., TIME, Integer).

Note that there are two uses of variables in the RIF condition language:
free and quantified. All quantified variables are quantified explicitly,
existentially (and also universally, later). We adopt the usual scoping
rules for quantification from first-order logic. Variables that are not
explicitly quantified are free.

The free variables are needed because we are dealing with conditions
that occur in rule bodies only. When a condition occurs in such a rule
body, the free variables in the condition are precisely those that
also occur in the rule head. Such variables are quantified universally
outside of the rule, and the scope of such quantification is the entire
rule. For instance, the variable ?X in the rule below is free in the
condition that occurs in the rule body, but it is universally quantified
outside of the rule.

Condition with a free variable ?X:
                             ... Exists ?Y (condition(..?X..?Y..)) ...

Rule using the condition in its body:
Forall ?X (head(...?X...) :- ... Exists ?Y (condition(..?X..?Y..)) ...)

When conditions are used as queries, their free variables are to be
bound to carry the answer bindings back to the caller.

The semantics of conditions that contain free variables is defined in
the section SEMANTICS.


Example 1 (A RIF condition in human-readable syntax):

  In this condition, ?Buyer is quantified existentially, while ?Seller
  and ?Author are free:

  And ( Exists ?Buyer (purchase(?Buyer ?Seller book(?Author LeRif) $49))
        ?Seller=?Author )

  This syntax is similar in style, and compatible to, the OWL Abstract
  Syntax (http://www.w3.org/TR/owl-semantics/syntax.html). 


An XML syntax can be obtained from the above BNF as follows.  The
non-terminals in all-upercase such as CONDIT become XML entities, which
act like macros and will not be visible in instance markups.  The other
non-terminals and symbols ('=', 'Exists', etc.) become XML elements,
which are adapted from RuleML as shown below. Optional attributes are
added for finer distinctions. For example, an optional attribute in the
Data element can point to an XML Schema Datatype.

- Data (value constant, including optional attribute for a datatype IRI)
- Ind (object constant; empty element with attribute, iri, for oid; e.g.
       global: <Ind iri="http://www.w3.org"/>, local: <Ind iri="#foo"/>)
- Var (logic variable; optional type attribute, e.g. RDFS/OWL-class IRI) 
- Fun (n-ary function symbol; optional attribute designating interpreted
       [a.k.a. equation-defined] functions in contrast to uninterpreted
       [a.k.a. free] functions)
- Rel    (n-ary relation symbol [a.k.a. predicate])
- Expr   (expression formula)
- Atom   (atomic formula; optional attribute for external atoms)
- Equal  (prefix version of term equation '=')
- Exists (quantified formula for 'Exists')
- And    (conjunction; optional attribute for sequential conjunctions)
- Or     (disjunction; optional attribute for sequential disjunctions)

Based on the FOL RuleML (http://www.w3.org/Submission/FOL-RuleML)
experience, this could be directly rewritten as a DTD or an XML Schema.


Example 2 (A RIF condition in XML syntax):

  The condition formula in Example 1 can be serialized in XML as shown:

  <And>
    <Exists>
      <Var>Buyer</Var>
      <Atom>
        <Rel>purchase</Rel>
        <Var>Buyer</Var>
        <Var>Seller</Var>
        <Expr>
          <Fun>book</Fun>
          <Var>Author</Var>
          <Ind>LeRif</Ind>
        </Expr>
        <Data>$49</Data>
      </Atom>
    </Exists>
    <Equal>
      <Var>Seller</Var>
      <Var>Author</Var>
    </Equal>
  </And>


Note that the distinction between the sort of values and the sort of
objects can be extended from Data vs. Ind to variables (Var) by using
an optional attribute.  This corresponds to the so-called d-variables
and i-variables in SWRL (http://www.w3.org/Submission/SWRL).  Sorts can
also be optionally added to function symbols and predicates in order to
support sorted languages.


SEMANTICS
---------

The semantics of a ruleset R is defined by a collection of intended
models.  In case of FOL, all models of R are intended. In LP, only some
models (e.g., well-founded or stable models of R) are intended.

However, here we are talking only about condition parts of the rules.
So, by semantics we mean the notion of satisfaction of a formula in the
interpretations of the various RIF dialects.  For example, in FO, all
first-order interpretations are appropriate. In LP, infinite Herbrand
models are typically used. In LP with the well-founded semantics,
3-valued Herbrand models are used. Stable model semantics uses only
2-valued interpretations.

The notion of satisfaction in a model is the same for all dialects of
RIF, and in the 2-valued case does not depend on the semantic tags
associated with the rulesets in which our conditions occur.
(Well-founded interpretations, being 3-valued, are slightly different.)

Given a condition formula phi(X1,...,Xn) with free variables X1, ..., Xn
and an interpretation M, define M(phi(X1,...,Xn)) as the set of all
bindings (a1/X1,...,an/Xn) such that M |= phi(a1,...,an).

For instance, denoting the formula of Example 1 as phi(?Seller,?Author),
if we use the following Herbrand interpretation:

    M = {purchase(Mary John book(Fred LeRif) $49),
	 purchase(Nina Fred book(Fred LeRif) $49)
	 purchase(Alice Floob book(Floob LeRif) $49)}
	 
then M(phi(?Seller,?Author)) is:

     {(Fred/?Seller,Fred/?Author),
      (Floob/?Seller,Floob/?Author)}

The notion of M(phi(X1,...,Xn)) is independent of the semantics of the
ruleset to which phi(X1,...,Xn) is posed, and it "plugs into" the
model-theoretic semantics of the rulesets under FO, LP, and the
operational semantics of PR.

Note that integrity constraints (which are closed formulas) are simply
checked against the intended models of the ruleset.  Therefore, the
semantics of integrity constraints is fully defined by the above notion
of satisfaction in the models of rulesets. Thus, these constraints can
be full first-order formulas even in case of LP rulesets. The same can
be said about queries.

While semantics can be defined for very general classes of constraints
and queries, this does not mean that the problem of constraint
satisfaction/query answering is decidable or has efficient evaluation
techniques for such general classes.


A.2 Extension: Negative Conditions

This extension introduces two kinds of negation (Naf and Neg) to the RIF
Condition Language, initially proposed for Phase 1 queries.  Neg denotes
classical negation as defined in first-order logic; special uses of it
can be shared by FO and all dialects that support a form of classical
negation (e.g., certain dialects of LP; perhaps some PR dialects).

Naf denotes negation as failure in one of its incarnations (well-founded
or stable-model). The actual flavor of Naf is determined by inspecting
the value of a semantic tag associated with the ruleset.

Naf is used in LP (and in queries and constraints over the intended
models of LP); it can possibly be relevant to PR and RR.

Alongside negation we can introduce explicit universal quantification.


SYNTAX
------

This extension introduces two negation symbols over atoms, and one legal
nesting, for which we propose to use the notation Naf/Neg, as in RuleML:

  LITFORM ::= Atom | 'Neg' Atom | 'Naf' Atom | 'Naf' 'Neg' Atom

Note that in Phase 1 we are proposing to use negation only in queries.
Phase 1 rules will all be Horn and their bodies will not have negation.
Phase 2 will extend RIF to allow negation in rules as well.

The next step could be to allow universal quantification:

  QUANTIF ::= 'Exists' Var+ '(' CONDIT ')' |
              'Forall' Var+ '(' CONDIT ')'


SEMANTICS
---------

The semantics of atomic Neg in the models of the FO dialect is standard.
For LP dialects supporting "classical" negation, Neg-negated predicates
are interpreted by fresh predicates that are appropriately axiomatized.
For instance, Neg of p is associated with a fresh predicate, Neg_p, and
Neg p(...) is said to be true in a model iff Neg_p(...) is. It is also
required that for any given tuple of ground arguments, the formulas
p(...) and Neg_p(...) cannot be true in the same model.

The following (Phase 2) dialects of RIF will support Neg and/or Naf:

- LP/WF (the well-founded semantics) generally supports only Naf, but
  some variants (e.g., Courteous LP) also support Neg.
- LP/SMS (stable model semantics) supports both Neg and Naf
  (e.g., ERDF).
- FO supports Neg only.

Queries and integrity constraints can use those forms of negation that
are supported by the rulesets that they constrain or query.

To distinguish the different possible semantics for the same ruleset,
RIF will define an attribute that will explicitly specify the semantics
under which the corresponding ruleset is to be interpreted.


A.3 Extension: Builtins in Conditions

A builtin is a relation or function for which there is a single, fixed
interpretation. For instance, the relation "<" and the function "+" have
fixed and unique interpretations over the domain of numbers.  Although
builtin functions are not strictly needed, as they can be represented by
builtin relations, a practical language needs both datatype relations
and datatype functions. (The equality predicate can be used to bind the
result of a builtin function call to a logical variable.)

Satisfaction of an atomic expression that is based on a builtin is done
with respect to the concrete domain with which this builtin is
associated. Some builtins may be polymorphic and have several associated
domains. For instance, in the domain of strings, "<" can be viewed as
lexicographics ordering and "+" as string concatenation.
In that case, the proper interpretation of a builtin is determined by
the arguments of the builtin predicate or function.

RIF can support different sets of builtins. For instance, each builtin
can be identified by an IRI, in which case support for SWRL, SPARQL, and
XQuery builtins is feasible.

Note that, in general, satisfiability of a conjunction of builtin
predicates may be undecidable. In RIF we should probably strive to allow
only those sets of builtins for which satisfiability of conjunctive
formulas is decidable.


A.4 Extension: Resource Conditions

A special external atom can be used to support SPARQL queries to RDF.
RDF facts can be represented via a dedicated ternary predicate, and
bNodes can be represented using existential variables.


A.5 Extension: Ontology Conditions

A special external atom or an entire conjunction of such atoms can be
used to achieve hybrid rule-ontology integration, allowing a query-like
interface to OWL classes (including their subClassOf subsumption) and
properties (including subPropertyOf subsumption). Optional typing of RIF
variables (e.g., for a Sorted Horn Logic) can be regarded as the special
case of such an interface for OWL-Lite classes. (Predefined classes,
such as Integer, can be borrowed from XML Part 2, Datatypes.)

The semantics of such integrations are different for RIF LP, FO, and PR.

In case of FO, there is no semantic mismatch and the overall semantics
is first-order. In case of LP, recent work by Rosati (DL+Log, KR2006)
provides a semantics for a very general class of rulesets that are
tightly integrated with DL-based ontologies. However, even more general,
ad hoc interoperability between rules and ontologies might be needed.
For PR and RR, interoperability with OWL is likely not to have
model-theoretic semantics as it will be ad hoc.



B. Extension: RIF Horn Rule Language

This extends Horn conditions A.1-A.5 to Horn rules by
adding positive literals as rule heads (cf. Charter, Phase 1).

For positive conjunctive queries, FO and LP semantics coincide.
For queries that involve negation or universal quantification the
semantics of FO and LP diverge.



C. Extension: RIF Production and Reaction Rule Languages

This extends PR condition bodies A.1-A.5 to full rules by adding
Production Rule conclusion actions and Reaction Rule events/actions
(cf. Charter, Phase 2).



D. Evaluation: Use Cases

First focus on a subset of the Working Draft Use Cases for A-C
(http://www.w3.org/2005/rules/wg/ucr/draft-20060323.html,
 http://lists.w3.org/Archives/Public/public-rif-comments):

  D.1 Negotiating eBusiness Contracts Across Rule Platforms
  D.2 Negotiating eCommerce Transactions Through Disclosure of Buyer
      and Seller Policies and Preferences
  D.3 Collaborative Policy Development for Dynamic Spectrum Access
  D.4 Access to Business Rules of Supply Chain Partners
  D.5 Managing Inter-Organizational Business Policies and Practices
  D.6 Ruleset Integration for Medical Decision Support
  D.7 Interchanging Rule Extensions to OWL
  D.8 Vocabulary Mapping for Data Integration



E. Evaluation: Interoperation/Interchange

Experiments by RIF Participants on use cases D for languages A-C:

  E.1 SBU <-> NRC
  E.2 SBU <-> NRC <-> DERI
  E.3 BBN <-> SBU <-> NRC <-> DERI
  . . .



F.  Mappings to Other Languages

Provide normative mappings to languages discussed in the Charter
(order TBD).

  F.1  RDF
  F.2  OWL
  F.3  SWRL
  F.4  RuleML
  F.5  OMG PRR
  F.6  SBVR
  F.7  ISO Common Logic
  F.8  SPARQL