RIF Extensible Design Draft 2006-04-23 ~~~~~~~~~~~~~~~~~~~~~ Harold Boley (NRC), Michael Kifer (Stony Brook University), Jeff Pan (University of Aberdeen), Gerd Wagner (BTU Cottbus), Alex Kozlenkov (Betfair), Jos de Bruijn (DERI), Mike Dean (BBN Technologies), Giorgos Stamou (NTUA) Summary This proposal starts with a language of conditions that can be used for positive rule bodies and then extends it with negative literals, builtins, and constructs to query OWL and RDF (via SPARQL). The rationale for focusing on this sublanguage is that it can be shared among five different sublanguages of interest to RIF. We then discuss various extensions of this sublanguage and propose an interoperation/interchange evaluation. A. RIF Condition Language This proposal develops a set of fundamental concepts shared by the rule languages of interest to the RIF WG as outlined in the Design Roadmap (http://lists.w3.org/Archives/Public/public-rif-wg/2006Feb/0255.html). Here we focus on the one part that is shared by Logic Programming rules, Production (Condition-Action) rules, Reactive (Event-Condition-Action) rules, Normative rules (Integrity Constraints), and queries. We call this part 'Conditions' and the proposed sublanguage 'RIF Condition Language' (as a working title). The RIF Condition Language is common ground for specifying syntactic fragments in several different dialects of RIF: - Rule bodies in a declarative logic programming dialect (LP) - Rule bodies in a first-order dialect (FO) - Conditions in the bodies in production rules dialects (PR) - The non-action part of the rule bodies in reactive rules dialects (RR) - Integrity constraints (IC) - Queries in LP dialects (QY) Note that for rules this sublanguage is intended to be used only in the bodies, not their heads. The various RIF dialects diverge in the way they specify rule heads and other parts of their rules. We believe that by focusing on the condition part of the rule bodies we can achieve maximum syntactic (and some semantic) reuse among RIF dialects. Since different semantics are possible for syntactically identical rule sets, we propose that the semantics of a RIF rule set be specified by a predefined attribute. We will elaborate on this proposal separately, but the idea is to organize the most important known semantic and syntactic features into a taxonomy (which could be extended to accommodate new semantics/syntax), and the values of the aforementioned attribute would come from that taxonomy. A.1 Basis: Positive Conditions The basis of the language is formed by what can appear in the bodies of Horn-like rules with equality -- conjunctions and disjunctions of atomic formulas and equations (such rules reduce to pure Horn). We later extend our proposal to include builtins. As mentioned in the introduction, the motivation is that this sublanguage can be shared among the bodies of the rules expressed in the following RIF dialects: - FO (first order) - LP (logic programming) - PR (production rules) - RR (reactive rules) This sublanguage can also be used to uniformly express: - IC (integrity constraints) - QY (queries) SYNTAX ------ Essential BNF for human-readable syntax: Data ::= value Ind ::= object Var ::= '?' name TERM ::= Data | Ind | Var | Expr Expr ::= Fun '(' TERM* ')' Atom ::= Rel '(' TERM* ')' | TERM '=' TERM LITFORM ::= Atom QUANTIF ::= 'Exists' Var+ '(' CONDIT ')' CONJ ::= 'And' '(' CONDIT* ')' DISJ ::= 'Or' '(' CONDIT* ')' CONDIT ::= LITFORM | QUANTIF | CONJ | DISJ Here LITFORM stands for Literal Formula and anticipates the introduction of negated atoms later on. QUANTIF stands for Quantified Formula, which for Horn-like conditions can only be 'Exists' Formulas (Var+ variables should occur free in the scoped CONDIT, so 'Exists' can quantify them; free variables are discussed below). More explicitly than in logic programming, CONJ expresses formula conjunctions, and DISJ expresses disjunctions. Finally, CONDIT combines everything and defines RIF conditions, which can later be extended beyond LITFORM, QUANTIF, CONJ, or DISJ. We assume that all constants (Data and Ind) belong to two different logical sorts: the sort of values and the sort of objects. This means that every constant carries with it the designation of the sort to which it belongs (indicating whether the constant is a value or an object). Values (Data) can also optionally carry a designation of their datatype (e.g., TIME, Integer). Note that there are two uses of variables in the RIF condition language: free and quantified. All quantified variables are quantified explicitly, existentially (and also universally, later). We adopt the usual scoping rules for quantification from first-order logic. Variables that are not explicitly quantified are free. The free variables are needed because we are dealing with conditions that occur in rule bodies only. When a condition occurs in such a rule body, the free variables in the condition are precisely those that also occur in the rule head. Such variables are quantified universally outside of the rule, and the scope of such quantification is the entire rule. For instance, the variable ?X in the rule below is free in the condition that occurs in the rule body, but it is universally quantified outside of the rule. Condition with a free variable ?X: ... Exists ?Y (condition(..?X..?Y..)) ... Rule using the condition in its body: Forall ?X (head(...?X...) :- ... Exists ?Y (condition(..?X..?Y..)) ...) When conditions are used as queries, their free variables are to be bound to carry the answer bindings back to the caller. The semantics of conditions that contain free variables is defined in the section SEMANTICS. Example 1 (A RIF condition in human-readable syntax): In this condition, ?Buyer is quantified existentially, while ?Seller and ?Author are free: And ( Exists ?Buyer (purchase(?Buyer ?Seller book(?Author LeRif) $49)) ?Seller=?Author ) This syntax is similar in style, and compatible to, the OWL Abstract Syntax (http://www.w3.org/TR/owl-semantics/syntax.html). An XML syntax can be obtained from the above BNF as follows. The non-terminals in all-upercase such as CONDIT become XML entities, which act like macros and will not be visible in instance markups. The other non-terminals and symbols ('=', 'Exists', etc.) become XML elements, which are adapted from RuleML as shown below. Optional attributes are added for finer distinctions. For example, an optional attribute in the Data element can point to an XML Schema Datatype. - Data (value constant, including optional attribute for a datatype IRI) - Ind (object constant; empty element with attribute, iri, for oid; e.g. global: , local: ) - Var (logic variable; optional type attribute, e.g. RDFS/OWL-class IRI) - Fun (n-ary function symbol; optional attribute designating interpreted [a.k.a. equation-defined] functions in contrast to uninterpreted [a.k.a. free] functions) - Rel (n-ary relation symbol [a.k.a. predicate]) - Expr (expression formula) - Atom (atomic formula; optional attribute for external atoms) - Equal (prefix version of term equation '=') - Exists (quantified formula for 'Exists') - And (conjunction; optional attribute for sequential conjunctions) - Or (disjunction; optional attribute for sequential disjunctions) Based on the FOL RuleML (http://www.w3.org/Submission/FOL-RuleML) experience, this could be directly rewritten as a DTD or an XML Schema. Example 2 (A RIF condition in XML syntax): The condition formula in Example 1 can be serialized in XML as shown: Buyer purchase Buyer Seller book Author LeRif $49 Seller Author Note that the distinction between the sort of values and the sort of objects can be extended from Data vs. Ind to variables (Var) by using an optional attribute. This corresponds to the so-called d-variables and i-variables in SWRL (http://www.w3.org/Submission/SWRL). Sorts can also be optionally added to function symbols and predicates in order to support sorted languages. SEMANTICS --------- The semantics of a ruleset R is defined by a collection of intended models. In case of FOL, all models of R are intended. In LP, only some models (e.g., well-founded or stable models of R) are intended. However, here we are talking only about condition parts of the rules. So, by semantics we mean the notion of satisfaction of a formula in the interpretations of the various RIF dialects. For example, in FO, all first-order interpretations are appropriate. In LP, infinite Herbrand models are typically used. In LP with the well-founded semantics, 3-valued Herbrand models are used. Stable model semantics uses only 2-valued interpretations. The notion of satisfaction in a model is the same for all dialects of RIF, and in the 2-valued case does not depend on the semantic tags associated with the rulesets in which our conditions occur. (Well-founded interpretations, being 3-valued, are slightly different.) Given a condition formula phi(X1,...,Xn) with free variables X1, ..., Xn and an interpretation M, define M(phi(X1,...,Xn)) as the set of all bindings (a1/X1,...,an/Xn) such that M |= phi(a1,...,an). For instance, denoting the formula of Example 1 as phi(?Seller,?Author), if we use the following Herbrand interpretation: M = {purchase(Mary John book(Fred LeRif) $49), purchase(Nina Fred book(Fred LeRif) $49) purchase(Alice Floob book(Floob LeRif) $49)} then M(phi(?Seller,?Author)) is: {(Fred/?Seller,Fred/?Author), (Floob/?Seller,Floob/?Author)} The notion of M(phi(X1,...,Xn)) is independent of the semantics of the ruleset to which phi(X1,...,Xn) is posed, and it "plugs into" the model-theoretic semantics of the rulesets under FO, LP, and the operational semantics of PR. Note that integrity constraints (which are closed formulas) are simply checked against the intended models of the ruleset. Therefore, the semantics of integrity constraints is fully defined by the above notion of satisfaction in the models of rulesets. Thus, these constraints can be full first-order formulas even in case of LP rulesets. The same can be said about queries. While semantics can be defined for very general classes of constraints and queries, this does not mean that the problem of constraint satisfaction/query answering is decidable or has efficient evaluation techniques for such general classes. A.2 Extension: Negative Conditions This extension introduces two kinds of negation (Naf and Neg) to the RIF Condition Language, initially proposed for Phase 1 queries. Neg denotes classical negation as defined in first-order logic; special uses of it can be shared by FO and all dialects that support a form of classical negation (e.g., certain dialects of LP; perhaps some PR dialects). Naf denotes negation as failure in one of its incarnations (well-founded or stable-model). The actual flavor of Naf is determined by inspecting the value of a semantic tag associated with the ruleset. Naf is used in LP (and in queries and constraints over the intended models of LP); it can possibly be relevant to PR and RR. Alongside negation we can introduce explicit universal quantification. SYNTAX ------ This extension introduces two negation symbols over atoms, and one legal nesting, for which we propose to use the notation Naf/Neg, as in RuleML: LITFORM ::= Atom | 'Neg' Atom | 'Naf' Atom | 'Naf' 'Neg' Atom Note that in Phase 1 we are proposing to use negation only in queries. Phase 1 rules will all be Horn and their bodies will not have negation. Phase 2 will extend RIF to allow negation in rules as well. The next step could be to allow universal quantification: QUANTIF ::= 'Exists' Var+ '(' CONDIT ')' | 'Forall' Var+ '(' CONDIT ')' SEMANTICS --------- The semantics of atomic Neg in the models of the FO dialect is standard. For LP dialects supporting "classical" negation, Neg-negated predicates are interpreted by fresh predicates that are appropriately axiomatized. For instance, Neg of p is associated with a fresh predicate, Neg_p, and Neg p(...) is said to be true in a model iff Neg_p(...) is. It is also required that for any given tuple of ground arguments, the formulas p(...) and Neg_p(...) cannot be true in the same model. The following (Phase 2) dialects of RIF will support Neg and/or Naf: - LP/WF (the well-founded semantics) generally supports only Naf, but some variants (e.g., Courteous LP) also support Neg. - LP/SMS (stable model semantics) supports both Neg and Naf (e.g., ERDF). - FO supports Neg only. Queries and integrity constraints can use those forms of negation that are supported by the rulesets that they constrain or query. To distinguish the different possible semantics for the same ruleset, RIF will define an attribute that will explicitly specify the semantics under which the corresponding ruleset is to be interpreted. A.3 Extension: Builtins in Conditions A builtin is a relation or function for which there is a single, fixed interpretation. For instance, the relation "<" and the function "+" have fixed and unique interpretations over the domain of numbers. Although builtin functions are not strictly needed, as they can be represented by builtin relations, a practical language needs both datatype relations and datatype functions. (The equality predicate can be used to bind the result of a builtin function call to a logical variable.) Satisfaction of an atomic expression that is based on a builtin is done with respect to the concrete domain with which this builtin is associated. Some builtins may be polymorphic and have several associated domains. For instance, in the domain of strings, "<" can be viewed as lexicographics ordering and "+" as string concatenation. In that case, the proper interpretation of a builtin is determined by the arguments of the builtin predicate or function. RIF can support different sets of builtins. For instance, each builtin can be identified by an IRI, in which case support for SWRL, SPARQL, and XQuery builtins is feasible. Note that, in general, satisfiability of a conjunction of builtin predicates may be undecidable. In RIF we should probably strive to allow only those sets of builtins for which satisfiability of conjunctive formulas is decidable. A.4 Extension: Resource Conditions A special external atom can be used to support SPARQL queries to RDF. RDF facts can be represented via a dedicated ternary predicate, and bNodes can be represented using existential variables. A.5 Extension: Ontology Conditions A special external atom or an entire conjunction of such atoms can be used to achieve hybrid rule-ontology integration, allowing a query-like interface to OWL classes (including their subClassOf subsumption) and properties (including subPropertyOf subsumption). Optional typing of RIF variables (e.g., for a Sorted Horn Logic) can be regarded as the special case of such an interface for OWL-Lite classes. (Predefined classes, such as Integer, can be borrowed from XML Part 2, Datatypes.) The semantics of such integrations are different for RIF LP, FO, and PR. In case of FO, there is no semantic mismatch and the overall semantics is first-order. In case of LP, recent work by Rosati (DL+Log, KR2006) provides a semantics for a very general class of rulesets that are tightly integrated with DL-based ontologies. However, even more general, ad hoc interoperability between rules and ontologies might be needed. For PR and RR, interoperability with OWL is likely not to have model-theoretic semantics as it will be ad hoc. B. Extension: RIF Horn Rule Language This extends Horn conditions A.1-A.5 to Horn rules by adding positive literals as rule heads (cf. Charter, Phase 1). For positive conjunctive queries, FO and LP semantics coincide. For queries that involve negation or universal quantification the semantics of FO and LP diverge. C. Extension: RIF Production and Reaction Rule Languages This extends PR condition bodies A.1-A.5 to full rules by adding Production Rule conclusion actions and Reaction Rule events/actions (cf. Charter, Phase 2). D. Evaluation: Use Cases First focus on a subset of the Working Draft Use Cases for A-C (http://www.w3.org/2005/rules/wg/ucr/draft-20060323.html, http://lists.w3.org/Archives/Public/public-rif-comments): D.1 Negotiating eBusiness Contracts Across Rule Platforms D.2 Negotiating eCommerce Transactions Through Disclosure of Buyer and Seller Policies and Preferences D.3 Collaborative Policy Development for Dynamic Spectrum Access D.4 Access to Business Rules of Supply Chain Partners D.5 Managing Inter-Organizational Business Policies and Practices D.6 Ruleset Integration for Medical Decision Support D.7 Interchanging Rule Extensions to OWL D.8 Vocabulary Mapping for Data Integration E. Evaluation: Interoperation/Interchange Experiments by RIF Participants on use cases D for languages A-C: E.1 SBU <-> NRC E.2 SBU <-> NRC <-> DERI E.3 BBN <-> SBU <-> NRC <-> DERI . . . F. Mappings to Other Languages Provide normative mappings to languages discussed in the Charter (order TBD). F.1 RDF F.2 OWL F.3 SWRL F.4 RuleML F.5 OMG PRR F.6 SBVR F.7 ISO Common Logic F.8 SPARQL