Re: [RIF] Extensible Design

Hello,

This is written in reaction to my being "invited" by the RIF co-chair
CSMA to share some thoughts on the "Extensible Design" design document
submitted to the RIF WG (see
http://lists.w3.org/Archives/Public/public-rif-wg/2006Apr/0068.html).

As far as I understand it, this document proposes a sketch of a process
for defining a means to achieve interchange (and interoperability as a
consequence) between diverse rule-based idioms. It recognizes that a RIF
language should span a family of languages sharing a substantial
set of syntactic and semantic concepts, and should provide a means of
extension.

The assumption made by this proposal makes sense to me insofar as
systems officially abiding by a RIF standard would at least guarantee
that the parsers of their own idiosyncratic concrete syntax build
Abstract Syntax Trees (ASTs) serializable in XML form using the
published XML schemas defining the RIF standard. Such RIF-standardized
serialization could then be used as the accepted canonical form of
XML-representation for ASTs of rule-based programs or expressions,
easily digestible by any RIF-abiding systems that would have the means
of interpreting the constructs.
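To make the idea concrete, here is a minimal sketch of such an AST whose nodes serialize themselves to XML. The element names (And, Atom, Var, Const) are my own invention for illustration; the actual tags would of course come from the published RIF XML schemas:

```java
// A tiny AST for rule conditions that serializes itself to a
// hypothetical RIF-like XML form. Element names are illustrative
// only, not taken from any actual RIF schema.
public class RifSketch {
    interface Condition { String toXml(); }

    record Const(String name) implements Condition {
        public String toXml() { return "<Const>" + name + "</Const>"; }
    }
    record Var(String name) implements Condition {
        public String toXml() { return "<Var>" + name + "</Var>"; }
    }
    record Atom(String pred, Condition... args) implements Condition {
        public String toXml() {
            StringBuilder sb = new StringBuilder("<Atom op=\"" + pred + "\">");
            for (Condition a : args) sb.append(a.toXml());
            return sb.append("</Atom>").toString();
        }
    }
    record And(Condition... conjuncts) implements Condition {
        public String toXml() {
            StringBuilder sb = new StringBuilder("<And>");
            for (Condition c : conjuncts) sb.append(c.toXml());
            return sb.append("</And>").toString();
        }
    }

    public static void main(String[] args) {
        // AST for the condition: parent(X, john) AND male(X)
        Condition c = new And(
            new Atom("parent", new Var("X"), new Const("john")),
            new Atom("male", new Var("X")));
        System.out.println(c.toXml());
    }
}
```

Two different systems agreeing on such a schema could each emit and ingest this form regardless of their concrete syntaxes.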

For example, imagine that System A is, say, Alain Colmerauer's Prolog-IV
and System B is SICStus Prolog. The two idioms have very different
concrete syntaxes but share a substantial part of their semantics
(Herbrand terms, unification, Prolog's depth-first left-right
resolution, and even some constraint solving such as alldiff, etc.).
Let us pretend that both System A and System B are officially
RIF-abiding. Then they should be able to spit out an XML serialization
of their parsed programs' ASTs, faithfully using the RIF standard
annotations. Then, it is a simple matter for anyone using System A to
swallow RIF-serialized System B and do something with it, or part of it.
The same could be said for System A being ILOG's JRules, say, and System
B being Fair Isaac's Blaze Advisor.

I basically like the main lines of Boley et al.'s approach for the
following reasons.

1. It starts from a relatively modest and arguably consensual goal (at
    least within this RIF WG) - that of a representation for rules'
    conditions. It makes sense to start with this and proceed
    incrementally from there to a more complete collection of languages.

2. It uses a formal linguistic approach - which is the natural (and
    right) thing to do if we purport to describe families of formal
    languages.

3. It can lead to a natural language classification scheme (such as the
    one proposed in the RIF-RAF - see
    http://www.w3.org/2005/rules/wg/wiki/Rulesystem_Arrangement_Framework).
    Such a classification scheme can be formalized using Formal
    Concept Analysis (see below).

4. It offers an incremental layered process for extending whatever we can
    successfully represent.

Anyway, after reading the proposal draft, I used the proposed Rule
Condition Language BNF and built a quick Java application using Jacc
(Just Another Compiler Compiler), a tool that I implemented at SFU and
ILOG (it is now ILOG's property).  Jacc has the pleasing feature of
allowing automatic XML serialization of the AST (besides generating a
working parser and hyperlinked HTML documentation) based on a yacc-like
grammar. This shows that RIF-abiding languages implementing their
parsers in Jacc could automatically inherit the XML-serialized
RIF representation. To give an idea, I produced a full parser for the
baby Rule Condition Language (RCL) that simply generated its
XML-serialization. See attached RCLDoc.zip file (unzip and open file
doc.html) for details.  Comments are welcome. The exercise shows that
one can come a long way for free with formal grammars ... :-)
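For readers without the attachment, here is a hand-written sketch of the kind of thing such a grammar-driven parser does. The toy condition grammar below is my own guess at a "baby" condition syntax, not the actual RCL BNF from the proposal; the XML tags are likewise illustrative:

```java
// A hand-written recursive-descent parser for a toy condition syntax
// (an illustrative stand-in for a Jacc-generated RCL parser) that
// emits an XML serialization of the AST on the fly.
public class BabyRcl {
    private final String src;
    private int pos;
    BabyRcl(String src) { this.src = src; }

    // condition ::= atom ("," atom)*    -- serialized as <And>...</And>
    String condition() {
        StringBuilder sb = new StringBuilder("<And>");
        sb.append(atom());
        while (peek() == ',') { pos++; sb.append(atom()); }
        return sb.append("</And>").toString();
    }
    // atom ::= ident "(" term ("," term)* ")"
    String atom() {
        String pred = ident();
        expect('(');
        StringBuilder sb = new StringBuilder("<Atom op=\"" + pred + "\">");
        sb.append(term());
        while (peek() == ',') { pos++; sb.append(term()); }
        expect(')');
        return sb.append("</Atom>").toString();
    }
    // term ::= variable (capitalized ident) | constant (lowercase ident)
    String term() {
        String name = ident();
        return Character.isUpperCase(name.charAt(0))
            ? "<Var>" + name + "</Var>" : "<Const>" + name + "</Const>";
    }
    String ident() {
        skipWs();
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        return src.substring(start, pos);
    }
    char peek() { skipWs(); return pos < src.length() ? src.charAt(pos) : '\0'; }
    void expect(char c) {
        if (peek() != c) throw new IllegalStateException("expected " + c);
        pos++;
    }
    void skipWs() { while (pos < src.length() && src.charAt(pos) == ' ') pos++; }

    public static void main(String[] args) {
        System.out.println(new BabyRcl("parent(X, john), male(X)").condition());
    }
}
```

A parser generator like Jacc derives exactly this kind of machinery (plus the serialization) from the grammar alone, which is the point of the exercise.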

In conclusion, I think that an RCL-like proposal (and extensions) is a
viable concrete way to proceed for defining a RIF modulo agreeing on a
standard XML schema (à la RuleML, OWL, RDF ...).

-hak

Appendix:

Ganter/Wille's Formal Concept Analysis

    I still marvel at FCA's basic idea's simplicity, elegance, and
    effectiveness. This methodology ought to be more widely known and
    used for automatic ontology extraction.  For instance, for the W3C
    Semantic Web, ontology representation languages (such as OWL, etc.)
    must all start with some form of well-defined ontology before
    being of any use. The RIF WG's objective is to define a Rule
    Interchange Format.  What "rule", "interchange", and "format"
    mean has yet to be finalized before there can be a clear consensus,
    especially in such a large Working Group. One problem we are facing
    today is how to classify the collection of rule systems known and
    advocated by the WG members into an ontology of those systems,
    along a set of features and attributes over several dimensions
    (semantic, syntactic, pragmatic, etc.)!  See for example the "Swiss
    Knife" example in [Bernhard Ganter and Rudolf Wille, "Conceptual
    Scaling", in Fred Roberts, Ed., Applications of Combinatorics and
    Graph Theory to the Biological and Social Sciences, pp. 139--167,
    Springer Verlag, 1989]. See also the PDF attachments for a simpler
    example.

    Thus, as the FCA Conceptual Scaling (CS) method recommends, we could
    follow a bottom-up approach for deriving a Rule System Ontology.
    Starting with a set of objects (the rule systems) obtained from
    all the WG members describing their own known systems, we could derive
    the set of relevant attributes per dimension (i.e., the union of all
    the attributes described for each system along that dimension).
    Then, thanks to CS, it would be a simple matter to form the boolean
    (or perhaps even similarity-measure) matrix (or hyper-matrix for
    higher dimensions) classifying Systems vs. Features, from which we may
    automatically obtain a faithful and conservative ontology of rule
    systems where the lattice of classes is formed by union and
    intersection of their attributes.
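As a toy illustration of this idea, the following is my own minimal brute-force FCA sketch (not Toscana, and with made-up attributes): given a boolean context of rule systems vs. features, it enumerates all formal concepts, i.e., the closed extent/intent pairs that form the nodes of the concept lattice. Brute force over object subsets is fine for toy inputs only.

```java
import java.util.*;

// Minimal Formal Concept Analysis sketch: derive all formal concepts
// (closed extent/intent pairs) of a small boolean context by brute
// force. Systems and attributes below are hypothetical examples.
public class FcaSketch {
    static Map<String, Set<String>> context = new LinkedHashMap<>();
    static {
        context.put("Prolog-IV",    Set.of("herbrand", "unification", "constraints"));
        context.put("SICStus",      Set.of("herbrand", "unification", "constraints"));
        context.put("JRules",       Set.of("production-rules", "java-objects"));
        context.put("BlazeAdvisor", Set.of("production-rules"));
    }

    // intent(E): attributes common to all objects in extent E
    static Set<String> intent(Set<String> extent) {
        Set<String> common = null;
        for (String o : extent) {
            if (common == null) common = new TreeSet<>(context.get(o));
            else common.retainAll(context.get(o));
        }
        if (common == null) { // intent of the empty extent is all attributes
            common = new TreeSet<>();
            for (Set<String> attrs : context.values()) common.addAll(attrs);
        }
        return common;
    }
    // extent(I): objects possessing every attribute in intent I
    static Set<String> extent(Set<String> intent) {
        Set<String> objs = new TreeSet<>();
        for (var e : context.entrySet())
            if (e.getValue().containsAll(intent)) objs.add(e.getKey());
        return objs;
    }

    public static void main(String[] args) {
        List<String> objects = new ArrayList<>(context.keySet());
        Set<Set<String>> seen = new HashSet<>();
        for (int mask = 0; mask < (1 << objects.size()); mask++) {
            Set<String> e = new TreeSet<>();
            for (int i = 0; i < objects.size(); i++)
                if ((mask & (1 << i)) != 0) e.add(objects.get(i));
            Set<String> in = intent(e);
            Set<String> closed = extent(in);   // close the extent
            if (seen.add(closed))
                System.out.println(closed + " : " + in);
        }
    }
}
```

Even on this tiny context, the two Prologs collapse into one class (same intent) and JRules specializes the production-rules class, which is exactly the kind of conservative classification the bottom-up approach would yield for the real RIF-RAF data.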

    Note that the RIF WG preferred a top-down approach, where the set
    of attributes is potentially quite large and irrelevant to most
    actual systems. Such an ontology has a harder time emerging since it
    requires a global view of all systems for anyone to decide a priori
    what the relevant dimensions and attributes are. Further, this often
    leads to confusing attribute dimensions (e.g., semantic vs.
    syntactic vs. pragmatic) or to irrelevant attributes.

    Be that as it may, there are systems usable in a friendly interactive
    graphical environment for users to develop and visualize
    multi-dimensional ontologies. I have not used any, but I, for one,
    would be most interested in using one (see, e.g., the Toscana System
    based on the Ganter/Wille method using Formal Concept Analysis for
    building and visualizing concept lattices from attributed objects:
    http://gdea.informatik.uni-koeln.de/archive/00000166/).

    I suggest that we get our inspiration from the Ganter/Wille
    methodology starting from the RIF-RAF classification scheme to derive
    an adequate RIF representation along the line of what is proposed in
    http://lists.w3.org/Archives/Public/public-rif-wg/2006Apr/0068.html.

-- 
Hassan Aït-Kaci
ILOG, Inc. - Product Division R&D
tel/fax: +1 (604) 930-5603 - email: hak @ ilog . com

Received on Thursday, 4 May 2006 10:40:05 UTC