review of RIF BLD document (action03)

		Comments on
		RIF Basic Logic Dialect
 		(undated version, downloaded 15 October 2007)

		Peter F. Patel-Schneider


These comments so far are only my own.  I am sending them to the OWL WG
in fulfillment of action03.  I have tried to make these comments
suitable as the basis of a comment by the WG as a whole.  To provide
context for my comments I have had to produce a short description of
much of the RIF Basic Logic Dialect, which forms sort of a comment in
its own right.

These comments touch only on substantive issues.  I have another set of
editorial comments, which I will be sending to the RIF WG directly.

* General Notes

The RIF is designed to be a method for exchanging rules or rule-like
information between different systems.   The RIF Basic Logic Dialect
(BLD) is the logic language that is to form part of this exchange
method.  Part of this language is the condition sublanguage, which is to
"provide a common basis for all the dialects of RIF".

The syntax used in these comments is the presentation syntax, except
that for reasons of space I have replaced "rif:iri" with "iri" in
examples.  There is a fully-striped XML syntax as well.

* Syntax of the RIF Condition Sublanguage

The full RIF Condition Sublanguage is a polymorphic, polyadic
logical language with a higher-order semantics (i.e., it allows
variables to occur as predicates and functions in formulae) but with
first-order semantics (i.e., quantification of predicate and function
variables is not over all possible relations and mappings).  There is a
special non-logical symbol (=) for equality.

Conditions can be like 
  Exists ?x  ( p^^iri ( a^^iri ) ( p^^iri ( ?x , p^^iri ) ,
  	       	      	       	   p^^iri ( a^^iri ( ), ?y , ?z ) )
               Or
 	       ?x ( b^^iri ) 
 	       Or
 	       ( a^^iri ( 1.2^^xsd:decimal ( a^^iri ) -> ?z , ?w -> ?x )
 	       	 And
		 p^^iri ( ?x ) ## p^iri [ q^^iri -> ( p^^iri # p^^iri )]
 	       ) )
where non-logical symbols starting with ? are variables
and the others are constant terms.  

The first and second atomic formulae are normal
syntactically-higher-order atomic formulae, the third atomic formula is
a syntactically-higher-order atomic formula with named (slotted)
arguments, the forth atomic formula is a frame formula containing a
subclass formula and a membership formula.

Syntactic well-formedness is governed by a set of
signatures which provide syntactic typing for variables and constant
terms.  A term (or formula) is well-formed if its components are
well-formed and its predicate (or function, respectively) component can
have the number and types of arguments that it has in the term (or
formula).  This construction methodology, however, does not cover the
constructs involving #, ##, and [...] which are well-formed if their
components are well-formed.

The actual RIF BLD Condition Sublanguage "carefully selects signatures
so that the corresponding logic will be first order" and monomorphic and
mono-adic.  Conditions here can be like
  Exists ?x  ( p^^iri ( q^^iri ( ?x , a^^iri ) ,
  	       	      	r^^iri ( a^^iri, ?y , ?z ) )
 	       Or
 	       ( 1.2^^xsd:decimal ( r^^iri -> ?z , ?w -> ?x )
 	       	 And
 		 a^^iri ## b^iri [ f^^iri (a^^iri)  -> ( a^^iri # a^^iri ) ] 
	       ) )
Here the first atomic formula is a normal formula, the second is a
slotted formula, and the third is a frame formula containing a subclass
and a membership formula.  Argument (slot) names in slotted formulae
cannot be function applications or variables, but slot names in frame
formulae ca be. There is also a distinction between normal predicate
applications and built-in predicate applications, but this is only
important in the rule language.

** Comments on the Syntax

Why is the complex signature mechanism in this document, as the only
used language doesn't really need it?  Further, the signature mechanism
cannot capture the syntax of the BLD condition sublanguage, so it is not
even adequate for distinguishing between dialects.

Why are there are three different kinds of atomic formulae (regular,
slotted, and frame)?  This could cause problems with OWL integration, as
it is not obvious which kind of formulae should be used for integration
with OWL.

* Semantics of the RIF BLD Condition Sublanguage

The semantics is only for the RIF BLD Condition Sublanguage, not the
full RIF Condition Sublanguage.

The semantics is a model-theoretic semantics, based on a
partially-ordered set of truth values.  The set of truth values for the
RIF BLD Condition Sublanguage contains true and false, with false less
than true.

Interpretations have a domain and nine mappings,  
1/ IC from constant terms to domain elements, providing meanings for
   constant terms used as individual symbols, 
2/ IV from variables to domain elements, providing meaning for
   variables; 
3/ IF from constant terms to functions from D* to D, providing meaning
   for constant terms used as function symbols;
4/ IR from constant terms to *partial* mappings from D* to truth
   values, providing meaning for constant terms used as predicate
   symbols, that maps = to the identity predicate;
5/ ISF from constant terms to (total? partial?) mappings that map a bag
   of pairs from D to D, providing meaning for slotted terms;
6/ ISR from constant terms to (total? partial?) mappings that map a bag
   of pairs from D to a truth value, providing meaning for slotted
   predicate terms;
7/ Islot from D to functions from DxD to truth values, providing meaning
   for frame (not slotted!) terms,  a slotted term is essentially a
   conjunction of is various parts;
8/ Isub from D x D to truth values, providing meaning for subclass, a
   transitive relationship;
9/ Iisa from D x D to truth values, providing meaning for membership,
   that distributes over ##.

A combination interpretation function, I, is defined to give meaning to
terms.
   I(c) = IC(c) for constant terms used as individual symbols
   I(v) = IV(v) for variables
   I(f(t1,...,tn)) = IF(f) ( I(t1),...,I(tn) ) 
   I(f(a1->v1,...,an->vn)) 
		is ISF(f) applied to the n-element bag 
 		   containing each of the pairs < I(ai),I(vi) >
A combination interpretations, Itruth, is defined to give meaning to
atomic formulae.
   Itruth(P(t1,...,tn)) = IR(P) ( I(t1),...,I(tn) ) 
   Itruth(P(a1->v1,...,an->vn)) 
		is ISP(P) applied to the n-element bag 
 		   containing each of the pairs < I(ai),I(vi) >
   Itruth(f[p->v]) = Islot(I(p))(I(p),I(v))
   Itruth(f[p1->v1 ... pn->vn]) is the conjunction of Itruth(f[pi->vi])
   Itruth(t1 ## t2) = Isub(I(t1),I(t2))
   Itruth(t1 # t2)  = Iisa(I(t1),I(t2))

Note that as the RIF BLD Condition Sublanguage is monomorphic and
mono-adic, IC will not be used for any constant term that is typed as
a function symbol or a predicate symbol.  As well, if a constant term is
typed as an n-ary function symbol (predicate symbol) only the n-ary part
of its IF (IR, respectively) mapping will be used.

Note that there is no interpretation given for frame formulae that
contain subclass or membership formulae.  These are instead pre-expanded
into conjunctions of the frame formula with the subclass or membership
formula replaced by its left-hand side and the subclass or mem(e.g.,
"1.2"^^xsd:decimal) must be interpreted according to the rules
for their symbol spaces (here xsd:decimal).  Ill-typed constant terms
(e.g., "abc"^^xsd:decimal) are not allowed.  The known symbol spaces are
xsd:long, xsd:string, xsd:integer, xsd:decimal, xsd:time, xsd:dateTime,
rdf:XMLLiteral, rif:text, rif:iri, and rif:local.  Constant terms that
belong to rif:text are text strings with language tags.  Constant terms
that belong to rif:iri have a common denotation "regardless of the
context in which that constant occurs".  Constant terms that belong to
rif:local in one rule set "are viewed as unrelated distinct constants"
from those in another rule set.

* Comments on the Semantics 

The mappings for predicates are partial.  It seems to me that this means
that the truth value of some formulae are thus undefined, but no account
is taken of this in the later development of the semantics.

Why is a new treatment of data values needed?  Why does the set of known
data types not include XSD data types like xsd:short?

Why does there need to be a symbol space for IRI identifiers?  This may
cause problems with OWL integration.

The treatment of slotted formulae is a unusual in that the predicates
have a direct map to their extension but the slot names are first
mapped into the domain.  This means that a=b implies that f[a->3] is
equivalent to f[b->3].

* General Comments on the Condition Language

The language is very complex.  It appears to have been designed to
mirror several other languages.  In particular, the frame formulae
appear to have been designed to mirror F-logic.   

The logic is not like RDF, as it is monomorphic and predicates are not
first mapped into domain elements.   The frame part of the logic is not
like regular frames, as the slot names are first mapped into domain
elements.

* RIF Rule Language

RIF rules are horn rules made up of an optional RIF BLD condition and a
RIF BLD atomic formulae that cannot be a built-in predicate application.
Rules are treated like implications in the semantics.  A RIF rule set is
a set of RIF rules.

* RIF-RDF Compatability

RIF is combined with RDF D-entailment where D is the set of
datatype-like symbol spaces known in RIF.  

RDF graphs are translated into RIF-space by replacing URI references by
the obvious constant term in the rif:iri symbol space.  Plain literals
without language tags are replaced by constant terms in the xsd:string
symbol space.   Plain literals with language tags are replaced by
constant terms in the rif:text symbol space.  Ill-typed literals are
replaced by an IRI constructed from the literal.

RIF-RDF combinations are given paired interpretations such that 
1/ the RIF domain is a superset of the set of RDF resources;
2/ RDF properties are a superset of the elements of the RIF domain that
   are the slot in a true slot relationship;
3/ the RIF domain is the union of the set of RDF resources and the set
   of RDF properties;  (Yes, this implies condition 1/.)
4/ the set of RDF literals includes all the values in the known
   datatypes; (Yes, this is already a condition for RDF
   interpretations.) 
5/ the RDF (property) extension of a domain element is the set of pairs
   that are true for the Islot of that domain element;  
6/ the extension of an RDF URI reference (U) is the same as the RIF
   extension of its translation into RIF ("U"^^rif:iri);
7/ well-typed RDF typed literals map the same as the same RIF typed
   literal;  (Yes, this is true for datatyped interpretations.)
8/ ill-typed RDF typed literals map the same as their RIF translation.

Condition 5 makes the connection between the RDF triple s p o . and the
RIF frame atomic formula s [ p -> o ].  

The rest of the semantics is obvious.  Basically the RIF interpretation
has to be a RIF model of the RIF rule set and the RDF interpretation has
to be a simple, RDF, RDFS, or datatype model of the RDF graph.

* Notes on RIF-RDF compatability

Why worry about interpretations where IP is not a subset of IR?  This
only happens in simple entailment.  As there are already datatypes in
RIF why not just go to datatype-entailment?

The treatment of ill-typed literals appears to allow accidental capture
if the replacement IRI also occurs in the RDF graph.  For example,
   "abc"^^xsd:decimal ex:a ex:b .
RIF-RDF entails
   http://www.w3.org/2005/rif/rdf-ill-typed-literal/uri-encode("abc"^^xsd:decimal)
   ex:a ex:b .

Note that rdf:type is not related to membership formulae (i#c) and
rdfs:subClassOf is not related to subclass formulae (c1##c2).

* RIF-OWL Compatability

There was a section on RIF-OWL compatability in an earlier draft of the
document but it has been removed.

* Notes on RIF-OWL Compatability.

There is a question as to which part of the syntax OWL should map to.
There is also a question as to whether OWL syntax should map to RIF
facts. 

As OWL 1.1 is defined on top of OWL DL, it seems that the best map is
from OWL 1.1 descriptions to normal unary predicates, which would result
in not mapping the syntax to facts.

Received on Tuesday, 16 October 2007 14:32:45 UTC