Rules for converting from FOL->DL(OWL)? from Harry Halpin on 2005-01-31 (www-rdf-interest@w3.org from January 2005)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Sun, 30 Jan 2005 23:09:30 -0500 (EST)
To: www-rdf-interest@w3.org
Message-ID: <Pine.LNX.4.61.0501300857480.25398@tribal.metalab.unc.edu>
Everyone,
 	I am working on a converter for some logical facts from my 
own internal XML format to OWL. I want to get this right, as there are 
about a million facts, so running the XSL and theorem-prover takes hours! 
This is a long e-mail, but I give the rules for my converter both in 
logical form and as a translation from a FOL XML language based on 
Discourse Representation Theory (a notational variant of FOL; Kamp and 
Reyle) into OWL.

 	The logical facts I have are standard FOL, and I'm trying to 
follow the standard "direct translation" conventions I've read in the 
Tsarkov and Horrocks DL 2003 paper as well as other DL tutorials and 
papers. What I'd like to do is find which subset of my FOL database I can 
translate into OWL-DL. I'll try to convert the whole database over to 
OWL-DL, then reconvert it back to my FOL. Then I'll use my FOL 
theorem-prover (bliksem or vampire) to prove, fact by fact and 
independently, that each original FOL fact is equivalent to its 
OWL-DL->FOL counterpart.

    Since OWL-DL is less expressive than FOL, there will be lots of 
statements in FOL not convertible to OWL, and so they will fail in the 
theorem-prover when re-converted. I am trying to figure out exactly 
what these facts are! These facts will just be excluded from my OWL 
knowledgebase automatically when the theorem-prover fails. Does this 
sound sensible? If not, why? I'm a linguist by trade, not a logician, so 
I may get these things wrong.
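
 	The round-trip check described above could be driven by a small 
script that writes one equivalence conjecture per fact. This is only a 
sketch: it assumes each original fact and its OWL-DL->FOL counterpart are 
already available as formula strings in TPTP syntax (which, as far as I 
know, vampire reads), and the function name and fact names are made up 
for illustration.

```python
def equivalence_problem(name, original, roundtripped):
    """Build a TPTP conjecture asking a prover to show original <=> roundtripped.

    Both formulas are assumed to be closed TPTP first-order formulas.
    """
    return "fof({0}, conjecture, ( ({1}) <=> ({2}) )).".format(
        name, original, roundtripped
    )


# Hypothetical fact: "there exists a diver", unchanged by the round trip.
problem = equivalence_problem("fact_1", "? [X] : diver(X)", "? [X] : diver(X)")
print(problem)
```

 	A fact would be kept in the OWL knowledgebase only if the prover 
reports the conjecture as a theorem; anything else (counter-model, 
timeout) would exclude it.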

 	Here are my rules. I use "==>>" for the translation function. I
 	first try to write each rule in logical notation, then again as a
 	conversion from XML->OWL. The XML format is pretty simple:
 	bound variables are given as <dr> and quantified groups of
 	facts as <drs>, with quantification assumed to be existential.
 	Thus, an expression of many facts is given by a bunch of <dr>,
 	<pred> and <rel> elements inside a <drs>. Each <dr> represents a
 	variable which has been bound; predicates about a variable are
 	given the <pred> tag, and binary relationships a <rel> tag.

   	1) Unary Relationships ==>> OWL Classes (DL concepts)

            p(x) ==>> A

 	   (Option 1)
 	   <pred arg="_G9011">diver</pred> ==>>
            <owl:Class rdf:ID="diver"/>

  	   Seems simple? However, it's not that simple. There was a
 	loss of information: the actual variable ("x", or "_G9011", an
 	automatically generated ID) is gone. So, shouldn't we have both
 	an individual and a class for each unary predicate?

 	(Option 2)
          <owl:Class rdf:ID="diver"/>
          <diver rdf:ID="_G9011"/>
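
 	As a rough sketch of (Option 2), here is how one <pred> could be 
turned into a class declaration plus a typed individual. The function 
name is hypothetical, the output is plain text meant to sit inside an 
rdf:RDF element with the usual owl/rdf namespaces declared, and the 
predicate name is assumed to be a valid XML element name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def pred_to_owl(pred_xml):
    """Translate one <pred arg="VAR">NAME</pred> into two RDF/XML lines:
    a class named NAME and an individual VAR typed with that class."""
    pred = ET.fromstring(pred_xml)
    name = pred.text.strip()
    var = pred.get("arg")
    return [
        "<owl:Class rdf:ID=%s/>" % quoteattr(name),
        "<%s rdf:ID=%s/>" % (name, quoteattr(var)),
    ]


lines = pred_to_owl('<pred arg="_G9011">diver</pred>')
print("\n".join(lines))
```

 	Keeping the generated variable name as the individual's ID is 
what lets the back-conversion recover which predicates shared a variable.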

 	2) Binary Relationships ==>> OWL ObjectProperties (DL role names / <rel>)

            q(x,y) ==>> R

            <rel arg1="_G4033" arg2="_G4548">of</rel> ==>>

           (Option 1)
            <owl:ObjectProperty rdf:ID="of">
                  <rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
                  <rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
             </owl:ObjectProperty>

 	Note: I obviously think there is a domain and range of "of" that
 	should restrict it beyond owl:Thing. For example, we could
 	make it, if "_G4033" were an instance of Class Son and "_G4548"
         were an instance of Class "Father", then we could model it:

          (Option 2)
 	 <owl:ObjectProperty rdf:ID="of">
                  <rdfs:domain rdf:resource="omcs:Son"/>
                  <rdfs:range rdf:resource="omcs:Father"/>
             </owl:ObjectProperty>

 	But the problem is that the individuals could be members of many
 	differing classes, and we wouldn't know that just from their
 	instance IDs (i.e. "_G4548"). And since this thing is basically
         monotonic, how do we know what else might be in the domain and
         range of "of"? The simplest thing seems to be to leave the domain
         and range as "owl:Thing". Then we could also maybe solve the
         problem by instantiating an instance (see the later discussion
         about quantification in 7), but we could just do as we did
         previously and resolve it by instantiating an individual:

 	(Option 3)
            <owl:ObjectProperty rdf:ID="of">
                  <rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
                  <rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
             </owl:ObjectProperty>

 	   <owl:Thing rdf:ID="_G4033"><of rdf:resource="#_G4548" /></owl:Thing>

 	I think (Option 3) is best.
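
 	A sketch of how a converter might emit (Option 3) for one <rel>. 
The function name is hypothetical. One assumption differs from the 
hand-typed example: the two individuals are taken to be declared 
elsewhere (by the <pred> rule), so they are referenced with rdf:about 
and rdf:resource rather than declared again.

```python
import xml.etree.ElementTree as ET


def rel_to_owl(rel_xml):
    """Translate <rel arg1="X" arg2="Y">NAME</rel> into a property
    declaration and a statement linking individual X to individual Y."""
    rel = ET.fromstring(rel_xml)
    name = rel.text.strip()
    subj, obj = rel.get("arg1"), rel.get("arg2")
    return [
        '<owl:ObjectProperty rdf:ID="%s"/>' % name,
        '<owl:Thing rdf:about="#%s"><%s rdf:resource="#%s"/></owl:Thing>'
        % (subj, name, obj),
    ]


lines = rel_to_owl('<rel arg1="_G4033" arg2="_G4548">of</rel>')
print("\n".join(lines))
```

 	Leaving out rdfs:domain and rdfs:range says the same thing as 
setting both to owl:Thing, so the sketch omits them.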

 	3) Negation ==>> DL Negation (Really not sure about this! / <not>)

 	    NOT(p(x)) ==>> NOT A

           <drs>
            <dr>_G11545</dr>
            <pred arg="_G11545">diver</pred>
            <not>
             <drs>
             <dr>_G11546</dr>
             <pred arg="_G11546">elephant</pred>
             </drs>
           </not>
           </drs>

             ==>>

               <owl:Class rdf:ID="diver">
                  <owl:complementOf rdf:resource="#elephant" />
               </owl:Class>

 	     <diver rdf:ID="_G11545" />
 	     <elephant rdf:ID="_G11546" />

 	Not really sure if complementOf gives us what we want here...

 	4) Implication ==>> OWL subClassOf   (DL Subsumption / <imp>)

 	P(x)->Q(x) ==>> P is subsumed by Q (P subClassOf Q)

         <drs>
         <pred arg="_G1682">swimmer</pred>
         <dr>_G125324</dr>
         <imp>
           <drs>
           <dr>_G1682</dr>
           <pred arg="_G1682">diver</pred>
           </drs>
         </imp>
         </drs>

         ==>>

 	<owl:Class rdf:ID="diver">
           <rdfs:subClassOf rdf:resource="#swimmer" />
         </owl:Class>

         <diver rdf:ID="_G1682" />
         <swimmer rdf:ID="_G125324" />
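
 	For the conversion back to FOL, the usual first-order reading of 
rdfs:subClassOf is a universally quantified implication. A sketch that 
writes that reading out as TPTP formula strings; the function names are 
hypothetical, and treating each individual as a named constant (rather 
than as an existentially bound variable) is an assumption.

```python
def subclass_to_fol(sub, sup):
    """FOL reading of 'sub rdfs:subClassOf sup': every sub is a sup."""
    return "! [X] : ( {0}(X) => {1}(X) )".format(sub, sup)


def membership_to_fol(cls, individual_id):
    """FOL reading of a typed individual such as <diver rdf:ID="_G1682"/>.

    TPTP constants start with a lower-case letter, so the generated ID
    is stripped of its leading underscore and lower-cased.
    """
    constant = individual_id.lstrip("_").lower()
    return "{0}({1})".format(cls, constant)


print(subclass_to_fol("diver", "swimmer"))   # ! [X] : ( diver(X) => swimmer(X) )
print(membership_to_fol("diver", "_G1682"))  # diver(g1682)
```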

 	Now, if there's more than one predicate in the scope of the
         antecedent, we just iterate. If there's more than one predicate
         in the scope of the consequent, we iterate again. The same goes
 	for negation, but see the discussion in 7), because I think I may
 	be wrong here. Perhaps some type of rdf:collection?

         5) And ==>> owl:intersectionOf (DL Intersection/implicit in XML)

 	p(x) and q(x) ==>> P INTERSECTION Q

 	Now, this is a big question. In our database, almost everything
 	uses "and" by default. For example, right now we aren't explicitly
 	intersecting things, because we are not keeping the <drs> or
 	quantified scope as an OWL class itself. Yet, should we, and then
 	have as its definition the intersection of all its variables?
 	That seems one way to do it, but I worry that would be too
         complex. If we were going to lose that information, could we
 	just keep all "ands" implicit by keeping them in our same
         database?

 	6) Or ==>>  owl:unionOf     (DL Union/explicit/<or> )

 	p(x) or q(x) ==>> P UNION Q

 	I can't find an example of OR in my knowledgebase, and the same
 	questions as for "and" apply.

 	7) Universal Quantification and Existential Quantification

 	In our database existential quantification is extremely common
 	while universal quantification is rare. This is due to a mistake
 	in our automatic processing of text: generics ("dogs are mammals")
 	are always processed as existential ("there exists a dog that is
 	a mammal") instead of as universal ("every dog is a mammal"). Go
 	figure, we should try to correct that but it's easier said than
 	done :) Again, one idea would be for universal quantification not
 	to instantiate any individuals, while existential quantification
 	instantiates one individual. Hmmm....but existential means "at
 	least one", not "just one". Hmmm...does this mean we have to throw
 	out all universally quantified statements? Or should we be going
 	about this the other way around, making everything universal and
 	throwing out the existential? Yet as argued earlier, without at
 	least one individual for each class and relation, we lose the
 	ability to convert back to FOL.

 	8) Propositions and Reification?

 	Now, often in language people say things like "John believes Bob
    ate the sandwich". It's pretty clear that an entity named "John" has
    as a "belief" the proposition "Bob ate the sandwich". Our knowledge-base
    does this as well, and I think it could be modelled as reification. Is
    this a good idea, or correct? This is a somewhat simplified (i.e.
    artificial) example, but it illustrates the point:

         <drs>
          <pred arg="_G31835">John</pred>
          <rel arg1="_G31835" arg2="_G36964">believe</rel>
 	 <prop argument="_G36964">
            <drs>
            <pred arg="_G31837">Bob</pred>
            <rel arg1="_G31837" arg2="_G318378">ate</rel>
            <pred arg="_G318378">sandwich</pred>
            </drs>
           </prop>
         </drs>

 	==>>

 	<owl:Class rdf:ID="John"/>
         <owl:ObjectProperty rdf:ID="believe">
                  <rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
                  <rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
             </owl:ObjectProperty>

         <owl:Class rdf:ID="prop" />
 	<owl:Class rdf:ID="Bob"/>
 	<owl:Class rdf:ID="sandwich"/>
         <owl:ObjectProperty rdf:ID="eat">
                  <rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
                  <rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
             </owl:ObjectProperty>

           <John rdf:ID="_G31835" />
           <Bob rdf:ID="_G31837" />
           <sandwich rdf:ID="_G318378" />
           <owl:Thing rdf:about="#_G31835"><believe rdf:resource="#_G36964"/></owl:Thing>
           <prop rdf:ID="_G36964">
             <rdf:subject rdf:resource="#_G31837" />
             <rdf:predicate rdf:resource="#eat" />
             <rdf:object rdf:resource="#_G318378" />
           </prop>

 	9) Dealing with a neo-Davidsonian event framework in DL.

 	Now, the above example (which probably got reification wrong) is
simplified. In a neo-Davidsonian model of FOL like the type we are trying 
to use, relationships like "eat" don't directly take a subject and 
object. Instead, they instantiate an "event", and the arguments are made 
into "agent" (1st argument), "patient" (2nd argument), and "theme" (3rd 
argument). This type of framework is often used by computational 
linguists in FOL to help represent language while keeping all 
relationships unary and binary.

 	The question is should we model "Bob ate the sandwich" directly
as a triple, or have an interleaving structure of events, agent, and 
patient classes.

 	The first case is easier to read:

 	<owl:Class rdf:ID="Bob"/>
         <owl:Class rdf:ID="sandwich"/>
         <owl:ObjectProperty rdf:ID="eat">
                  <rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
                  <rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
             </owl:ObjectProperty>

         <Bob rdf:ID="_G31835" />
         <sandwich rdf:ID="_G31836" />
         <owl:Thing rdf:about="#_G31835"><eat rdf:resource="#_G31836"/></owl:Thing>

 	The second case effectively bundles everything up into events and
keeps the relationships down to the predefined set of "agent", "patient", 
"event", and the prepositions (like "of" or "with").

 	So instead of Bob(x), sandwich(y), eat(x,y)

 	we get:

 	Bob(x), sandwich(y), eat(z), agent(x,z), patient(z,y), event(z)

 	Which translated into OWL means:

         <owl:Class rdf:ID="Bob"/>
         <owl:Class rdf:ID="sandwich"/>
         <owl:Class rdf:ID="eat"/>
         <owl:Class rdf:ID="event"/>
         <owl:ObjectProperty rdf:ID="agent">
                  <rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
                  <rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
             </owl:ObjectProperty>
          <owl:ObjectProperty rdf:ID="patient">
                  <rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
                  <rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
             </owl:ObjectProperty>

         <Bob rdf:ID="_G31835" />
         <sandwich rdf:ID="_G31836" />
         <eat rdf:ID="_G31837"/>
         <event rdf:about="#_G31837"/>

         <owl:Thing rdf:about="#_G31835"><agent rdf:resource="#_G31837"/></owl:Thing>
         <owl:Thing rdf:about="#_G31836"><patient rdf:resource="#_G31837"/></owl:Thing>

 	This appears to be overkill for "Bob ate the sandwich". But what 
about "Bob ate the sandwich with his fork"? All of a sudden, things are 
much more complex! Yet you can make it Bob(x), sandwich(y), eat(z), 
with(z,u), fork(u), event(z), agent(x,z), patient(z,y). The "with fork" 
also participates in the abstract "event".
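
 	The rewrite from a binary verb to the event form can be sketched 
as follows. The function and helper names are hypothetical, atoms are 
plain tuples, and the argument order follows the formulas in the text 
(agent(x,z), patient(z,y)).

```python
def neo_davidsonian(verb, agent, patient, event, modifiers=()):
    """Rewrite verb(agent, patient) into event-based atoms.

    modifiers is a sequence of (relation, noun, variable) triples, e.g.
    ("with", "fork", "u") for "with his fork"; each one is attached to
    the event variable rather than to the agent or patient.
    """
    atoms = [
        (verb, event),
        ("event", event),
        ("agent", agent, event),
        ("patient", event, patient),
    ]
    for relation, noun, var in modifiers:
        atoms.append((relation, event, var))
        atoms.append((noun, var))
    return atoms


def show(atoms):
    """Render atoms in the p(x), q(x,y) notation used in this e-mail."""
    return ", ".join("%s(%s)" % (a[0], ",".join(a[1:])) for a in atoms)


atoms = neo_davidsonian("eat", "x", "y", "z", [("with", "fork", "u")])
print(show(atoms))
# eat(z), event(z), agent(x,z), patient(z,y), with(z,u), fork(u)
```

 	Every atom stays unary or binary, so each one still falls under 
rule 1) or rule 2) above.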

 	So, should we ditch the neo-Davidsonian framework in OWL, or 
preserve it? Perhaps produce two versions: a neo-Davidsonian one, and a 
simplified one made by another XSLT script that gets to a "pure triple" 
form. However, I have a funny feeling that, except for the simplest 
cases, the "pure triple" form would be logically impossible to convert 
back to FOL, while the neo-Davidsonian form could be. However, losing all 
scoping information, by not recording which triples were within the scope 
of each <drs>, might cause the entire back-conversion from OWL DL -> FOL 
to fail.

10) Practicalities:

 	If we're going to have over a million facts, we're going to need
a large OWL/RDF database storage program and a reasoner. Which reasoner 
and database do you recommend? Lastly, what's the fastest OWL validator?
I want to at least make sure my syntax is correct (since I typed a few
of these examples by hand, there may be some mistakes).

 	Thanks again for any help you can provide! I know it's a long
e-mail, but someone had to do it for the sake of all us FOL users out 
there who want to get their databases into SemWeb format. After all, a 
million "common-sense" facts could be useful to someone in the SemWeb 
world, I hope!



 			Cheers,
-- 
 				--harry

 	Harry Halpin
 	Informatics, University of Edinburgh
         http://www.ibiblio.org/hhalpin
Received on Monday, 31 January 2005 04:09:32 UTC