- From: Harry Halpin <hhalpin@ibiblio.org>
- Date: Sun, 30 Jan 2005 23:09:30 -0500 (EST)
- To: www-rdf-interest@w3.org
Everyone,
I am working on a converter that translates logical facts from my
own internal XML format to OWL. I want to get this right because there
are about a million facts, so running the XSL and the theorem-prover
takes hours! This is a long e-mail, but I give the rules for my converter
both in logical form and as a translation into OWL from a FOL XML
language based on Discourse Representation Theory (a notational variant
of FOL; Kamp and Reyle).
The logical facts I have are standard FOL, and I'm trying to follow
the standard "direct translation" conventions I've read in the Tsarkov
and Horrocks DL 2003 paper as well as in other DL tutorials and papers.
What I'd like to do is find out which subset of my FOL database I can
translate into OWL-DL. I'll try to convert the whole database to OWL-DL,
then convert it back to my FOL. Then I'll use my FOL theorem-prover
(bliksem or vampire) to prove, fact by fact and independently, that each
original FOL fact is equivalent to its OWL-DL->FOL counterpart.
Since OWL-DL is less expressive than FOL, there will be lots of
statements in FOL that are not convertible to OWL, and so they will fail
in the theorem-prover when re-converted. I am trying to figure out
exactly which facts these are! They will just be excluded from my OWL
knowledge base automatically when the theorem-prover fails. Does this
sound sensible? If not, why? I'm a linguist by trade, not a logician, so
I may get these things wrong.
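The round-trip plan above can be sketched as a simple filter loop. This
is a minimal sketch in Python rather than XSL; the names
round_trip_filter, fol_to_owl, owl_to_fol and equivalent are hypothetical
placeholders, and equivalent stands in for a call to a prover such as
bliksem or vampire.

```python
from typing import Callable, Iterable, List, Tuple


def round_trip_filter(
    facts: Iterable[str],
    fol_to_owl: Callable[[str], str],
    owl_to_fol: Callable[[str], str],
    equivalent: Callable[[str, str], bool],
) -> Tuple[List[str], List[str]]:
    """Split facts into (kept, excluded) using the round-trip test."""
    kept, excluded = [], []
    for fact in facts:
        try:
            back = owl_to_fol(fol_to_owl(fact))
            ok = equivalent(fact, back)
        except ValueError:
            # Assumed convention: a converter raises ValueError when a
            # fact has no translation at all.
            ok = False
        (kept if ok else excluded).append(fact)
    return kept, excluded


# Toy stand-ins, only to show the control flow.
def toy_to_owl(fact: str) -> str:
    if "believe" in fact:
        raise ValueError("no OWL-DL translation in this toy converter")
    return "OWL:" + fact


def toy_to_fol(owl: str) -> str:
    return owl[len("OWL:"):]


kept, excluded = round_trip_filter(
    ["diver(x)", "believe(x,p)"], toy_to_owl, toy_to_fol, lambda a, b: a == b
)
```

One caveat: a prover that gives up (for example on a timeout) has only
failed to find a proof, so the excluded set may also contain facts that
did survive the translation.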
Here are my rules. I use "==>>" for the translation function. I
first try to write each rule in logical notation and then again as a
conversion from XML->OWL. The XML format is pretty simple:
basically, facts are given as <dr> and quantified groups of
facts as <drs>, with quantification assumed to be existential.
Thus, an expression of many facts is given by a bunch of <dr>
inside a <drs>. Each <dr> represents a variable that has been
bound; predicates about a variable get the <pred> tag, and
binary relationships get the <rel> tag.
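To make the format concrete, here is a minimal sketch of reading one
<drs> with Python's standard library. The function name read_drs and the
sample predicates "son" and "father" are only illustrative; the element
and attribute names follow the examples in this e-mail.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<drs>
  <dr>_G4033</dr>
  <dr>_G4548</dr>
  <pred arg="_G4033">son</pred>
  <pred arg="_G4548">father</pred>
  <rel arg1="_G4033" arg2="_G4548">of</rel>
</drs>
"""


def read_drs(xml_text):
    """Return (referents, unary facts, binary facts) for one <drs>.

    Only direct children are read; nested <not>, <imp> and <prop>
    elements are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    referents = [dr.text.strip() for dr in root.findall("dr")]
    unary = [(p.text.strip(), p.get("arg")) for p in root.findall("pred")]
    binary = [
        (r.text.strip(), r.get("arg1"), r.get("arg2"))
        for r in root.findall("rel")
    ]
    return referents, unary, binary


referents, unary, binary = read_drs(SAMPLE)
```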
1) Unary Relationships ==>> OWL Classes (DL concepts)
p(x) ==>> A
(Option 1)
<pred arg="_G9011">diver</pred> ==>>
<owl:Class rdf:ID="diver"/>
Seems simple? However, it's not that simple. There was just
a loss of information: the actual variable ("x" or
"_G9011", an automatically generated ID) was lost. So,
shouldn't we have, for each unary predicate, both an individual
and a class?
(Option 2)
<owl:Class rdf:ID="diver"/>
<diver rdf:ID="_G9011"/>
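A minimal sketch of Option 2 as a converter step, assuming the facts
arrive as (predicate, variable) pairs. The function name and the ID
"_G9012" are hypothetical. Since an rdf:ID value may be introduced only
once per document, the sketch declares each class and individual once and
uses rdf:about for later mentions.

```python
def unary_facts_to_owl(facts):
    """Emit one class per predicate and one individual per variable."""
    seen_classes, seen_individuals, out = set(), set(), []
    for name, var in facts:
        if name not in seen_classes:
            seen_classes.add(name)
            out.append('<owl:Class rdf:ID="%s"/>' % name)
        if var not in seen_individuals:
            seen_individuals.add(var)
            # First mention: introduce the individual's name.
            out.append('<%s rdf:ID="%s"/>' % (name, var))
        else:
            # Later mention: refer back to the existing individual.
            out.append('<%s rdf:about="#%s"/>' % (name, var))
    return out


lines = unary_facts_to_owl(
    [("diver", "_G9011"), ("swimmer", "_G9011"), ("diver", "_G9012")]
)
```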
2) Binary Relationships ==>> OWL ObjectProperties (DL
rolenames / <rel>)
q(x,y) ==>> R
<rel arg1="_G4033" arg2="_G4548">of</rel> ==>>
(Option 1)
<owl:ObjectProperty rdf:ID="of">
<rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:ObjectProperty>
Note: I obviously think "of" has a domain and range that
should restrict it beyond owl:Thing. For example, if "_G4033"
were an instance of class "Son" and "_G4548" an instance of
class "Father", we could model it as:
(Option 2)
<owl:ObjectProperty rdf:ID="of">
<rdfs:domain rdf:resource="omcs:Son"/>
<rdfs:range rdf:resource="omcs:Father"/>
</owl:ObjectProperty>
But the problem is that the individuals could be members of many
different classes, and we wouldn't know that just from their
instance IDs (e.g. "_G4548"). And since this thing is basically
monotonic, how do we know what else might be in the domain and
range of "of"? The simplest thing seems to be to leave the domain
and range as "owl:Thing". We could also maybe solve the problem
by instantiating an individual (see the later discussion of
quantification in 7), just as we did previously for unary
predicates:
(Option 3)
<owl:ObjectProperty rdf:ID="of">
<rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:ObjectProperty>
<owl:Thing rdf:ID="_G4033"><of rdf:resource="#_G4548"/></owl:Thing>
I think (Option 3) is best.
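A minimal sketch of Option 3 as a converter step (the function name is
hypothetical). It refers to both individuals by reference: rdf:about for
the subject and rdf:resource="#..." for the object, on the assumption
that their names are introduced elsewhere with rdf:ID.

```python
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"


def binary_to_owl(rel_name, subj_id, obj_id):
    """Declare the property, then assert it between two individuals."""
    return "\n".join([
        '<owl:ObjectProperty rdf:ID="%s">' % rel_name,
        '  <rdfs:domain rdf:resource="%s"/>' % OWL_THING,
        '  <rdfs:range rdf:resource="%s"/>' % OWL_THING,
        '</owl:ObjectProperty>',
        '<owl:Thing rdf:about="#%s">' % subj_id,
        '  <%s rdf:resource="#%s"/>' % (rel_name, obj_id),
        '</owl:Thing>',
    ])


owl_text = binary_to_owl("of", "_G4033", "_G4548")
print(owl_text)
```

In a full converter the property declaration would be emitted only once
per relation name, not once per fact.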
3) Negation ==>> DL Negation (Really not sure about this!/<not>)
NOT(p(x)) ==>> NOT A
<drs>
<dr>_G11545</dr>
<pred arg="_G11545">diver</pred>
<not>
<drs>
<dr>_G11546</dr>
<pred arg="_G11546">elephant</pred>
</drs>
</not>
</drs>
==>>
<owl:Class rdf:ID="diver">
<owl:complementOf rdf:resource="#elephant" />
</owl:Class>
<diver rdf:ID="_G11545" />
<elephant rdf:ID="_G11546" />
Not really sure if complementOf gives us what we want here...
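One way to check this is to read both sides in FOL. The comparison below
assumes the standard DRT reading of a negated <drs> and the standard
semantics of owl:complementOf; the constants g_{11545} and g_{11546}
stand for the two generated individuals.

```latex
\begin{align*}
\text{DRS:}\quad
  & \exists x\,\bigl(\mathit{diver}(x) \land \lnot\exists y\,\mathit{elephant}(y)\bigr)\\
\text{OWL:}\quad
  & \forall x\,\bigl(\mathit{diver}(x) \leftrightarrow \lnot\mathit{elephant}(x)\bigr)
    \;\land\; \mathit{diver}(g_{11545}) \;\land\; \mathit{elephant}(g_{11546})
\end{align*}
```

On this reading the two are not equivalent: the DRS says that no elephant
exists, while the OWL side defines diver as everything that is not an
elephant and also asserts an elephant individual.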
4) Implication ==>> OWL subClassOf (DL Subsumption / <imp>)
P(x)->Q(x) ==>> P is subsumed by Q
<drs>
<pred arg="_G1682">swimmer</pred>
<dr>_G125324</dr>
<imp>
<drs>
<dr>_G1682</dr>
<pred arg="_G1682">diver</pred>
</drs>
</imp>
</drs>
==>>
<owl:Class rdf:ID="diver">
<rdfs:subClassOf rdf:resource="#swimmer" />
</owl:Class>
<diver rdf:ID="_G1682" />
<swimmer rdf:ID="_G125324" />
Now, if there's more than one predicate in the scope of the
antecedent, we just iterate. If there's more than one predicate
in the scope of the consequent, we iterate again. Same for
negation, but see the discussion in 7), because I think I may be
wrong here. Perhaps some type of rdf:collection?
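Whether iterating is safe seems to depend on which side has several
predicates. Looking at one individual at a time, each predicate is just
true or false, so a small truth-table check is enough. This is a
propositional sketch for illustration, not part of the converter.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


rows = list(product([False, True], repeat=3))

# Several predicates in the antecedent, rows read as (a1, a2, b):
# original fact: (a1 and a2) -> b
joint = [implies(a1 and a2, b) for a1, a2, b in rows]
# iterated form: (a1 -> b) and (a2 -> b)
pairwise = [implies(a1, b) and implies(a2, b) for a1, a2, b in rows]

# Rows where the original holds but the iterated form does not.
too_strong = [row for row, j, p in zip(rows, joint, pairwise) if j and not p]

# Several predicates in the consequent, rows read as (a, b1, b2):
# a -> (b1 and b2) compared with (a -> b1) and (a -> b2)
same_on_consequent = all(
    implies(a, b1 and b2) == (implies(a, b1) and implies(a, b2))
    for a, b1, b2 in rows
)
```

So iterating over the consequent changes nothing, but iterating over the
antecedent asserts more than the original fact. One option for the
antecedent is a single owl:intersectionOf class used as the subclass.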
5) And ==>> owl:intersectionOf (DL Intersection/implicit in XML)
p(x) and q(x) ==>> P INTERSECTION Q
Now, this is a big question. In our database, almost everything
uses "and" by default. For example, right now we aren't explicitly
intersecting things, because we are not keeping the <drs> or
quantified scope as an OWL class itself. Yet, should we, and then
have as its definition the intersection of all its variables?
That seems one way to do it, but I worry that would be too
complex. If we were going to lose that information, could we
just keep all "ands" implicit by keeping them in the same
database?
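On keeping "and" implicit: an RDF graph is read as the conjunction of its
statements, so giving one individual several types already says
p(x) and q(x) without a named intersection class. A minimal sketch,
assuming facts arrive as (predicate, variable) pairs; the function name
is hypothetical.

```python
def conjunction_to_owl(unary_facts):
    """Group unary facts by variable and emit one typed individual each."""
    by_var = {}
    for name, var in unary_facts:
        by_var.setdefault(var, []).append(name)
    out = []
    for var, names in by_var.items():
        out.append('<owl:Thing rdf:about="#%s">' % var)
        for name in names:
            out.append('  <rdf:type rdf:resource="#%s"/>' % name)
        out.append('</owl:Thing>')
    return "\n".join(out)


owl_text = conjunction_to_owl([("diver", "_G1682"), ("swimmer", "_G1682")])
print(owl_text)
```

What this does not keep is which <drs> the facts came from, so the
scoping information is still lost.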
6) Or ==>> owl:unionOf (DL Union/explicit/<or> )
p(x) or q(x) ==>> P UNION Q
I can't find an example of OR in my knowledge base, and the same
questions as for "and" apply here.
7) Universal Quantification and Existential Quantification
Basically, in our database existential quantification is
extremely common while universal quantification is rare.
This is due to a mistake in our automatic processing
of text: generics ("dogs are mammals") are always processed
as existentials ("there exists a dog that is a mammal") instead
of as universals ("every dog is a mammal"). Go figure, we should
try to correct that but it's easier said than done :) Again,
one idea would be for universal quantification not to instantiate
any individuals, while existential quantification instantiates
one individual. Hmmm... but existential means "at least one", not
"just one". Hmmm... does this mean we have to throw out all
universally quantified statements? Or should we be going at this
the other way around, making everything universal and throwing out
the existentials? Yet as argued earlier, without at least one
individual for each class and relation, we lose the ability to
convert back to FOL.
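Written out for the dog example, the three readings are as follows. The
constant c is a hypothetical name for the one instantiated individual.

```latex
\begin{align*}
\text{existential (as stored):}\quad
  & \exists x\,\bigl(\mathit{dog}(x) \land \mathit{mammal}(x)\bigr)\\
\text{one named individual:}\quad
  & \mathit{dog}(c) \land \mathit{mammal}(c)\\
\text{universal (as intended):}\quad
  & \forall x\,\bigl(\mathit{dog}(x) \rightarrow \mathit{mammal}(x)\bigr)
\end{align*}
```

The second line entails the first, and it does not say that c is the only
such individual, so "at least one" is preserved. The third line is what
rdfs:subClassOf expresses, and it needs no individual at all.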
8) Propositions and Reification?
Now, in language people often say things like "John believes Bob
ate the sandwich". It's pretty clear that an entity named "John" has
as a "belief" the proposition "Bob ate the sandwich". Our knowledge base
does this as well, and I think it could be modelled as reification. Is
this a good idea, and is it correct? This is a somewhat simplified (i.e.
artificial) example, but it illustrates the point:
<drs>
<pred arg="_G31835">John</pred>
<rel arg1="_G31835" arg2="_G36964">believe</rel>
<prop argument="_G36964">
<drs>
<pred arg="_G31837">Bob</pred>
<rel arg1="_G31837" arg2="_G31838">ate</rel>
<pred arg="_G318378">sandwich</pred>
</drs>
</prop>
</drs>
==>>
<owl:Class rdf:ID="John"/>
<owl:ObjectProperty rdf:ID="believe">
<rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:ObjectProperty>
<owl:Class rdf:ID="prop"/>
<owl:Class rdf:ID="Bob"/>
<owl:Class rdf:ID="sandwich"/>
<owl:ObjectProperty rdf:ID="eat">
<rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:ObjectProperty>
<John rdf:ID="_G31835"/>
<owl:Thing rdf:about="#_G31835"><believe rdf:resource="#_G36964"/></owl:Thing>
<Bob rdf:ID="_G31837"/>
<sandwich rdf:ID="_G318378"/>
<rdf:Description rdf:ID="_G36964">
<rdf:type rdf:resource="#prop"/>
<rdf:subject rdf:resource="#_G31837"/>
<rdf:predicate rdf:resource="#eat"/>
<rdf:object rdf:resource="#_G318378"/>
</rdf:Description>
9) Dealing with a neo-davidsonian event framework in DL.
Now, the above example (which probably got reification wrong) is
simplified. In a neo-Davidsonian model of FOL like the type we are trying
to use, relationships like "eat" don't directly take a subject and
object. Instead, they instantiate an "event", and the arguments are made
into "agent" (1st argument), "patient" (2nd argument), and "theme" (3rd
argument). This type of framework is often used by computational
linguists in FOL to help represent language while keeping all
relationships unary and binary.
The question is whether we should model "Bob ate the sandwich"
directly as a triple, or have an interleaving structure of event, agent,
and patient classes.
The first case is easier to read:
<owl:Class rdf:ID="Bob"/>
<owl:Class rdf:ID="sandwich"/>
<owl:ObjectProperty rdf:ID="eat">
<rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:ObjectProperty>
<Bob rdf:ID="_G31835"/>
<sandwich rdf:ID="_G31836"/>
<owl:Thing rdf:about="#_G31835"><eat rdf:resource="#_G31836"/></owl:Thing>
The second case effectively bundles everything up into events and
keeps the relationships down to the predefined set of "agent", "patient",
"event", and the prepositions (like "of" or "with").
So instead of Bob(x), sandwich(y), eat(x,y)
we get:
Bob(x), sandwich(y), eat(z), agent(x,z), patient(z,y), event(z)
Which translated into OWL means:
<owl:Class rdf:ID="Bob"/>
<owl:Class rdf:ID="sandwich"/>
<owl:Class rdf:ID="eat"/>
<owl:Class rdf:ID="event"/>
<owl:ObjectProperty rdf:ID="agent">
<rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="patient">
<rdfs:domain rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:ObjectProperty>
<Bob rdf:ID="_G31835"/>
<sandwich rdf:ID="_G31836"/>
<eat rdf:ID="_G31837"/>
<event rdf:about="#_G31837"/>
<owl:Thing rdf:about="#_G31835"><agent rdf:resource="#_G31837"/></owl:Thing>
<owl:Thing rdf:about="#_G31836"><patient rdf:resource="#_G31837"/></owl:Thing>
This appears to be overkill for "Bob ate the sandwich". But what
about "Bob ate the sandwich with his fork"? All of a sudden, things are
much more complex! Yet you can make it Bob(x), sandwich(y), eat(z),
with(z,u), fork(u), event(z), agent(x,z), patient(z,y). The "with fork"
also participates in the abstract "event".
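The rewrite from eat(x,y) to the event form is mechanical, which a
minimal sketch can show. The function name to_event_form and the "_E"
prefix for fresh event variables are hypothetical; the argument order of
agent and patient follows the atoms written above.

```python
from itertools import count


def to_event_form(verb, agent_var, patient_var, fresh, modifiers=()):
    """Rewrite verb(agent, patient) into neo-Davidsonian atoms."""
    e = fresh()  # one new variable for the event itself
    atoms = [
        (verb, e),
        ("event", e),
        ("agent", agent_var, e),
        ("patient", e, patient_var),
    ]
    # Each modifier such as "with a fork" attaches to the event.
    for rel, var in modifiers:
        atoms.append((rel, e, var))
    return atoms


_counter = count(1)


def fresh():
    return "_E%d" % next(_counter)


atoms = to_event_form("eat", "x", "y", fresh, modifiers=[("with", "u")])
```

Every atom in the result is unary or binary, so each one maps onto a
class membership or an object property as in rules 1) and 2).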
So, should we ditch the neo-Davidsonian framework in OWL, or
preserve it? Perhaps produce two versions: one neo-Davidsonian, plus
another XSLT script to get to a simplified "pure triple" form.
However, I have a funny feeling that except for the simplest cases the
"pure triple" form would be logically unable to convert back to FOL,
while the neo-Davidsonian one would. However, the loss of all scoping
information, from not recording which triple was within which <drs>
collection, might cause the entire back-conversion from OWL DL -> FOL
to backfire.
10) Practicalities:
If we're going to have over a million facts, we're going to need
a large OWL/RDF database storage program and a reasoner. Which reasoner
and database do you recommend? Lastly, what's the fastest OWL validator?
I want to at least make sure my syntax is correct (since I typed a few
of these examples by hand, there may be some mistakes).
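Short of a real OWL validator, a cheap first pass is to check that each
generated fragment is at least well-formed XML. The sketch below uses
only Python's standard library; the default namespace
http://example.org/facts# is a placeholder, and passing this check says
nothing about whether the result is valid RDF or OWL-DL.

```python
import xml.etree.ElementTree as ET

# Wrap a fragment in an rdf:RDF root that declares the prefixes used in
# the examples, so that the parser can resolve them.
WRAPPER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" '
    'xmlns:owl="http://www.w3.org/2002/07/owl#" '
    'xmlns="http://example.org/facts#">%s</rdf:RDF>'
)


def well_formed(fragment):
    """Return True if the fragment parses as XML inside the wrapper."""
    try:
        ET.fromstring(WRAPPER % fragment)
        return True
    except ET.ParseError:
        return False


closed = well_formed('<owl:Class rdf:ID="Bob"/>')
unclosed = well_formed('<owl:Class rdf:ID="Bob">')
duplicate_attr = well_formed('<rel arg="_G1" arg="_G2">of</rel>')
```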
Thanks again for any help you can provide! I know it's a long
e-mail, but someone had to do it for the sake of all us FOL users out
there who want to get their databases into SemWeb format. After all, a
million "common-sense" facts could be useful to someone in the SemWeb
world, I hope!
Cheers,
--
--harry
Harry Halpin
Informatics, University of Edinburgh
http://www.ibiblio.org/hhalpin
Received on Monday, 31 January 2005 04:09:32 UTC