- From: Harry Halpin <hhalpin@ibiblio.org>
- Date: Sun, 30 Jan 2005 23:09:30 -0500 (EST)
- To: www-rdf-interest@w3.org
Everyone,

I am working on a converter that translates logical facts from my own internal XML format to OWL. I want to get this right, since there are about a million facts and running the XSL and the theorem-prover takes hours! This is a long e-mail, but I give the rules for my converter both in logical form and as a conversion into OWL from a FOL XML language based on Discourse Representation Theory (a notational variant of FOL; Kamp and Reyle). The logical facts I have are standard FOL, and I'm trying to follow the standard "direct translation" conventions as I've read them in the Tsarkov and Horrocks DL 2003 paper as well as in other DL tutorials and papers.

What I'd like to do is find out what subset of my FOL database I can translate into OWL-DL. I'll try to convert the whole database over to OWL-DL, then convert it back to my FOL. Then I'll use a FOL theorem-prover (bliksem or vampire) to prove, fact by fact and independently, that each original FOL fact is equivalent to its OWL-DL->FOL counterpart. Since OWL-DL is less expressive than FOL, there will be lots of statements in FOL that are not convertible to OWL, and so they will fail in the theorem-prover when re-converted. I am trying to figure out exactly which facts these are! They will simply be excluded from my OWL knowledge base automatically when the theorem-prover fails.

Does this sound sensible? If not, why? I'm a linguist by trade, not a logician, so I may get these things wrong.

Here are my rules. I use "==>>" for the translation function. For each rule I first write it in logical notation, then again as a conversion from XML->OWL. The XML format is pretty simple: variables are given as <dr> and quantified groups of facts as <drs>, with quantification assumed to be existential. Thus, an expression of many facts is given by a bunch of <dr> inside a <drs>. Each <dr> represents a variable which has been bound, predicates about a variable are given the <pred> tag, and binary relationships are given a <rel> tag.
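The format and the round-trip check described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the sample <drs> is made up, `parse_drs`, `to_fol`, and `round_trip_filter` are hypothetical names, only a flat <drs> is handled (no nested <not> or <imp>), and the `to_owl`, `from_owl`, and `equivalent` arguments stand in for the real XSL converters and the theorem-prover call, none of which are shown in this e-mail.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the <drs>/<dr>/<pred>/<rel> format described above.
SAMPLE = """
<drs>
  <dr>_G4033</dr>
  <dr>_G4548</dr>
  <pred arg="_G4033">son</pred>
  <pred arg="_G4548">father</pred>
  <rel arg1="_G4033" arg2="_G4548">of</rel>
</drs>
"""

def parse_drs(xml_text):
    """Return (referents, unary predicates, binary relations) of one flat <drs>."""
    root = ET.fromstring(xml_text)
    referents = [dr.text.strip() for dr in root.findall("dr")]
    preds = [(p.text.strip(), p.get("arg")) for p in root.findall("pred")]
    rels = [(r.text.strip(), r.get("arg1"), r.get("arg2"))
            for r in root.findall("rel")]
    return referents, preds, rels

def to_fol(xml_text):
    """Render a flat <drs> as an existentially quantified conjunction."""
    referents, preds, rels = parse_drs(xml_text)
    atoms = [f"{name}({arg})" for name, arg in preds]
    atoms += [f"{name}({a1},{a2})" for name, a1, a2 in rels]
    prefix = "".join(f"exists {v}. " for v in referents)
    return prefix + "(" + " & ".join(atoms) + ")"

def round_trip_filter(facts, to_owl, from_owl, equivalent):
    """Keep the facts whose FOL -> OWL -> FOL round trip is judged equivalent.

    to_owl, from_owl and equivalent are placeholders supplied by the caller;
    equivalent would wrap the theorem-prover in a real pipeline.
    """
    kept, rejected = [], []
    for fact in facts:
        restored = from_owl(to_owl(fact))
        (kept if equivalent(fact, restored) else rejected).append(fact)
    return kept, rejected

print(to_fol(SAMPLE))
```

Passing the converters and the equivalence check in as functions keeps the filtering loop independent of which prover is used, so the rejected list directly gives the facts that fall outside the translatable subset.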
1) Unary Relationships ==>> OWL Classes (DL concepts)

    p(x) ==>> A

(Option 1)

    <pred arg="_G9011">diver</pred>
    ==>>
    <owl:Class rdf:ID="diver"/>

Seems simple? However, it's not that simple. There was a loss of information: the actual variable ("x", or "_G9011", an automatically generated ID) was lost. So shouldn't we have, for each unary predicate, both a class and an individual?

(Option 2)

    <owl:Class rdf:ID="diver"/>
    <diver rdf:ID="_G9011"/>

2) Binary Relationships ==>> OWL ObjectProperties (DL rolenames / <rel>)

    q(x,y) ==>> R

    <rel arg1="_G4033" arg2="_G4548">of</rel>
    ==>>

(Option 1)

    <owl:ObjectProperty rdf:ID="of">
      <rdfs:domain rdf:resource="owl:Thing"/>
      <rdfs:range rdf:resource="owl:Thing"/>
    </owl:ObjectProperty>

Note: I obviously think "of" has a domain and range that should restrict it beyond owl:Thing. For example, if "_G4033" were an instance of class "Son" and "_G4548" were an instance of class "Father", then we could model it as:

(Option 2)

    <owl:ObjectProperty rdf:ID="of">
      <rdfs:domain rdf:resource="omcs:Son"/>
      <rdfs:range rdf:resource="omcs:Father"/>
    </owl:ObjectProperty>

But the problem is that the individuals could be members of many differing classes, and we wouldn't know that just from their instance IDs (i.e. "_G4548"). And since the knowledge base is basically monotonic, how do we know what else might be in the domain and range of "of"? The simplest thing seems to be to leave the domain and range as "owl:Thing". We could also maybe solve the problem by instantiating an instance (see the later discussion about quantification in 7), or we could just do as we did previously and resolve it by creating individuals:

(Option 3)

    <owl:ObjectProperty rdf:ID="of">
      <rdfs:domain rdf:resource="owl:Thing"/>
      <rdfs:range rdf:resource="owl:Thing"/>
    </owl:ObjectProperty>
    <owl:Thing rdf:ID="_G4033"><of rdf:ID="_G4548" /></owl:Thing>

I think (Option 3) is best.
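Rules 1) and 2), in their Option 2 and Option 3 forms, can be sketched as a small generator. This is a hedged illustration, not the converter discussed in this e-mail: the base URI `http://example.org/kb` and the `kb` prefix are assumptions, and `translate` is a hypothetical name. It also deviates from the hand-typed examples in two places where RDF/XML needs something different: owl:Thing is written as a full URI, and the object of a property is referenced with rdf:resource="#..." (rdf:ID on a property element names the statement itself, not its object).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XML = "http://www.w3.org/XML/1998/namespace"
BASE = "http://example.org/kb"   # hypothetical base URI for the knowledge base
KB = BASE + "#"

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL), ("kb", KB)):
    ET.register_namespace(prefix, uri)

def q(ns, local):
    """Build an ElementTree qualified name."""
    return f"{{{ns}}}{local}"

def translate(preds, rels):
    """preds: (class name, individual); rels: (property name, subject, object)."""
    root = ET.Element(q(RDF, "RDF"), {q(XML, "base"): BASE})
    declared = set()
    for name, arg in preds:
        if ("class", name) not in declared:       # declare each class once
            ET.SubElement(root, q(OWL, "Class"), {q(RDF, "ID"): name})
            declared.add(("class", name))
        # Option 2: one typed individual per bound variable
        ET.SubElement(root, q(KB, name), {q(RDF, "about"): "#" + arg})
    for name, subj, obj in rels:
        if ("prop", name) not in declared:        # declare each property once
            prop = ET.SubElement(root, q(OWL, "ObjectProperty"),
                                 {q(RDF, "ID"): name})
            for end in ("domain", "range"):
                ET.SubElement(prop, q(RDFS, end),
                              {q(RDF, "resource"): OWL + "Thing"})
            declared.add(("prop", name))
        # Option 3: assert the relation between the two individuals
        node = ET.SubElement(root, q(OWL, "Thing"), {q(RDF, "about"): "#" + subj})
        ET.SubElement(node, q(KB, name), {q(RDF, "resource"): "#" + obj})
    return ET.tostring(root, encoding="unicode")

print(translate([("diver", "_G9011")], [("of", "_G4033", "_G4548")]))
```

Individuals are written with rdf:about rather than rdf:ID so that the same variable can appear under several predicates without repeating an ID, which RDF/XML does not allow within one document.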
3) Negation ==>> DL Negation (Really not sure about this! / <not>)

    NOT(p(x)) ==>> NOT A

    <drs>
      <dr>_G11545</dr>
      <pred arg="_G11545">diver</pred>
      <not>
        <drs>
          <dr>_G11546</dr>
          <pred arg="_G11546">elephant</pred>
        </drs>
      </not>
    </drs>
    ==>>
    <owl:Class rdf:ID="diver">
      <owl:complementOf rdf:ID="elephant" />
    </owl:Class>
    <diver rdf:ID="_G11545" />
    <elephant rdf:ID="_G11546" />

Not really sure if complementOf gives us what we want here...

4) Implication ==>> OWL subClassOf (DL Subsumption / <imp>)

    P(x)->Q(y) ==>> P subsumes Q

    <drs>
      <pred arg="_G1682">swimmer</pred>
      <dr>_G125324</dr>
      <imp>
        <drs>
          <dr>_G1682</dr>
          <pred arg="_G1682">diver</pred>
        </drs>
      </imp>
    </drs>
    ==>>
    <owl:Class rdf:ID="diver">
      <rdfs:subClassOf rdf:ID="swimmer" />
    </owl:Class>
    <diver rdf:ID="_G1682" />
    <swimmer rdf:ID="_G125324" />

Now, if there's more than one predicate in the scope of the antecedent, we just iterate. If there's more than one predicate in the scope of the consequent, we iterate again. Same for negation, but still see discussion 7) because I think I may be wrong here. Perhaps some type of rdf:collection?

5) And ==>> owl:intersectionOf (DL Intersection / implicit in XML)

    p(x) and q(x) ==>> P INTERSECTION Q

Now, this is a big question. In our database, almost everything uses "and" by default. For example, right now we aren't explicitly intersecting things, because we are not keeping the <drs> or quantified scope as an OWL class itself. Yet should we, and then give it as its definition the intersection of all its variables? That seems one way to do it, but I worry that would be too complex. If we were going to lose that information, could we just keep all "ands" implicit by keeping them in the same database?

6) Or ==>> owl:unionOf (DL Union / explicit / <or>)

    p(x) or q(y) ==>> P UNION Q

I can't find an example of OR in my knowledge base, and the same questions as for "and" apply.
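One way to probe the doubt about complementOf in 3) is to compare the two readings on tiny finite models. The sketch below is an illustration under stated assumptions, not a proof procedure: `drs_reading` encodes the <drs> with <not> as "there is a diver and there is no elephant", `owl_reading` encodes the proposed OWL output as "diver is exactly the complement of elephant, and each class has an asserted member", and `subclass_reading` encodes rdfs:subClassOf from 4) as set inclusion. All function names are hypothetical.

```python
def drs_reading(domain, diver, elephant):
    # exists x. diver(x)  AND  NOT exists y. elephant(y)
    return len(diver) > 0 and len(elephant) == 0

def owl_reading(domain, diver, elephant):
    # owl:complementOf: the extension of diver is exactly domain minus elephant;
    # the two typed individuals make both classes non-empty.
    return (diver == domain - elephant
            and len(diver) > 0 and len(elephant) > 0)

def subclass_reading(sub, sup):
    # rdfs:subClassOf corresponds to: forall x. sub(x) -> sup(x)
    return sub <= sup

domain = {"a", "b"}
model_a = (domain, {"a"}, set())   # one diver, no elephants
model_b = (domain, {"a"}, {"b"})   # one diver, one elephant

print(drs_reading(*model_a), owl_reading(*model_a))   # True False
print(drs_reading(*model_b), owl_reading(*model_b))   # False True
print(subclass_reading({"a"}, {"a", "b"}))            # True
```

Under these encodings the two readings disagree on both models, and no model can satisfy both, since one requires the elephant class to be empty and the other asserts an elephant individual. That suggests the round trip for this rule would fail in the theorem-prover.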
7) Universal Quantification and Existential Quantification

Basically, in our database existential quantification is extremely common while universal quantification is rare. This is basically due to a mistake in our automatic processing of text: generics (dogs are mammals) are always processed as existential (there exists a dog that is a mammal) instead of as universal (every dog is a mammal). Go figure, we should try to correct that, but it's easier said than done :)

Again, one idea would be for universal quantification not to instantiate any individuals, while existential quantification would instantiate one individual. Hmmm... but existential means "at least one", not "just one". Hmmm... does this mean we have to throw out all universally quantified statements? Or should we be going at this the other way around, making everything universal and throwing out the existential? Yet, as argued earlier, without at least one individual for each class and relation, we lose the ability to convert back to FOL.

8) Propositions and Reification?

Now, often in language people say things like "John believes Bob ate the sandwich". It's pretty clear that an entity named "John" has as a "belief" the proposition "Bob ate the sandwich". Our knowledge base does this as well, and I think it could be modelled as reification. Is this a good idea, or correct? This is a somewhat simplified (i.e. artificial) example, but it illustrates the point!

    <drs>
      <pred arg="_G31835">John</pred>
      <rel arg1="_G31835" arg2="_G36964">believe</rel>
      <prop argument="_G36964">
        <drs>
          <pred arg="_G31837">Bob</pred>
          <rel arg1="_G31837" arg2="_G31838">ate</rel>
          <pred arg="_G318378">sandwich</pred>
        </drs>
      </prop>
    </drs>
    ==>>
    <owl:Class rdf:ID="John"/>
    <owl:ObjectProperty rdf:ID="believe">
      <rdfs:domain rdf:resource="owl:Thing"/>
      <rdfs:range rdf:resource="owl:Thing"/>
    </owl:ObjectProperty>
    <owl:Class rdf:ID="prop" />
    <owl:Class rdf:ID="Bob"/>
    <owl:Class rdf:ID="sandwich"/>
    <owl:ObjectProperty rdf:ID="eat">
      <rdfs:domain rdf:resource="owl:Thing"/>
      <rdfs:range rdf:resource="owl:Thing"/>
    </owl:ObjectProperty>
    <John rdf:ID="_G31835" />
    <owl:Thing rdf:ID="_G31835"><believe rdf:ID="_G36964"/></owl:Thing>
    <rdf:Description rdf:type="prop" rdf:ID="_G36964">
      <rdf:subject rdf:type="Bob" rdf:ID="_G31837" />
      <rdf:predicate rdf:type="eat" rdf:ID="_G31837" />
      <rdf:object rdf:type="sandwich" rdf:ID="_G318378" />
    </rdf:Description>

9) Dealing with a neo-Davidsonian event framework in DL.

Now, the above example (which probably got reification wrong) is simplified. In a neo-Davidsonian model of FOL like the type we are trying to use, relationships like "eat" don't directly take a subject and an object. Instead, they instantiate an "event", and the arguments are made into "agent" (1st argument), "patient" (2nd argument), and "theme" (3rd argument). This type of framework is often used by computational linguists in FOL to help represent language while keeping all relationships unary and binary. The question is whether we should model "Bob ate the sandwich" directly as a triple, or have an interleaving structure of event, agent, and patient classes.
The first case is easier to read:

    <owl:Class rdf:ID="Bob"/>
    <owl:Class rdf:ID="sandwich"/>
    <owl:ObjectProperty rdf:ID="eat">
      <rdfs:domain rdf:resource="owl:Thing"/>
      <rdfs:range rdf:resource="owl:Thing"/>
    </owl:ObjectProperty>
    <Bob rdf:ID="_G31835" />
    <sandwich rdf:ID="_G31836" />
    <owl:Thing rdf:ID="_G31835"><eat rdf:ID="_G31836"/></owl:Thing>

The second case effectively bundles everything up into events and keeps the relationships down to the predefined set of "agent", "patient", "event", and the prepositions (like "of" or "with"). So instead of

    Bob(x), sandwich(y), eat(x,y)

we get:

    Bob(x), sandwich(y), eat(z), agent(x,z), patient(z,y), event(z)

which, translated into OWL, means:

    <owl:Class rdf:ID="Bob"/>
    <owl:Class rdf:ID="sandwich"/>
    <owl:Class rdf:ID="event"/>
    <owl:ObjectProperty rdf:ID="agent">
      <rdfs:domain rdf:resource="owl:Thing"/>
      <rdfs:range rdf:resource="owl:Thing"/>
    </owl:ObjectProperty>
    <owl:ObjectProperty rdf:ID="patient">
      <rdfs:domain rdf:resource="owl:Thing"/>
      <rdfs:range rdf:resource="owl:Thing"/>
    </owl:ObjectProperty>
    <Bob rdf:ID="_G31835" />
    <sandwich rdf:ID="_G31836" />
    <eat rdf:ID="_G31837"/>
    <event rdf:ID="_G31837"/>
    <owl:Thing rdf:ID="_G31835"><agent rdf:ID="_G31837"/></owl:Thing>
    <owl:Thing rdf:ID="_G31836"><patient rdf:ID="_G31837"/></owl:Thing>

This appears to be overkill for "Bob ate the sandwich". But what about "Bob ate the sandwich with his fork"? All of a sudden, things are much more complex! Yet you can make it

    Bob(x), sandwich(y), eat(z), with(z,u), fork(u), event(z), agent(x,z), patient(z,y)

The "with fork" also participates in the abstract "event".

So, should we ditch the neo-Davidsonian framework in OWL, or preserve it? Perhaps we could produce two versions: one neo-Davidsonian, plus another XSLT script to get to a simplified "pure triple" form. However, I have a funny feeling that, except for the simplest cases, the "pure triple" form simply could not be converted back to FOL, while the neo-Davidsonian form could.
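The worry about the "pure triple" form can be made concrete with a small sketch that collapses the event representation into direct triples. It is an illustration only: `flatten_events` is a hypothetical name, facts are plain tuples rather than the XML format, and the argument order follows the examples above (agent(x,z) with the event second, patient(z,y) with the event first).

```python
def flatten_events(unary, binary):
    """Collapse verb(z), event(z), agent(x,z), patient(z,y) into verb(x,y).

    unary:  list of (predicate, variable)
    binary: list of (relation, arg1, arg2)
    Returns the direct triples plus every event-attached relation that has
    no slot in a binary verb(x,y) triple.
    """
    events = sorted({var for pred, var in unary if pred == "event"})
    triples, leftovers = [], []
    for e in events:
        verbs = [p for p, v in unary if v == e and p != "event"]
        agents = [a for r, a, b in binary if r == "agent" and b == e]
        patients = [b for r, a, b in binary if r == "patient" and a == e]
        for verb in verbs:
            for agent in agents:
                for patient in patients:
                    triples.append((verb, agent, patient))
    for rel, a, b in binary:
        if rel not in ("agent", "patient") and (a in events or b in events):
            leftovers.append((rel, a, b))
    return triples, leftovers

# "Bob ate the sandwich with his fork"
unary = [("Bob", "x"), ("sandwich", "y"), ("eat", "z"),
         ("event", "z"), ("fork", "u")]
binary = [("agent", "x", "z"), ("patient", "z", "y"), ("with", "z", "u")]

print(flatten_events(unary, binary))
# ([('eat', 'x', 'y')], [('with', 'z', 'u')])
```

The leftover with(z,u) still refers to the event variable z, which no longer exists once eat(z) has been replaced by eat(x,y). Under this sketch, that is exactly the information a "pure triple" form would drop.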
However, the loss of all scoping information, from not recording which triple was within the scope of which <drs>, might cause the entire back-conversion from OWL DL -> FOL to backfire.

10) Practicalities

If we're going to have over a million facts, we're going to need a large OWL/RDF database storage program and a reasoner. Which reasoner and database do you recommend? Lastly, what's the fastest OWL validator? I want to at least make sure my syntax is correct (since I typed a few of these examples by hand, there may be some mistakes).

Thanks again for any help you can provide! I know it's a long e-mail, but someone had to do it for the sake of all us FOL users out there who want to get their databases into SemWeb format. After all, a million "common-sense" facts could be useful to someone in the SemWeb world, I hope!

Cheers,

--
        --harry

Harry Halpin
Informatics, University of Edinburgh
http://www.ibiblio.org/hhalpin
Received on Monday, 31 January 2005 04:09:32 UTC