- From: Chris Mungall <cjm@fruitfly.org>
- Date: Wed, 19 Dec 2007 21:56:09 -0800
- To: "Christopher Rose" <chrisrose.chrisrose@gmail.com>
- Cc: public-owl-dev@w3.org, "Peter E. Midford" <peteremidford@yahoo.com>, Jim Balhoff <balhoff@nescent.org>, sarkar@mbl.edu
On Dec 19, 2007, at 12:20 PM, Christopher Rose wrote:

> Hello All,
>
> Please forgive what may be a neophyte question, I hope I've found a
> forum where this question is relevant. It's particularly a question
> about formulating ontologies in Owl and RDF.
>
> I'm modelling an actual scientific taxonomy of living creatures. I
> want to model elements at each level of the taxonomy (a.k.a. taxa) as
> Owl classes, such that restrictions on properties I place on taxa at
> higher levels of the hierarchy are passed down to lower levels. For
> instance I might have a restriction that every member of class Aves
> (which is a taxonomic class as well as an Owl Class) has wings, and I
> would expect then that a Class Strigiformes (strigiformes is the
> taxonomic order containing actual owls, the kind with wings) which I
> might later define to be subClassOf rdf:resource="#Aves" would have
> the same restriction. This seems natural, to follow the intention of
> the language, and to model the expectations that human taxonomers (or
> 'systematists') might have.

This is a fairly standard way of representing alpha taxonomies: each
taxon is a class, connected by subclass axioms. Individuals would be
particular organisms, e.g. orville the owl (typically not named in the
ontology, of course).

You only allude to the more difficult and interesting part: how do you
represent Wing?

> But there is a lot of information regarding the class Aves which does
> not represent restrictions on the individuals (or subclasses) who may
> be members. Much of that information is related to the class itself -
> a reference to the relevant papers defining the class (Linnaeus,
> 1758), common names associated with it, a serial number for the
> taxonomic unit bestowed by various scientific organizations, their
> level of acceptance of that taxon, etc.
>
> But if I understand Owl syntax correctly, I cannot simply use it to
> say
>
> <owl:Class rdf:ID="Aves">
>   <rdfs:hasaSerialNumber rdf:about="174371">
>   <subclassOf rdf:resource="#Vertebrata">
> </owl:Class>

This is fine if you declare an AnnotationProperty (not in the rdfs
namespace) called hasASerialNumber (or better: has_serial_number).
The serial number applies to the resource Aves, not to the class
extent of Aves.

This works, with the proviso that AnnotationProperties are lacking in
semantics. There is nothing to stop you giving Aves 3 different serial
numbers, or no serial numbers, or sharing a serial number with
Crocodiles.

> even if I do also define a property called hasaSerialNumber. I can
> only place restrictions on properties in the class definition. If I
> then instead write;
>
> <owl:Class rdf:ID="Aves">
>   <rdfs:subClassOf rdf:resource="#Vertebrata"/>
>   <rdfs:subClassOf>
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="#hasaSerialNumber"/>
>       <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">
>         174371
>       </owl:cardinality>
>     </owl:Restriction>
>   </rdfs:subClassOf>
> </owl:Class>
>
> This class will likely be empty, if I understand Owl correctly. That
> is because any other taxa which I try to define as Classes, and make
> as subClassOf Aves, will certainly have their own serial numbers which
> will by definition be different, and therefore outside the restriction
> of the parent class.

It's worse. owl:cardinality constrains how many values of the property
each instance has, not what the value is, so you seem to be saying
that every instance of a bird has 174371 serial numbers.

Even forgiving the cardinality mistake, you're on the wrong track. You
don't want to use a restriction here, unless you want to talk about
the instances, and I presume that here the instances are
spatiotemporal particulars such as orville the owl and flossie the
sheep.

I may be wrong and it may be your intent to classify species rather
than organisms: in that case the leaf nodes of your taxonomies
(H. sapiens, D. melanogaster) would be individuals. I would recommend
against this.
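
For illustration, the annotation-property suggestion above might be written roughly as follows. This is only a sketch: the example.org namespace, the `tax` prefix and the base IRI are assumptions made up for the example, and has_serial_number is just the name proposed above.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the example.org namespace, the "tax" prefix and the
     base IRI are illustrative assumptions, not part of any real ontology. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:tax="http://example.org/taxonomy#"
    xml:base="http://example.org/taxonomy">

  <!-- Declared as an annotation property, so it describes the resource
       Aves itself and says nothing about individual birds. -->
  <owl:AnnotationProperty rdf:ID="has_serial_number"/>

  <owl:Class rdf:ID="Vertebrata"/>

  <owl:Class rdf:ID="Aves">
    <rdfs:subClassOf rdf:resource="#Vertebrata"/>
    <!-- A plain literal value. Nothing here prevents a second value
         or a value shared with another taxon. -->
    <tax:has_serial_number>174371</tax:has_serial_number>
  </owl:Class>

</rdf:RDF>
```

Subclasses of Aves do not pick up this value: each taxon carries its own annotation, which is the behaviour wanted for serial numbers.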
> I can see that what I am really wishing for is that Owl classes might
> be more similar to OODBs, where I might define a class of taxonomic
> classes, and define slots or attributes on that class, which each
> taxonomic class might fill in differently.

OWL has different semantics from OODBs.

> I guess I could define a different property for a serial number at
> each level of the taxonomic hierarchy, but this is cumbersome (there
> are lots of them, more than the 8 you learned in school) and feels
> artificial, as the serial numbers are serial to all taxa, rather than
> to just the taxa at a specific level (the orders, or the classes,
> say.) An even worse compromise would be to stuff the serial number
> and other information about the class itself in comments in the Owl
> Class.
> Another option that occurred to me was to create RDF Element to hold
> all of this data that is specifically about the individual Owl Class;
>
> <rdf:Description rdf:ID="Aves">
>   <uni:serialNumberITIS>174371</serialNumberITIS>
>   <uni:hasaParentClass rdf:resource="#Vertebrata"/>
>   <uni:hasLimbs>
>     <rdf:Bag>
>       <rdf:_1 rdf:resource="#wings"/>
>       <rdf:_2 rdf:resource="#legs"/>
>     </rdf:Bag>
>   </uni:hasLimbs>(etc.)

I'm wincing here, sorry. rdf:Bag, ouch. I guess you can just toss out
all the good stuff with OWL and use RDFS. I'm not sure what inferences
you intend to get with your bags of limbs. They wouldn't propagate
over hasaParentClass.

Don't use plurals in your anatomical entity names, unless you
explicitly intend to denote a collection of legs or wings.
Unfortunately it's too late to get Linnaeus to stick to this rule.

Far better to use classes, then you can be more expressive:

  Aves SubClassOf hasLimb SOME wing
  Aves SubClassOf hasLimb SOME leg

(sorry to switch syntaxes on you)

Even better, dispense with hasLimb and use has_part.
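
Spelled out in RDF/XML, the two class axioms above might look roughly like the sketch below. The base IRI is an assumption made up for the example; hasLimb, wing and leg are the names used above, and Strigiformes is added only to show where the inheritance asked about in the original question comes from.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the example.org base IRI is an illustrative assumption. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/taxonomy">

  <owl:ObjectProperty rdf:ID="hasLimb"/>

  <!-- Singular names: each class denotes a kind of limb. -->
  <owl:Class rdf:ID="wing"/>
  <owl:Class rdf:ID="leg"/>

  <owl:Class rdf:ID="Aves">
    <!-- Aves SubClassOf hasLimb SOME wing -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasLimb"/>
        <owl:someValuesFrom rdf:resource="#wing"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <!-- Aves SubClassOf hasLimb SOME leg -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasLimb"/>
        <owl:someValuesFrom rdf:resource="#leg"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>

  <!-- No restriction is repeated here. Every Strigiformes instance is an
       Aves instance, so both restrictions apply to it as well. -->
  <owl:Class rdf:ID="Strigiformes">
    <rdfs:subClassOf rdf:resource="#Aves"/>
  </owl:Class>

</rdf:RDF>
```

A reasoner given this should conclude that Strigiformes is also a subclass of "hasLimb SOME wing", which is the passing-down of restrictions the question asked for.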
Or rather re-use has_part, so you can interoperate with other
bio-ontologies: http://obofoundry.org/ro

However, there may be some advantages in turning your "Aves" into
instances, thus placing them in the domain of discourse and allowing
you to make more expressive statements about them. I'd recommend
proceeding carefully. More on this below.

> and then somehow (?) associate each of the Owl Classes with the more
> general and capacious RDF description. But again I feel like this is
> a contrivance that is forced on me by the syntax, and certainly not
> representative of how a human taxonomer would organize her own
> thoughts.
>
> It's certainly possible my frustration with this may stem from my
> incomplete (or inaccurate) understanding of the syntax of XML, RDF,
> RDF Schema, and Owl, and/or the intent of each of these.

There's a lot there, isn't there? My suggestion would be to try to
ignore XML, RDF and all syntax issues and learn OWL first.
Unfortunately I don't have any pointers. Most OWL guides put the
RDF/XML in your face.

> But also it
> may come from these each being defined separately, and over a period
> of time, where an OOP language (C++, or Ruby, say) is defined all in a
> single stroke (I'm simplifying here, I realize.)

Forget everything you know about OOP, then try to tackle OWL. If this
proves impossible, first learn the consequences of the difference
between the closed-world assumption (CWA) and the open-world
assumption (OWA). Then instances in reality vs instances in UML/Java.

> Also it seems as
> though a lot of modelling facility has been sacrificed in order to
> make Owl and RDF more easily digestible to reasoners (the software
> kind).

Less than you may think for your problem above. You can have your
serial numbers, as annotation properties. You just can't constrain
them, at least with the above representation.

There are more serious expressivity constraints. You can't say that
species can't interbreed (since "species" is just a taxonomic rank
here, indicated with a logically invisible annotation property).
And you can't formally state a monophyleticity property for a class.

There are some other interesting options. You could take a
phylogenetic perspective and treat each taxon as denoting an instance
of a spatiotemporal event. E.g. "Aves" as denoting the branching event
instance or progenitor organism instance that gave rise to all extant
actual Aves instances. Your ontology could be pretty minimal here:
organism, birth. You can state your monophyleticity constraints in a
DL-safe rules way (if this is important to you), and you can account
for evolutionary loss (tetrapod has_part limb - mostly) without
getting into non-monotonicity.... Fun, but diverging from a more
traditional taxonomy.

> Any instruction or suggestions would be heartily appreciated.

It's been a while since I learned OWL and I don't have decent learning
resources handy. I presume there must be an "OWL for OO programmers"
guide somewhere.

Or you can just cheat and skip straight to:

http://www.berkeleybop.org/ontologies/obo-all/ncbi_taxonomy/ncbi_taxonomy.owl
[warning - big file]

Others may have similar translations for their favourite taxonomies.

cheers
chris

> Thanks sincerely,
> Chris
Received on Thursday, 20 December 2007 05:57:33 UTC