Re: Modeling Taxonomic Classifications in a World where a given Species can have many Classifications

On 1/26/12 7:46 AM, Jerven Bolleman wrote:
> Hi Peter,
>
> It's interesting to see this discussion. I would like to give a short background on why we at UniProt used rdfs:subClassOf relations between taxon IDs.
> When this decision was made there were no property paths yet, but there was RDFS inferencing. So the only way one could query for all bacterial proteins was to make each bacterial species a subClassOf the Bacteria kingdom taxon.
> e.g. select ?protein where {?protein :organism ?taxon . ?taxon rdfs:subClassOf taxon:2}
>
> Now that there are property paths we no longer need RDFS inferencing to answer these kinds of questions.
> i.e. select ?protein where {?protein :organism ?taxon . ?taxon skos:broader+ taxon:2}
>
> We could actually move away from using rdfs:subClassOf.

Please don't.

Utility of subsumption isn't dead. Far from it.

> If we have a good use case for this.

As per my comment above, I don't see the justification. Of course, I would be
happy to be convinced otherwise.

> You can actually see that in this release of UniProt, where we introduced skos:narrower into the taxonomy relations; in the next release we will add the skos:broader links.

Great!

Property Path expressions and Reasoning don't have to be mutually 
exclusive.
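
As a rough sketch of the two coexisting, a single SPARQL 1.1 query can follow either relation. The prefix IRIs and the up:organism property below are assumptions about the UniProt vocabulary rather than anything confirmed in this thread, so adjust them to what the endpoint actually uses:

```sparql
# Assumed prefixes; substitute the IRIs your endpoint really uses.
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX up:    <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?protein
WHERE {
  ?protein up:organism ?taxon .
  # Walk one or more steps up the hierarchy via either relation.
  ?taxon (rdfs:subClassOf|skos:broader)+ taxon:2 .
}
```

With `+` the taxon has to sit strictly below taxon:2; `*` would also match proteins attached to taxon:2 itself. On a store with RDFS inference switched on, the plain rdfs:subClassOf pattern from the first query above should keep working unchanged.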

In my eyes, the magic is still all about a Linked Data tapestry composed
of semantically rich relations that are human and machine comprehensible :-)

Kingsley

> Peter, I am baffled by one statement you made: why does the use of rdfs:subClassOf relations make correct linking error prone?
>
> Regards,
> Jerven Bolleman
>
> On Jan 25, 2012, at 11:27 PM, Peter DeVries wrote:
>
>> Hi,
>>
>> I have been trying to figure out the best way to deal with the following problem.
>>
>> There are entities that we see as "species". (Some argue about whether they are real things or simply an artificial human construct.)
>>
>> I think that in general the species themselves see them as real and do a pretty good job identifying other members of the same species.
>>
>> Putting that entire debate aside, we still need some way to deal with the idea of a species as a typological construct, so one can say things like:
>>
>> "This species was observed at this geolocation" or "There have been X number of bird species observed in this natural area."
>>
>> Names change over time, and the same name string can be used for different animal / plant species.
>>
>> So that is why I created LOD entities like these
>>
>> http://lod.taxonconcept.org/ses/iuCXz.html  ( http://lod.taxonconcept.org/ses/iuCXz.rdf )
>>
>> http://lod.taxonconcept.org/ses/v6n7p.html  (http://lod.taxonconcept.org/ses/v6n7p.rdf )
>>
>> Since moving to this new model from my earlier GeoSpecies, I have been trying to figure out how to deal with the following issue.
>>
>> A species can have multiple classifications. You can see this when you compare many of the species in DBpedia to those in the NCBI taxonomy (UniProt, Bio2RDF).
>>
>> UniProt and Bio2RDF model these as nested subclasses, which makes correct linking error prone.
>>
>> I think a better way to think of this: there are species and different groups choose to organise them into classifications differently.
>>
>> So rather than organize these into nested subclasses, I am thinking about the following pattern.
>>
>> Puma concolor
>> txn:inGenus txn_mammalia_genera:Genus_Puma
>> txn:inFamily txn_mammalia:Family_Felidae
>> txn:inOrder  txn_mammalia:Order_Carnivora
>>
>> You can see this in this file http://lod.taxonconcept.org/ontology/p01/Mammalia/species.owl
>>
>>      <owl:Class rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Species">
>>       <txn:inClass rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Class_Mammalia"/>
>>       <txn:inOrder rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Order_Carnivora"/>
>>       <txn:inFamily rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Family_Felidae"/>
>>       <txn:inGenus rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/genera.owl#Genus_Puma"/>
>>       <rdfs:isDefinedBy rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/species.owl"/>
>>      </owl:Class>
>>
>> And here http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl
>>
>>      <owl:Class rdf:about="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Family_Felidae">
>>          <rdfs:label>Family Felidae</rdfs:label>
>>          <rdf:type rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Mammal_Family"/>
>>          <txn:commonName>Cats</txn:commonName>
>>          <skos:closeMatch rdf:resource="http://purl.uniprot.org/taxonomy/9681"/>
>>          <skos:closeMatch rdf:resource="http://dbpedia.org/resource/Felidae"/>
>>          <txn:hasWikipediaArticle rdf:resource="http://en.wikipedia.org/wiki/Felidae"/>
>>          <skos:broader rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Suborder_Feliformia"/>
>>          <skos:narrower rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Felinae"/>
>>          <skos:narrower rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Pantherinae"/>
>>          <owl:sameAs rdf:resource="http://lod.geospecies.org/families/gSvIP"/>
>>          <rdfs:isDefinedBy rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl"/>
>>      </owl:Class>
>>
>> This allows SPARQL queries like the one here http://bit.ly/qssZOG based on my classification without breaking queries where the link is to DBpedia via a different predicate.
>>
>> For now I have simply linked these broadly to DBpedia using the following
>>
>> <txn:inDBpediaClade rdf:resource="http://dbpedia.org/ontology/Mammal"/>  *I use clade because these don't always match Order =>  Order etc.
>>
>> I think this pattern allows a given species to exist in several classifications, and allows those interested to move up and down the taxonomy - all without breaking things in the LOD.
>>
>> I thought I would ask the list what they thought of this before I do much more.
>>
>> I was also wondering if it would be better for me to use the subproperties of skos that I have created in this draft ontology?
>>
>> http://lod.taxonconcept.org/ontology/taxnomen/index.owl
>>
>> Such as:
>>   txn_nomen:narrowerTaxon
>>   txn_nomen:broaderTaxon
>>
>> Which would be used this way
>>
>>      <owl:Class rdf:about="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Family_Felidae">
>>          <rdfs:label>Family Felidae</rdfs:label>
>>          <rdf:type rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Mammal_Family"/>
>>          <txn:commonName>Cats</txn:commonName>
>>          <skos:closeMatch rdf:resource="http://purl.uniprot.org/taxonomy/9681"/>
>>          <skos:closeMatch rdf:resource="http://dbpedia.org/resource/Felidae"/>
>>          <txn:hasWikipediaArticle rdf:resource="http://en.wikipedia.org/wiki/Felidae"/>
>>          <txn_nomen:broaderTaxon rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Suborder_Feliformia"/>
>>          <txn_nomen:narrowerTaxon rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Felinae"/>
>>          <txn_nomen:narrowerTaxon rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Pantherinae"/>
>>          <owl:sameAs rdf:resource="http://lod.geospecies.org/families/gSvIP"/>
>>          <rdfs:isDefinedBy rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl"/>
>>      </owl:Class>
>>
>>
>> And
>> txn_nomen:narrowerRank
>> txn_nomen:broaderRank
>>
>> Which is used this way
>>
>>      <owl:Class rdf:about="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Rank_Family">
>>          <rdfs:label xml:lang="en">Rank Family</rdfs:label>
>>          <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
>>          <rdf:type rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#TaxonRank"/>
>>          <txn_nomen:narrowerRank rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Subfamily"/>
>>          <txn_nomen:broaderRank rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Superfamily"/>
>>          <owl:equivalentProperty rdf:resource="http://purl.org/ontology/wo/Family"/>
>>          <rdfs:seeAlso rdf:resource="http://en.wikipedia.org/wiki/Family_%28biology%29"/>
>>          <rdfs:seeAlso rdf:resource="http://www.bbc.co.uk/nature/family"/>
>>          <vs:term_status>testing</vs:term_status>
>>         <rdfs:isDefinedBy rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#index.owl"/>
>>      </owl:Class>
>>
>> Respectfully,
>>
>> - Pete
>>
>> P.S. Taxonomic classification ontologies like the ones listed above for mammals will change over time as additional species are discovered and their phylogeny is better understood.
>> What would be the best practices for handling changes like this?
>>
>>
>> -- 
>> ------------------------------------------------------------------------------------
>> Pete DeVries
>> Department of Entomology
>> University of Wisconsin - Madison
>> 445 Russell Laboratories
>> 1630 Linden Drive
>> Madison, WI 53706
>> Email: pdevries@wisc.edu
>> TaxonConcept & GeoSpecies Knowledge Bases
>> A Semantic Web, Linked Open Data Project
>> --------------------------------------------------------------------------------------
>
>


-- 

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

Received on Thursday, 26 January 2012 15:02:44 UTC