Re: Modeling Taxonomic Classifications in a World where a given Species can have many Classifications

Hi Paul,

Sorry for the delay; I wanted to think about this for a bit.

I have thought about creating new predicates for these kinds of
relationships, but I am not sure how accepted these would become.

The draft vocabulary linked below has them.

I was thinking that if it gets to a final form it could be changed to use
PURL URIs.

It splits off the name and clade parts of my vocabulary into something that
might be easier for others to use.

It also has some extra name features that we are exploring to connect names
to things that contain those names.

The earlier version of this had a number of problems. This one
probably still has some, but not as many.
http://lod.taxonconcept.org/ontology/taxnomen/index.owl

I mocked up URIs in this to stand in for years. I was thinking of
setting up a Rails app to expose these, but for now they are in the draft
ontology.

I have been having trouble with xsd:gYear.

At the end of the file I have two example taxa and two taxonomic authors.

You can also see the example taxa in my KB

Test of Ochlerotatus triseriatus http://bit.ly/wsz1ww

Test of Procyon lotor http://bit.ly/zmYNke

I think you could link to the UniProt taxonomic clades using a different
predicate without breaking anything, so in addition to inGenus you would
also have inNCBI_Genus:

inGenus =>
http://lod.taxonconcept.org/ontology/p01/Mammalia/genera.owl#Genus_Procyon
inNCBI_Genus => http://purl.uniprot.org/taxonomy/9653

This is because if you did a SPARQL query for all the mammals in the
family Viverridae, you would get different results depending on which
classification you queried.

The NCBI Viverridae has a subfamily within it that is treated as a separate
family in DBpedia and my ontology. (http://en.wikipedia.org/wiki/Asiatic_linsang)
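
To make the Viverridae point concrete, here is a minimal sketch in plain Python
rather than SPARQL. It assumes a handful of toy triples; the predicate
inNCBI_Family and the identifiers Family_Viverridae, Family_Prionodontidae and
NCBI_Viverridae are hypothetical names made up for the example, and the
placements are only meant to mirror the linsang case described above.

```python
# Toy data only: illustrative stand-ins, not an export of either knowledge base.
# "inNCBI_Family" and all family identifiers are hypothetical names.
TRIPLES = [
    # (species, predicate, family)
    ("Viverra zibetha", "inFamily", "Family_Viverridae"),
    ("Viverra zibetha", "inNCBI_Family", "NCBI_Viverridae"),
    ("Prionodon pardicolor", "inFamily", "Family_Prionodontidae"),
    ("Prionodon pardicolor", "inNCBI_Family", "NCBI_Viverridae"),
]


def species_in(predicate, family):
    """Return the species linked to `family` through `predicate`, sorted by name."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == family)


# Same question ("which species are in Viverridae?"), asked per classification.
mine = species_in("inFamily", "Family_Viverridae")
ncbi = species_in("inNCBI_Family", "NCBI_Viverridae")
print(mine)
print(ncbi)
```

Because each classification is reached through its own predicate, the two
answers can differ without either one corrupting the other.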

Back to your original comment: instead of skos:broader and skos:narrower, I was
thinking of something more specific.

Such as:
 txn_nomen:narrowerTaxon
 txn_nomen:broaderTaxon

Which would be used this way

    <owl:Class rdf:about="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Family_Felidae">
        <rdfs:label>Family Felidae</rdfs:label>
        <rdf:type rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Mammal_Family"/>
        <txn:commonName>Cats</txn:commonName>
        <txn_nomen:broaderTaxon rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Suborder_Feliformia"/>
        <txn_nomen:narrowerTaxon rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Felinae"/>
        <txn_nomen:narrowerTaxon rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Pantherinae"/>
        <rdfs:isDefinedBy rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl"/>
    </owl:Class>
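
As a rough illustration of how such links could be walked, here is a small
Python sketch. It uses only the Felidae links from the example above and
assumes, for the sake of the sketch, that narrowerTaxon is simply the inverse
of broaderTaxon and that each taxon has one broader taxon within a single
classification. The function and variable names are made up.

```python
# Direct broaderTaxon links from the Felidae example (local names only).
# Assumption of this sketch: one broader taxon per taxon in one classification.
BROADER = {
    "Subfamily_Felinae": "Family_Felidae",
    "Subfamily_Pantherinae": "Family_Felidae",
    "Family_Felidae": "Suborder_Feliformia",
}


def narrower_index(broader):
    """Derive narrowerTaxon links by inverting the broaderTaxon links."""
    index = {}
    for child, parent in broader.items():
        index.setdefault(parent, []).append(child)
    return index


def ancestors(taxon, broader):
    """Follow broaderTaxon links upward until no broader taxon is recorded."""
    chain = []
    seen = {taxon}
    while taxon in broader:
        taxon = broader[taxon]
        if taxon in seen:  # guard against an accidental cycle in the data
            break
        seen.add(taxon)
        chain.append(taxon)
    return chain


NARROWER = narrower_index(BROADER)
print(ancestors("Subfamily_Felinae", BROADER))
print(sorted(NARROWER["Family_Felidae"]))
```

This is roughly what a SPARQL property path over the broader predicate would
do, just spelled out by hand.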


And
txn_nomen:narrowerRank
txn_nomen:broaderRank

Which is used this way

    <owl:Class rdf:about="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Rank_Family">
        <rdfs:label xml:lang="en">Rank Family</rdfs:label>
        <rdf:type rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#TaxonRank"/>
        <txn_nomen:narrowerRank rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Subfamily"/>
        <txn_nomen:broaderRank rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Superfamily"/>
        <vs:term_status>testing</vs:term_status>
        <rdfs:isDefinedBy rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#index.owl"/>
    </owl:Class>
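
One possible use of the rank links is a sanity check that one rank really
sits above another. The Python sketch below assumes just the three ranks from
the example; the rank labels and the function name are placeholders, not
terms from the ontology.

```python
# broaderRank links as in the Rank_Family example. The labels are placeholder
# strings for this sketch, not the ontology's actual fragment identifiers.
BROADER_RANK = {
    "Subfamily": "Family",
    "Family": "Superfamily",
}


def is_broader_rank(candidate, rank, broader_rank=BROADER_RANK):
    """Return True if `candidate` is reachable from `rank` via broaderRank links."""
    seen = set()
    while rank in broader_rank and rank not in seen:
        seen.add(rank)  # avoid looping forever on cyclic data
        rank = broader_rank[rank]
        if rank == candidate:
            return True
    return False


print(is_broader_rank("Superfamily", "Subfamily"))
print(is_broader_rank("Subfamily", "Family"))
```

A check like this could flag a broaderTaxon link that points at a taxon whose
rank is not actually broader.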


Respectfully,

- Pete




On Thu, Jan 26, 2012 at 11:44 AM, Paul Wilton <paul.wilton@ontoba.com> wrote:

> Hi Peter
> doesn't your problem still exist using skos ?
>  - use of skos:broader to infer a hierarchy doesn't stop users making
> sameAs relationships between two concepts at different depths of your
> taxonomy, and thus creating the same problem for you defined in skos rather
> than a class hierarchy ?
>  - also one thing to note with skos is that it is triple heavy, having both
> inverseOf relationships and deepish property inheritance (broaderTransitive
> <= semanticRelation). This means that in an OWL inferencing triple store you
> will materialise something like n * n * 6 triples (taxon width * depth *
> skos properties), which could turn out to be very large if your
> dataset/taxonomy is deep, as I imagine it is quite wide (a few million
> species?). It may not be a problem, but I thought it worth mentioning.
>
> sounds like a great project though :)
> kind regards
> Paul
>
> Paul Wilton, Technical Architect
> Ontoba Ltd <http://www.ontoba.com>
> paul.wilton@ontoba.com
>
>
>
>
> On Thu, Jan 26, 2012 at 3:39 PM, Peter DeVries <pete.devries@gmail.com> wrote:
>
>> Hi Jerven,
>>
>> Thank you for your response. Your reasoning makes sense to me and I like
>> the move to skos:broader and skos:narrower.
>>
>> The problem that I have with subClassing is that some groups have made
>> sameAs links between *txn* concepts and subclassed concepts.
>>
>> This then entails the *txn* concepts within their subClass hierarchy in
>> the LOD.
>>
>> So it is in this regard that I see them as potentially error prone.
>>
>> Ideally this linking should be done with something similar to SameAs but
>> without entailment.
>>
>> For now I think the best alternative is skos:closeMatch.
>>
>> In some use cases these two linked entities can be interpreted as the
>> same thing, but for other uses it might be best to consider them "different
>> things".
>>
>> Until there are more nuanced versions of sameAs, I think that
>> skos:closeMatch allows end users to treat these linked entities as they see
>> fit.
>>
>> I am glad you wrote and I would like to follow up in the future.
>>
>> I am currently in Woods Hole, MA, working with EoL.org and
>> GlobalNames.org, so there might be some opportunities that we could
>> run past your group.
>>
>> Also note that the species concepts are still experimental and would
>> probably benefit from your suggestions.
>>
>> Thanks Again,
>>
>> - Pete
>>
>>
>>
>> On Thu, Jan 26, 2012 at 7:46 AM, Jerven Bolleman <
>> jerven.bolleman@isb-sib.ch> wrote:
>>
>>> Hi Peter,
>>>
>>> It's interesting to see this discussion. I would like to give a short
>>> background on why we at UniProt used rdfs:subClassOf relations between
>>> taxon IDs.
>>> When this decision was made there were no property paths yet, but there
>>> was RDFS inferencing. So the only way one could query for all bacterial
>>> proteins was by having each bacterial species be a subClassOf the bacteria
>>> kingdom thing.
>>> e.g. select ?protein where {?protein :organism ?taxon . ?taxon
>>> rdfs:subClassOf taxon:2}
>>>
>>> Now that there are property paths we no longer need RDFS inferencing to
>>> answer these kinds of questions.
>>> i.e. select ?protein where {?protein :organism ?taxon . ?taxon
>>> skos:broader+ taxon:2}
>>>
>>> We could actually move away from using rdfs:subClassOf if we have a
>>> good use case for this.
>>> You can actually see that in this release of UniProt, where we introduced
>>> skos:narrower into the taxonomy relations; next release we will add the
>>> skos:broader links.
>>>
>>> Peter, I am baffled by one statement you made: why does the use of
>>> rdfs:subClassOf relations make correct linking error prone?
>>>
>>> Regards,
>>> Jerven Bolleman
>>>
>>> On Jan 25, 2012, at 11:27 PM, Peter DeVries wrote:
>>>
>>> > Hi,
>>> >
>>> > I have been trying to figure out the best way to deal with the
>>> following problem.
>>> >
>>> > There are entities that we see as "species". (some argue if they are
>>> real things or simply an artificial human construct.)
>>> >
>>> > I think that in general the species themselves see them as real and do
>>> a pretty good job identifying other members of the same species.
>>> >
>>> > Putting that entire debate aside, we still need some way to deal with
>>> the idea of a species as a typological construct so one can say things like.
>>> >
>>> > This species was observed at this geolocation or There have been X
>>> number of bird species observed in this natural area.
>>> >
>>> > Names change over time, and the same name string can be used for
>>> different animal / plant species.
>>> >
>>> > So that is why I created LOD entities like these
>>> >
>>> > http://lod.taxonconcept.org/ses/iuCXz.html  (
>>> http://lod.taxonconcept.org/ses/iuCXz.rdf )
>>> >
>>> > http://lod.taxonconcept.org/ses/v6n7p.html  (
>>> http://lod.taxonconcept.org/ses/v6n7p.rdf )
>>> >
>>> > Since moving to this new model from my earlier GeoSpecies, I have been
>>> trying to figure out how to deal with the following issue.
>>> >
>>> > A species can have multiple classifications. You can see this when you
>>> compare many of the species in DBpedia to those in the NCBI taxonomy
>>> (uniprot, bio2rdf)
>>> >
>>> > Uniprot and Bio2RDF model these as nested subclasses which makes
>>> correct linking error prone.
>>> >
>>> > I think a better way to think of this: there are species and different
>>> groups choose to organise them into classifications differently.
>>> >
>>> > So rather than organize these into nested subclasses, I am thinking
>>> about the following pattern.
>>> >
>>> > Puma concolor
>>> > txn:inGenus txn_mammalia_genera:Genus_Puma
>>> > txn:inFamily txn_mammalia:Family_Felidae
>>> > txn:inOrder  txn_mammalia:Order_Carnivora
>>> >
>>> > You can see this in this file
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/species.owl
>>> >
>>> >     <owl:Class rdf:about="
>>> http://lod.taxonconcept.org/ses/v6n7p#Species">
>>> >      <txn:inClass rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Class_Mammalia
>>> "/>
>>> >      <txn:inOrder rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Order_Carnivora
>>> "/>
>>> >      <txn:inFamily rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Family_Felidae
>>> "/>
>>> >      <txn:inGenus rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/genera.owl#Genus_Puma
>>> "/>
>>> >      <rdfs:isDefinedBy rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/species.owl"/>
>>> >     </owl:Class>
>>> >
>>> > And here http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl
>>> >
>>> >     <owl:Class rdf:about="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Family_Felidae
>>> ">
>>> >         <rdfs:label>Family Felidae</rdfs:label>
>>> >         <rdf:type rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Mammal_Family
>>> "/>
>>> >         <txn:commonName>Cats</txn:commonName>
>>> >         <skos:closeMatch rdf:resource="
>>> http://purl.uniprot.org/taxonomy/9681"/>
>>> >         <skos:closeMatch rdf:resource="
>>> http://dbpedia.org/resource/Felidae"/>
>>> >         <txn:hasWikipediaArticle rdf:resource="
>>> http://en.wikipedia.org/wiki/Felidae"/>
>>> >         <skos:broader rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Suborder_Feliformia
>>> "/>
>>> >         <skos:narrower rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Felinae
>>> "/>
>>> >         <skos:narrower rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Pantherinae
>>> "/>
>>> >         <owl:sameAs rdf:resource="
>>> http://lod.geospecies.org/families/gSvIP"/>
>>> >         <rdfs:isDefinedBy rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl"/>
>>> >     </owl:Class>
>>> >
>>> > This allows SPARQL queries like the one here http://bit.ly/qssZOG based on my classification without breaking queries where the link is to
>>> DBpedia via a different predicate.
>>> >
>>> > For now I have simply linked these broadly to DBpedia using the
>>> following
>>> >
>>> > <txn:inDBpediaClade rdf:resource="http://dbpedia.org/ontology/Mammal"/>
>>> *I use clade because these don't always match Order => Order etc.
>>> >
>>> > I think this pattern allows a given species to exist in several
>>> classifications, and allow those interested to move up and down the
>>> taxonomy - all without breaking things in the LOD.
>>> >
>>> > I thought I would ask the list what they thought of this before I do
>>> much more?
>>> >
>>> > I was also wondering if it would it be better for me to use
>>> subproperties of skos that I have created in this draft ontology?
>>> >
>>> > http://lod.taxonconcept.org/ontology/taxnomen/index.owl
>>> >
>>> > Such as:
>>> >  txn_nomen:narrowerTaxon
>>> >  txn_nomen:broaderTaxon
>>> >
>>> > Which would be used this way
>>> >
>>> >     <owl:Class rdf:about="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Family_Felidae
>>> ">
>>> >         <rdfs:label>Family Felidae</rdfs:label>
>>> >         <rdf:type rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Mammal_Family
>>> "/>
>>> >         <txn:commonName>Cats</txn:commonName>
>>> >         <skos:closeMatch rdf:resource="
>>> http://purl.uniprot.org/taxonomy/9681"/>
>>> >         <skos:closeMatch rdf:resource="
>>> http://dbpedia.org/resource/Felidae"/>
>>> >         <txn:hasWikipediaArticle rdf:resource="
>>> http://en.wikipedia.org/wiki/Felidae"/>
>>> >         <txn_nomen:broaderTaxon rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Suborder_Feliformia
>>> "/>
>>> >         <txn_nomen:narrowerTaxon rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Felinae
>>> "/>
>>> >         <txn_nomen:narrowerTaxon rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Pantherinae
>>> "/>
>>> >         <owl:sameAs rdf:resource="
>>> http://lod.geospecies.org/families/gSvIP"/>
>>> >         <rdfs:isDefinedBy rdf:resource="
>>> http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl"/>
>>> >     </owl:Class>
>>> >
>>> >
>>> > And
>>> > txn_nomen:narrowerRank
>>> > txn_nomen:broaderRank
>>> >
>>> > Which is used this way
>>> >
>>> >     <owl:Class rdf:about="
>>> http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Rank_Family">
>>> >         <rdfs:label xml:lang="en">Rank Family</rdfs:label>
>>> >         <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
>>> >         <rdf:type rdf:resource="
>>> http://lod.taxonconcept.org/ontology/taxnomen/index.owl#TaxonRank"/>
>>> >         <txn_nomen:narrowerRank rdf:resource="
>>> http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Subfamily"/>
>>> >         <txn_nomen:broaderRank rdf:resource="
>>> http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Superfamily"/>
>>> >         <owl:equivalentProperty rdf:resource="
>>> http://purl.org/ontology/wo/Family"/>
>>> >         <rdfs:seeAlso rdf:resource="
>>> http://en.wikipedia.org/wiki/Family_%28biology%29"/>
>>> >         <rdfs:seeAlso rdf:resource="http://www.bbc.co.uk/nature/family
>>> "/>
>>> >         <vs:term_status>testing</vs:term_status>
>>> >        <rdfs:isDefinedBy rdf:resource="
>>> http://lod.taxonconcept.org/ontology/taxnomen/index.owl#index.owl"/>
>>> >     </owl:Class>
>>> >
>>> > Respectfully,
>>> >
>>> > - Pete
>>> >
>>> > P.S. Taxonomic Classification Ontologies like the ones listed above
>>> for mammals will change over time as additional species are discovered and
>>> their phylogeny is better understood.
>>> >         What would be the best practices to handle things like this?
>>> >
>>> >
>>> > --
>>> >
>>> ------------------------------------------------------------------------------------
>>> > Pete DeVries
>>> > Department of Entomology
>>> > University of Wisconsin - Madison
>>> > 445 Russell Laboratories
>>> > 1630 Linden Drive
>>> > Madison, WI 53706
>>> > Email: pdevries@wisc.edu
>>> > TaxonConcept  &  GeoSpecies Knowledge Bases
>>> > A Semantic Web, Linked Open Data  Project
>>> >
>>> --------------------------------------------------------------------------------------
>>>
>>>
>>
>>
>>
>
>


-- 
------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept <http://www.taxonconcept.org/>  &
GeoSpecies<http://about.geospecies.org/> Knowledge
Bases
A Semantic Web, Linked Open Data <http://linkeddata.org/>  Project
--------------------------------------------------------------------------------------

Received on Thursday, 2 February 2012 02:49:00 UTC