Re: Using RDF to describe biological taxonomy.

> I wanted to avoid creating
> my own vocabulary, which is why I am using rdfs:comment for the rank
> and rdfs:title for the Latin name.  The more I think about it, the
> more I'm convinced I should create a vocabulary.

Reusing an existing vocabulary is certainly a good idea, but as soon
as you need explanations such as "dc:title stores the Latin name",
you may be stretching it a bit too far :-)

There is an RDF version of the taxonomy data distributed along with the
UniProt protein database. It's based on the NCBI taxonomy database, but
may nevertheless be of interest to you. The taxa are stored in a simple
RDF file, whereas the classes (Taxon, Rank), properties (scientificName,
commonName, parent, child, etc.) and instances (Kingdom, Phylum, etc.)
used to describe the taxa are defined in a separate OWL file.
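
To give an idea of what such an OWL file could contain, here is a rough
sketch. It is not copied from the actual UniProt file: the class,
property and instance names are the ones mentioned above, but the
domains, ranges, the inverseOf link between parent and child, and the
typing of Kingdom and Phylum as Rank instances are my assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns="http://uniprot.org/ontology/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xml:base="http://uniprot.org/ontology/"
>
  <!-- Classes used to type the taxa and their ranks -->
  <owl:Class rdf:about="Taxon"/>
  <owl:Class rdf:about="Rank"/>

  <!-- Literal-valued property (the domain is an assumption) -->
  <owl:DatatypeProperty rdf:about="scientificName">
    <rdfs:domain rdf:resource="Taxon"/>
  </owl:DatatypeProperty>

  <!-- Taxon-to-taxon links (domain, range and inverseOf are assumptions) -->
  <owl:ObjectProperty rdf:about="parent">
    <rdfs:domain rdf:resource="Taxon"/>
    <rdfs:range rdf:resource="Taxon"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:about="child">
    <owl:inverseOf rdf:resource="parent"/>
  </owl:ObjectProperty>

  <!-- Ranks as instances of the Rank class (assumed typing) -->
  <Rank rdf:about="Kingdom"/>
  <Rank rdf:about="Phylum"/>
</rdf:RDF>
```

Relative references such as "Taxon" resolve against xml:base, giving
http://uniprot.org/ontology/Taxon, which is the form of URI that the
taxon entries point to.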

One interesting issue is whether individual taxa should be represented
as classes or as instances. Since taxa are organized hierarchically,
using classes would seem like a natural solution, as it removes the need
to introduce custom parent-child properties. On the other hand, we
often need to reference individual taxa, as in "protein x occurs in
species y", which seems to work better if taxa are treated as
instances. OWL Full, I believe, removes the distinction between classes
and individuals, but complicates things a lot otherwise...

An entry from the taxa file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns="http://uniprot.org/ontology/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://uniprot.org/taxonomy/"
>
  <rdf:Description rdf:about="9606">
    <rdf:type rdf:resource="http://uniprot.org/ontology/Taxon"/>
    <rank rdf:resource="http://uniprot.org/ontology/Species"/>
    <mnemonic>HUMAN</mnemonic>
    <scientificName>Homo sapiens</scientificName>
    <commonName>Human</commonName>
    <parent rdf:resource="9605"/>
  </rdf:Description>
  <!-- ... -->
</rdf:RDF>
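
For contrast with the instance-based entry above, the class-based
alternative might look roughly like the sketch below. This is only an
illustration of the trade-off, not something taken from the UniProt
data: the ex:occursIn property, the example.org URIs and the use of
rdfs:label for the name are all made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xmlns:ex="http://example.org/vocab/"
  xml:base="http://uniprot.org/taxonomy/"
>
  <!-- Taxon 9606 as a class: the hierarchy comes from rdfs:subClassOf,
       so no custom parent/child properties are needed -->
  <owl:Class rdf:about="9606">
    <rdfs:label>Homo sapiens</rdfs:label>
    <rdfs:subClassOf rdf:resource="9605"/>
  </owl:Class>

  <!-- Hypothetical "protein x occurs in species y" statement.
       The object is a class used like an individual, which is
       what pushes this modelling towards OWL Full -->
  <rdf:Description rdf:about="http://example.org/protein/x">
    <ex:occursIn rdf:resource="9606"/>
  </rdf:Description>
</rdf:RDF>
```

The first block is the attractive part of the class approach; the
second block is where it gets awkward, since the statement only makes
sense if 9606 can also be treated as an individual.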

Hope this is of use...

Received on Monday, 29 March 2004 04:52:02 UTC