
Re: Next step for biodiversity terms

From: Franck Michel <franck.michel@cnrs.fr>
Date: Fri, 14 Feb 2020 15:31:43 +0100
To: Carl Boettiger <cboettig@gmail.com>, "LJ.Garcia" <lj.garcia.co@gmail.com>, Quentin Groom <quentin.groom@plantentuinmeise.be>
Cc: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>, robgur@gmail.com
Message-ID: <603b534f-052a-34c4-5f0f-d7555aeccc8c@cnrs.fr>
Dear Carl, Leyla (+ Quentin who shall certainly be interested in this),

I agree that we should make an effort to better explain how the current 
recommendation aligns with existing vocabularies, specifically Darwin Core.

I'll try to describe how we can solve that. I'm sorry this email is 
pretty long, but I don't know how to be clear and short at the same time ;)

There was quite a lot of discussion early on about what the 
Taxon term should refer to: a taxon concept? A taxon name usage? etc. 
Even experts do not always agree on the definition of those terms. So we 
agreed on two principles:
- Bioschemas should not get into experts' debates, but instead remain at 
a general level where there is consensus.
- We should create as few new terms as possible, that is: rely on 
existing schema.org terms when relevant, and "import" existing terms 
from other vocabularies when necessary (this is the Taxon _profile_ part).

A taxon (instance of type Taxon) is associated with an accepted (or 
valid) name (schema:name), 0 to any number of synonyms 
(schema:alternateName), and identifiers from other DBs:

     "@type" : "Taxon",
     "additionalType": [
         "dwc:Taxon",
         "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonConcept"
     ],
     "name": "Delphinapterus leucas (Pallas, 1776)",
     "alternateName": [
         "Balaena albicans Muller, 1776",
         "Beluga catodon Gray, 1846"
     ],
     "identifier": [
         {   "@type": "PropertyValue",
             "name": "WoRMS id",
             "propertyID": "https://www.wikidata.org/entity/P850",
             "value": "137115"
         }
     ]

In further discussions 
<https://github.com/BioSchemas/specifications/issues/309>, we agreed 
that modelling only taxa was not sufficient as some databases/portals 
describe scientific names, not taxa. So we started defining the 
TaxonName term 
<https://docs.google.com/spreadsheets/d/1ZZxL6_9VvlDJCXMf_0JnIzyBHExxA6eFIiEDKr6gFqY/edit#gid=1261485211> 
(which is not yet published on the web site, but I'm on it...). This 
term makes it possible to give more specific information about a name.
Hence the creation of two new properties, schema:scientificName and 
schema:alternateScientificName, which are the counterparts of schema:name 
and schema:alternateName but take an object of type TaxonName instead of 
a string. One would typically use one pair of properties or the other, 
but they may also be used together:

     "name": "Delphinapterus leucas (Pallas, 1776)",
     "alternateName": [ "Balaena albicans Muller, 1776" ],

     "scientificName": {
         "@type" : "TaxonName",
         "name": "Delphinapterus leucas",
         "author": "(Pallas, 1776)"
     },
     "alternateScientificName": [
         {   "@type" : "TaxonName",
             "name": "Balaena albicans",
             "author": "Muller, 1776"
         }
     ]
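
The two pairs carry the same information, so the plain-string pair can 
be derived from the structured one. Here is a minimal Python sketch of 
that, assuming the author string is simply appended to the name; 
flatten_taxon_names is a made-up helper name, not part of any Bioschemas 
tooling:

```python
def flatten_taxon_names(taxon):
    """Derive name/alternateName strings from the TaxonName-valued properties.

    Assumption for this sketch: the display string is "<name> <author>".
    """
    def as_string(taxon_name):
        # Join the bare name and the authorship, skipping whichever is missing.
        parts = [taxon_name.get("name"), taxon_name.get("author")]
        return " ".join(p for p in parts if p)

    flat = dict(taxon)  # shallow copy, the input is left untouched
    if "scientificName" in taxon:
        flat["name"] = as_string(taxon["scientificName"])
    if "alternateScientificName" in taxon:
        flat["alternateName"] = [
            as_string(n) for n in taxon["alternateScientificName"]
        ]
    return flat


taxon = {
    "@type": "Taxon",
    "scientificName": {
        "@type": "TaxonName",
        "name": "Delphinapterus leucas",
        "author": "(Pallas, 1776)",
    },
    "alternateScientificName": [
        {"@type": "TaxonName", "name": "Balaena albicans", "author": "Muller, 1776"}
    ],
}
flat = flatten_taxon_names(taxon)
```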

Now, how does this compare with Darwin Core? The problem is that Darwin Core 
RDF terms describe names and name usages, not taxa. In the example you 
provide:
{"taxonID": "ITIS:1000254",
   "scientificName": "Rollandia micropterum",
   "acceptedNameUsageID": "ITIS:562791",
   "taxonomicStatus": "synonym",
   "vernacularName": "Titicaca Grebe"
}

"ITIS:1000254" actually represents a taxon's name which happens to be a 
synonym of "ITIS:562791", therefore the need for acceptedNameUsageID and 
taxonomicStatus.
With the Taxon and TaxonName terms, we could write the same thing by 
first denoting a Taxon with an accepted name (scientificName) and a 
synonym (alternateScientificName), like this:

     "@type" : "Taxon",
     "scientificName": {
         "@type" : "TaxonName",
         "identifier": {
             "@type": "PropertyValue",
             "name": "ITIS id",
             "value": "562791"
         }
     },
     "alternateScientificName": [
         {   "@type" : "TaxonName",
             "name" : "Rollandia micropterum",
             "identifier": {
                 "@type": "PropertyValue",
                 "name": "ITIS id",
                 "value": "1000254"
             }
         }
     ]
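
To illustrate the mapping, here is a rough Python sketch that turns a 
Darwin Core synonym record like yours into the Taxon structure above. 
The function names and the "PREFIX:value" identifier convention are my 
assumptions for the example, not part of Darwin Core or Bioschemas:

```python
def to_property_value(dwc_id):
    """Turn e.g. 'ITIS:562791' into a PropertyValue node.

    Assumes identifiers are written 'PREFIX:value' (hypothetical convention).
    """
    prefix, _, value = dwc_id.partition(":")
    return {"@type": "PropertyValue", "name": f"{prefix} id", "value": value}


def to_taxon_name(dwc_id, name=None):
    """Build a TaxonName node; the name is optional, as in the example above."""
    node = {"@type": "TaxonName"}
    if name:
        node["name"] = name
    node["identifier"] = to_property_value(dwc_id)
    return node


def dwc_synonym_to_taxon(record):
    """Map one Darwin Core synonym record to a Taxon with both names."""
    if record["acceptedNameUsageID"] == record["taxonID"]:
        raise ValueError("record describes an accepted name, not a synonym")
    return {
        "@type": "Taxon",
        # The accepted name is only known by its identifier here.
        "scientificName": to_taxon_name(record["acceptedNameUsageID"]),
        # The record itself describes the synonym.
        "alternateScientificName": [
            to_taxon_name(record["taxonID"], record["scientificName"])
        ],
    }


record = {
    "taxonID": "ITIS:1000254",
    "scientificName": "Rollandia micropterum",
    "acceptedNameUsageID": "ITIS:562791",
    "taxonomicStatus": "synonym",
    "vernacularName": "Titicaca Grebe",
}
taxon = dwc_synonym_to_taxon(record)
```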

Still, this seems a bit cumbersome since you just want to represent 
names but you have to denote a Taxon.
So, one option could be to add a new pair of properties, 
hasSynonym/synonymOf, that only denote relationships between TaxonName 
instances:

     "@type" : "TaxonName",
     "name" : "Rollandia micropterum",
     "identifier": {
         "@type": "PropertyValue",
         "name": "ITIS id",
         "value": "1000254"
     },
     "synonymOf": {
         "@type" : "TaxonName",
         "identifier": {
             "@type": "PropertyValue",
             "name": "ITIS id",
             "value": "562791"
         }
     }
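
Again as a rough Python sketch, and assuming synonymOf were adopted (it 
is only a proposal at this point), a Darwin Core record could be mapped 
to a standalone TaxonName as follows. As Carl points out, the synonym 
status is inferred from acceptedNameUsageID differing from taxonID; the 
helper names are hypothetical:

```python
def to_property_value(dwc_id):
    """Turn e.g. 'ITIS:562791' into a PropertyValue node.

    Assumes identifiers are written 'PREFIX:value' (hypothetical convention).
    """
    prefix, _, value = dwc_id.partition(":")
    return {"@type": "PropertyValue", "name": f"{prefix} id", "value": value}


def dwc_to_taxon_name(record):
    """Map one Darwin Core record to a TaxonName, adding synonymOf if needed."""
    node = {
        "@type": "TaxonName",
        "name": record["scientificName"],
        "identifier": to_property_value(record["taxonID"]),
    }
    accepted = record.get("acceptedNameUsageID")
    # A name is a synonym when its accepted-name id points to another name.
    if accepted and accepted != record["taxonID"]:
        node["synonymOf"] = {
            "@type": "TaxonName",
            "identifier": to_property_value(accepted),
        }
    return node


synonym = dwc_to_taxon_name({
    "taxonID": "ITIS:1000254",
    "scientificName": "Rollandia micropterum",
    "acceptedNameUsageID": "ITIS:562791",
})
# Made-up record for the accepted case: the name string is a placeholder.
accepted = dwc_to_taxon_name({
    "taxonID": "ITIS:562791",
    "scientificName": "(accepted name)",
    "acceptedNameUsageID": "ITIS:562791",
})
```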

What do you think? Would that work for you?

Franck.

On 13/02/2020 at 19:49, Carl Boettiger wrote:
> Thanks!
>
> Yes, identifiers are of course the solution, the point is that you 
> need two different identifiers and you need to know which is which.  
> Here's a quick DarwinCore example:
> {
> "taxonID": "ITIS:1000254",
> "scientificName": "Rollandia micropterum",
> "acceptedNameUsageID": "ITIS:562791",
> "taxonomicStatus": "synonym",
> "vernacularName": "Titicaca Grebe"
> }
>
> We don't need `taxonomicStatus` explicitly here, since it is implied 
> by seeing that the accepted ID (acceptedNameUsageID) is not the same 
> thing as the taxonID for this name.  But we do need two identifiers, 
> and we need to know which one is which.  It's not clear to me how the 
> above would be represented in the schema.org 
> proposal.  (Of course one could say "don't use synonyms!", but we may as 
> well then say "don't use scientific names, just use accepted 
> identifiers". We live in a world that uses scientific names, so we 
> need mechanisms that can acknowledge that some names are synonyms.)
>
> ---
> Carl Boettiger
> http://carlboettiger.info/
>
>
> On Thu, Feb 13, 2020 at 9:58 AM LJ.Garcia <lj.garcia.co@gmail.com>
> wrote:
>
>     Hi Carl, Franck, all,
>
>     @Carl, Franck is probably the best person to point you to
>     discussions/reasons regarding the property names. I am not very
>     familiar with how synonyms are handled in Darwin Core, so my question
>     could be naïve, but... wouldn't having different identifiers help
>     there? Identifiers in Bioschemas should be FAIR, so, even if the
>     label is the same, the identifier should tell them apart, shouldn't
>     it? Regarding taxonomic concepts, again, Franck is the one
>     who can answer best.
>     @Franck, if necessary, further properties could be included at
>     this point, as the submission to schema.org
>     will still take a while. Also, if not done already, I would suggest
>     adding examples per property so people better understand how to
>     use them.
>
>     Kind regards,
>
>     On Wed, Feb 12, 2020 at 5:18 PM Carl Boettiger
>     <cboettig@gmail.com> wrote:
>
>         Hi Alasdair,
>
>         Thanks for the update and your work on this.  In the spirit of
>         demonstrating adoption, I think it would be great if the
>         recommendation reflected greater alignment with existing
>         namespaces that are widely used in taxonomy, such as Darwin
>         Core, https://dwc.tdwg.org/terms/#taxon.
>
>         I think this would greatly facilitate adoption. For instance,
>         the current specification provides no mechanism to
>         disambiguate synonyms
>         (https://dwc.tdwg.org/terms/#dwc:taxonomicStatus,
>         https://dwc.tdwg.org/terms/#dwc:acceptedNameUsageID) or
>         taxonomic concepts.  I'm also unclear on the utility of
>         `childTaxon` and `hasDefinedTerm` in the current bioschemas
>         spec.  Apologies if I've missed the boat on these discussions
>         already, but these are certainly barriers to me in using
>         bioschemas over an existing namespace like Darwin Core.  (Also
>         cc'ing Rob Guralnick on this who has far more expertise than I
>         in this area and could speak more broadly to the potential for
>         adoption of
>         https://bioschemas.org/types/Taxon/0.3-RELEASE-2019_11_18/)
>
>         Cheers,
>
>         Carl
>
>
>
>         ---
>         Carl Boettiger
>         http://carlboettiger.info/
>
>
>         On Wed, Feb 12, 2020 at 4:04 AM Gray, Alasdair J G
>         <A.J.G.Gray@hw.ac.uk> wrote:
>
>             Hi Franck,
>
>             Sorry for the slowness of my response, I have been off
>             work for most of January and am now catching up with things.
>
>             The status of getting things added to Schema.org is that we
>             need to demonstrate usage of the deployed markup rather than
>             just deployments of it.
>             This is the focus of the latest ELIXIR sponsored project
>             which will be aiming to demonstrate benefit of the markup
>             within specific areas: rare disease, plants, intrinsically
>             disordered proteins, and toxicology. This work will be
>             running over the next 23 months.
>
>             As such, we should not delay work on other types. So yes,
>             we should progress the work on Taxon and TaxonName.
>
>             The restructuring of the website that we conducted at the
>             tail end of last year was motivated by making it clearer
>             as to which profiles and types are released for general
>             use and which are still under development.
>
>             Best regards
>
>             Alasdair
>
>>             On 11 Feb 2020, at 17:04, LJ.Garcia
>>             <lj.garcia.co@gmail.com> wrote:
>>
>>             Hi,
>>
>>             I am away this week so please allow me some extra days to
>>             have a look at this.
>>
>>             Kind regards,
>>
>>             On Saturday, February 8, 2020, Franck Michel
>>             <franck.michel@cnrs.fr> wrote:
>>
>>                 Dear Alasdair and Leyla,
>>
>>                 I was wondering if you had time to check my last
>>                 reply in issue 309
>>                 <https://github.com/BioSchemas/specifications/issues/309#issuecomment-576247584>.
>>                 I was suggesting that, if endorsement of the Taxon term
>>                 by schema.org is still going to take some time, we
>>                 could try moving directly to the new pair
>>                 (Taxon, TaxonName) that we have discussed since
>>                 mid-2019.
>>
>>                 Any thoughts on this?
>>
>>                 Thx,
>>                     Franck.
>>
>>                 -- 
>>
>>                 Franck MICHEL - CNRS research engineer
>>                 Université Côte d’Azur, CNRS, Inria
>>                 I3S laboratory (UMR 7271)
>>                 franck.michel@cnrs.fr - +33 (0)4 8915 4277
>>
>
Received on Friday, 14 February 2020 14:32:02 UTC
