Re: Next step for biodiversity terms

From: Carl Boettiger <cboettig@gmail.com>
Date: Fri, 14 Feb 2020 09:36:48 -0800
Message-ID: <CAN_1p9xs+v09GntnrLSUuSDKj8GGZZvFrKE5kYVzhZuv5i+CvA@mail.gmail.com>
To: Franck Michel <franck.michel@cnrs.fr>
Cc: "LJ.Garcia" <lj.garcia.co@gmail.com>, Quentin Groom <quentin.groom@plantentuinmeise.be>, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>, robgur@gmail.com
Hi Franck,

Thanks for the detailed reply, and please let me know if we should move this
discussion over to a GitHub issue.  Apologies that I wasn't up to speed on
discussions more recent than what is on the Bioschemas website.

I have reviewed the threads you linked, and I very much share the sentiments
and objectives you have all voiced there and in this thread (avoid the
debates, leverage existing schema.org vocab whenever possible).
Unfortunately, I'm afraid the new proposals sound quite confusing.  It
seems the proposal to create a new `TaxonName` implicitly means that
`Taxon` is supposed to effectively mean "TaxonConcept"?  I agree
TaxonConcept is not an area of consensus, and its main purpose is to allow
for discussion in a world where different authorities have
conflicting/overlapping notions of TaxonConcept, and I'm really not sure we
want to go that route.

If Taxon is not meant as "the concept of taxon" then I don't see how it is
different from a TaxonName.  (This is made even more confusing by the fact
that "name" is also a property of a taxon.)  I think this new proposal is
much more confusing than the original!  I acknowledge that the "Concept" of
a Taxon is different from a name, but I think we would be better off not
attempting to define a class/Type for "TaxonConcept" (since AFAIK the
experts haven't done that), and we should let the proposal of "@type":
"schema:Taxon" mean a name, which is how most people see it.  (At its
simplest, we should think of "Taxon" as merely a name/label we apply to an
individual specimen, and not worry about defining the 'class of all such
specimens'.)



Defining the inverse pair `hasSynonym` & `synonymOf` sounds reasonable,
though I do worry a bit about the complexity.  That is, taxonomically,
`hasSynonym` implies it is a property of an "accepted name", while
`synonymOf` sounds like a property of "the synonym", but in English
"synonyms" are symmetric; there's no "accepted" one.  I wonder if
(paralleling the Darwin Core terms) it would be better to use an optional
property "acceptedName" (and not define an inverse property).

    {
        "@type": "Taxon",
        "name": "Rollandia micropterum",
        "@id": "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=1000254",
        "acceptedName": {
            "@type": "Taxon",
            "name": "Rollandia microptera",
            "@id": "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=562791"
        }
    }

Does that make sense?
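
To make the comparison with Darwin Core concrete, here is a rough Python
sketch of how the Darwin Core record from my earlier message could be mapped
onto this form.  It is only an illustration under stated assumptions:
`acceptedName` is the property proposed above, not an existing schema.org or
Bioschemas term; the helper names `itis_iri` and `dwc_to_taxon` are made up
for this example; and the "ITIS:" prefix is assumed to map onto the ITIS URL
pattern used in the example.

```python
import json

# URL pattern copied from the example; mapping the "ITIS:" prefix onto it is an assumption.
ITIS_URL = "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value="


def itis_iri(curie):
    """Turn an identifier such as 'ITIS:1000254' into the ITIS URL used as @id (hypothetical helper)."""
    prefix, _, tsn = curie.partition(":")
    if prefix != "ITIS" or not tsn:
        raise ValueError(f"not an ITIS identifier: {curie!r}")
    return ITIS_URL + tsn


def dwc_to_taxon(record):
    """Map a Darwin Core record to a Taxon node using the proposed (not yet existing) acceptedName."""
    node = {
        "@type": "Taxon",
        "@id": itis_iri(record["taxonID"]),
        "name": record["scientificName"],
    }
    accepted = record.get("acceptedNameUsageID")
    # The name is treated as a synonym when its accepted ID differs from its own taxonID.
    if accepted and accepted != record["taxonID"]:
        node["acceptedName"] = {"@type": "Taxon", "@id": itis_iri(accepted)}
    return node


dwc_record = {
    "taxonID": "ITIS:1000254",
    "scientificName": "Rollandia micropterum",
    "acceptedNameUsageID": "ITIS:562791",
    "taxonomicStatus": "synonym",
    "vernacularName": "Titicaca Grebe",
}

print(json.dumps(dwc_to_taxon(dwc_record), indent=2))
```

Since the Darwin Core row does not carry the accepted name itself, the nested
node here only has an "@id"; a record whose acceptedNameUsageID equals its own
taxonID gets no "acceptedName" at all.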

Apologies, not trying to open a can of worms here, just aspiring to the
same goals of avoiding debate and re-using existing terms!

---
Carl Boettiger
http://carlboettiger.info/


On Fri, Feb 14, 2020 at 6:32 AM Franck Michel <franck.michel@cnrs.fr> wrote:

> Dear Carl, Leyla (+ Quentin who will certainly be interested in this),
>
> I agree that we should make an effort to better explain how the current
> recommendation aligns with existing vocabularies, specifically Darwin Core.
>
> I'll try to describe how we can solve that. I'm sorry this email is pretty
> long, but I don't know how to be clear and short at the same time ;)
>
> There was quite a lot of discussion at the beginning about what the
> Taxon term should refer to: a taxon concept? A taxon name usage? etc. Even
> experts do not always agree on the definition of those terms. So we agreed
> on two principles:
> - Bioschemas should not get into experts' debates, but instead remain at a
> general level where there is consensus.
> - We should create as few new terms as possible, that is: rely on
> existing schema.org terms when relevant, and "import" existing terms from
> other vocabularies when necessary (this is the Taxon *profile* part).
>
> A taxon (instance of type Taxon) is associated with an accepted (or valid)
> name (schema:name), 0 to any number of synonyms (schema:alternateName), and
> identifiers from other DBs:
>
>     "@type" : "Taxon",
>     "additionalType": [ "dwc:Taxon",
>         "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonConcept" ],
>     "name": "Delphinapterus leucas (Pallas, 1776)",
>     "alternateName": [ "Balaena albicans Muller, 1776",
>         "Beluga catodon Gray, 1846" ],
>     "identifier": [
>         {   "@type": "PropertyValue",
>             "name": "WoRMS id",
>             "propertyID": "https://www.wikidata.org/entity/P850",
>             "value": "137115"
>         }
>     ]
>
> In further discussions
> <https://github.com/BioSchemas/specifications/issues/309>, we agreed that
> modelling only taxa was not sufficient, as some databases/portals describe
> scientific names, not taxa. So we started defining the TaxonName term
> <https://docs.google.com/spreadsheets/d/1ZZxL6_9VvlDJCXMf_0JnIzyBHExxA6eFIiEDKr6gFqY/edit#gid=1261485211>
> (which is not yet published on the web site, but I'm on it...). This term
> makes it possible to give more specific information about a name.
> Hence the creation of two new properties, schema:scientificName and
> schema:alternateScientificName, which are the counterparts of schema:name
> and schema:alternateName but take an object of type TaxonName instead of a
> string. One would typically use either one pair of properties or the
> other, though they may also be used together:
>
>     "name": "Delphinapterus leucas (Pallas, 1776)",
>     "alternateName": [ "Balaena albicans Muller, 1776" ]
>
>     "scientificName": {
>         "@type" : "TaxonName",
>         "name": "Delphinapterus leucas",
>         "author": "(Pallas, 1776)"
>     },
>     "alternateScientificName": [
>         {   "@type" : "TaxonName",
>             "name": "Balaena albicans",
>             "author": "Muller, 1776"
>         }
>     ]
>
> Now, how does this compare with Darwin Core? The problem is that Darwin Core
> RDF terms describe names and name usages, not taxa. In the example you
> provide:
> { "taxonID": "ITIS:1000254",
>   "scientificName": "Rollandia micropterum",
>   "acceptedNameUsageID": "ITIS:562791",
>   "taxonomicStatus": "synonym",
>   "vernacularName": "Titicaca Grebe"
> }
>
> "ITIS:1000254" actually represents a taxon's name which happens to be a
> synonym of "ITIS:562791", therefore the need for acceptedNameUsageID and
> taxonomicStatus.
> With the Taxon and TaxonName terms, we could write the same thing by first
> denoting a Taxon with an accepted name (scientificName) and a synonym
> (alternateScientificName), like this:
>
>     "@type" : "Taxon",
>     "scientificName": {
>         "@type" : "TaxonName",
>         "identifier": {
>             "@type": "PropertyValue",
>             "name": "ITIS id",
>             "value": "562791"
>         }
>     },
>     "alternateScientificName": [
>         {   "@type" : "TaxonName",
>             "name" : "Rollandia micropterum",
>             "identifier": {
>                 "@type": "PropertyValue",
>                 "name": "ITIS id",
>                 "value": "1000254"
>             }
>         }
>     ]
>
> Still, this seems a bit cumbersome, since you just want to represent names
> but you have to denote a Taxon.
> So, one option could be to have a new pair of properties,
> hasSynonym/synonymOf, that only denote relationships between TaxonName
> instances:
>
>     "@type" : "TaxonName",
>     "name" : "Rollandia micropterum",
>     "identifier": {
>         "@type": "PropertyValue",
>         "name": "ITIS id",
>         "value": "1000254"
>     },
>     "synonymOf": {
>         "@type" : "TaxonName",
>         "identifier": {
>             "@type": "PropertyValue",
>             "name": "ITIS id",
>             "value": "562791"
>         }
>     }
>
> What do you think? Would that work for you?
>
> Franck.
>
> On 13/02/2020 at 19:49, Carl Boettiger wrote:
>
> Thanks!
>
> Yes, identifiers are of course the solution; the point is that you need
> two different identifiers and you need to know which is which.  Here's a
> quick Darwin Core example:
>
>     {
>       "taxonID": "ITIS:1000254",
>       "scientificName": "Rollandia micropterum",
>       "acceptedNameUsageID": "ITIS:562791",
>       "taxonomicStatus": "synonym",
>       "vernacularName": "Titicaca Grebe"
>     }
>
>
> We don't need `taxonomicStatus` explicitly here, since it is implied by
> seeing that the accepted ID (acceptedNameUsageID) is not the same as
> the taxonID for this name.  But we do need two identifiers, and we need to
> know which one is which.  It's not clear to me how the above would be
> represented in the schema.org proposal.  (Of course one could say "don't
> use synonyms!", but then we may as well say "don't use scientific names,
> just use accepted identifiers".  We live in a world that uses scientific
> names, so we need a mechanism that can acknowledge that some names are
> synonyms.)
>
> ---
> Carl Boettiger
> http://carlboettiger.info/
>
>
> On Thu, Feb 13, 2020 at 9:58 AM LJ.Garcia <lj.garcia.co@gmail.com> wrote:
>
>> Hi Carl, Franck, all,
>>
>> @Carl, Franck is probably the best person to point you to
>> discussions/reasons regarding the property names. I am not very familiar
>> with how synonyms are handled in Darwin Core, so my question could be
>> naïve, but... wouldn't having different identifiers help there?
>> Identifiers in Bioschemas should be FAIR, so, even if the label is the
>> same, the identifier should tell you more, shouldn't it? Regarding
>> taxonomic concepts, again, Franck is the one who can answer best.
>> @Franck, if necessary, further properties could be included at this point,
>> as the submission to schema.org will still take a while. Also, if not done
>> already, I would suggest adding examples per property so people understand
>> better how to use them.
>>
>> Kind regards,
>>
>> On Wed, Feb 12, 2020 at 5:18 PM Carl Boettiger <cboettig@gmail.com>
>> wrote:
>>
>>> Hi Alasdair,
>>>
>>> Thanks for the update and your work on this.  In the spirit of
>>> demonstrating adoption, I think it would be great if the recommendation
>>> reflected greater alignment with existing namespaces that are widely used
>>> in taxonomy, such as Darwin Core, https://dwc.tdwg.org/terms/#taxon .
>>>
>>> I think this would greatly facilitate adoption.  For instance, the
>>> current specification provides no mechanism to disambiguate synonyms (
>>> https://dwc.tdwg.org/terms/#dwc:taxonomicStatus,
>>> https://dwc.tdwg.org/terms/#dwc:acceptedNameUsageID) or taxonomic
>>> concepts.  I'm also unclear on the utility of `childTaxon` and
>>> `hasDefinedTerm` in the current bioschemas spec.  Apologies if I've missed
>>> the boat on these discussions already, but these are certainly barriers to
>>> me in using bioschemas over an existing namespace like Darwin Core.  (Also
>>> cc'ing Rob Guralnick on this who has far more expertise than I in this area
>>> and could speak more broadly to the potential for adoption of
>>> https://bioschemas.org/types/Taxon/0.3-RELEASE-2019_11_18/)
>>>
>>> Cheers,
>>>
>>> Carl
>>>
>>>
>>>
>>> ---
>>> Carl Boettiger
>>> http://carlboettiger.info/
>>>
>>>
>>> On Wed, Feb 12, 2020 at 4:04 AM Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
>>> wrote:
>>>
>>>> Hi Franck,
>>>>
>>>> Sorry for the slowness of my response, I have been off work for most of
>>>> January and am now catching up with things.
>>>>
>>>> The status of getting things added to Schema.org is that we need to
>>>> demonstrate usage of the deployed markup rather than just deployments of
>>>> it. This is the focus of the latest ELIXIR sponsored project which will be
>>>> aiming to demonstrate benefit of the markup within specific areas: rare
>>>> disease, plants, intrinsically disordered proteins, and toxicology. This
>>>> work will be running over the next 23 months.
>>>>
>>>> As such, we should not delay work on other types. So yes, we should
>>>> progress the work on Taxon and TaxonName.
>>>>
>>>> The restructuring of the website that we conducted at the tail end of
>>>> last year was motivated by making it clearer as to which profiles and types
>>>> are released for general use and which are still under development.
>>>>
>>>> Best regards
>>>>
>>>> Alasdair
>>>>
>>>> On 11 Feb 2020, at 17:04, LJ.Garcia <lj.garcia.co@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am away this week so please allow me some extra days to have a look
>>>> at this.
>>>>
>>>> Kind regards,
>>>>
>>>> On Saturday, February 8, 2020, Franck Michel <franck.michel@cnrs.fr>
>>>> wrote:
>>>>
>>>>> Dear Alasdair and Leyla,
>>>>>
>>>>> I was wondering if you had time to check my last reply in issue 309
>>>>> <https://github.com/BioSchemas/specifications/issues/309#issuecomment-576247584>.
>>>>> I was suggesting that, if endorsement of the Taxon term by schema.org
>>>>> is still going to take some time, we could try moving directly to the
>>>>> new pair (Taxon, TaxonName) that we have been discussing since mid-2019.
>>>>>
>>>>> Any thoughts on this?
>>>>>
>>>>> Thx,
>>>>>     Franck.
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> Franck MICHEL - CNRS research engineer
>>>>> Université Côte d’Azur, CNRS, Inria
>>>>> I3S laboratory (UMR 7271)
>>>>> franck.michel@cnrs.fr - +33 (0)4 8915 4277
>>>>>
>>>>>
>>>>
>>>>
Received on Friday, 14 February 2020 17:37:15 UTC
