Re: Next step for biodiversity terms

Hi all,

Just diving into this discussion so my apologies if I'm rehashing things
that have been worked out (I'm certain I am), please ignore if so.

What I see from the outset are needs that conflict, sometimes
significantly. These fall into two categories, as Quentin and others noted:
1) compatibility, i.e. things need to work with concepts that already exist
and have been implemented, and 2) clarification, i.e. the ability to use
terms consistently, and therefore comparably in a meaningful way.  I suggest
that anything that emerges from this effort be (strongly) biased to 2, even
at partial or significant cost to 1.  I fear that terms that support the
confusion between name and concept (a distinction that isn't that difficult
if you step back) are going to keep our efforts blurred, and
interoperability unresolved. I'm seeing precisely this happen in large
ongoing efforts that I won't name.  Users of terms, importantly (but far
from exclusively) the technical teams that implement databases, tools etc.,
need to work to stop blurring the lines. Getting there is going to be a
long, slow educational process, but decisions by parties like this one can
help get us there.

I know of no system that currently handles the semantics perfectly
(this may be impossible), but I do know that several ideas are emerging or
have emerged:

1) If your data model does not distinguish names from concepts, your system
is going to whir OK for a while, then see serious problems that frustrate
everybody, internal and external. These can be problems as simple as trying
to keep track of what software code in your system does what (in fact this
is our prime reason for keeping the two separate in our group's efforts).
2) There is "synonym" and there is nomenclatural synonymy.  Trying to dance
between the two is going to cause problems as in 1).  We've created NOMEN (
https://github.com/SpeciesFileGroup/nomen) to let us isolate and handle the
latter. It is *OK* for only taxonomists to know about nomenclatural synonymy
and its nuances; not everybody has to know everything. We've buried the
complexities of using NOMEN in interfaces that taxonomists understand.
3) Systems that require nomenclature before concepts can be instantiated
are going to fail. For example, users need to capture data about
undescribed taxa, and not everyone wants/needs to understand nomenclature.
4) Using new terms, even if foreign, can help people begin to understand
the distinction between names and concepts. We use "Otu" for taxon concept
and "TaxonName" for taxon name. The term "Otu" has historical baggage, but
curators/scientists get our new use with very little explanation. Do not
fear injecting new terms into the world!!!
5) Practically, when describing the difference between TaxonName and Otu we
ask people to run through a little test:
  - Is your data about the biology (in the broadest sense) or distribution
(etc) of an organism?  Then it should be linked first to an Otu.
  - Is your data about the name of the organism, specifically as it
pertains to the application of a code of biological nomenclature? Then your
data is linked/added first to a TaxonName.  Note that this data is always
objective, i.e. the intent is to capture assertions that have been cited in
the literature regardless of their biological interpretation.
This distinction has immediate consequences: it tells you where in the
application, or in the data, to start looking to make changes or retrieve
information.
6) There need to be edges between Otus, edges between TaxonNames and
Otus, and edges between TaxonNames and TaxonNames (these are defined in
NOMEN in our case). If you have a table with both "otu_id" and
"taxon_name_id" in it, there is a certain set of things you can't do (I
know, we do), yet this is the simple way to get started that most people
take. There are at *least* 5 core relationships between/within the use of
TaxonNames and Otus, and numerous relationships that are "subclasses" of
these types.
Conceptually we've started to tease these out as we want to implement them
in our software, see TaxonConceptRelationships.pdf (download to zoom) here
https://github.com/SpeciesFileGroup/taxonworks_doc/tree/master/concepts.
This is obviously work in progress.
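
To make points 5) and 6) a little more concrete, here is a minimal sketch
in Python. It is an illustration only, not TaxonWorks or NOMEN code: apart
from the names "Otu" and "TaxonName", every class, field, function and
predicate string below is an assumption made up for the example. It shows
names and concepts kept as separate records, joined by typed edges rather
than by one table holding both ids, plus the "little test" as a function.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TaxonName:
    """A name governed by a code of nomenclature (fields are hypothetical)."""
    id: int
    name: str


@dataclass(frozen=True)
class Otu:
    """A taxon concept; it can exist before any name has been published."""
    id: int
    label: str


Node = Union[Otu, TaxonName]


@dataclass(frozen=True)
class Edge:
    """A typed, directed relationship between two records."""
    subject: Node
    predicate: str  # illustrative strings, not NOMEN identifiers
    target: Node


def link_first_to(about: str) -> type:
    """The 'little test': which record type does a datum attach to first?"""
    if about in {"biology", "distribution"}:
        return Otu
    if about == "nomenclature":
        return TaxonName
    raise ValueError(f"unrecognised kind of data: {about!r}")


def edge_kind(edge: Edge) -> str:
    """Describe an edge by the record types at each end."""
    return f"{type(edge.subject).__name__}->{type(edge.target).__name__}"


# The two names and the vernacular label come from the example quoted
# further down this thread; the unnamed Otu and all predicates are invented.
accepted = TaxonName(1, "Rollandia microptera")
synonym = TaxonName(2, "Rollandia micropterum")
grebe = Otu(1, "Titicaca Grebe")
unnamed = Otu(2, "undescribed sp. 1")  # no TaxonName needed to create it

edges = [
    Edge(synonym, "synonym_of", accepted),  # TaxonName to TaxonName
    Edge(grebe, "named_by", accepted),      # Otu to TaxonName
    Edge(unnamed, "related_to", grebe),     # Otu to Otu
]

for e in edges:
    print(edge_kind(e), e.predicate)
```

Because the edges are records in their own right, each of the three kinds
can later grow its own "subclasses" of relationship without touching the
Otu or TaxonName records themselves.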

Just my 2c as well!

Cheers,
Matt




On Sat, Feb 15, 2020 at 4:15 AM Quentin Groom <
quentin.groom@plantentuinmeise.be> wrote:

> Hi Carl, Franck, Alasdair and all,
> at least for me, the taxonName term was created to support findability for
> taxonomic names registries, such as Zoobank, Mycobank and IPNI. As these
> databases do not keep track of taxa they would be poorly supported by the
> use of a taxon term in place of a taxonName term. Having said that, I would
> avoid modelling biological taxonomy and nomenclature in bioschemas, because
> it's quite a minefield. Therefore, I would keep the relationship between
> taxon and taxonName as simple as possible. It should be simple enough to
> support findability of resources on the internet, but it is never going to
> be rich enough to support an understanding of the nuances of taxonomic
> concepts and their interrelationships with taxonNames.
> For me, one would use taxonName when your data relates to the publication
> and typification of a name, but use taxon when your data is primarily about
> the traits of the taxon and other biological features. Clearly, there are
> overlaps. I particularly see either option being useful for specimens, but
> again it depends on the use case.
> I'm not sure if this helps the discussion, but that's my 2 cents worth.
> Quentin
>
>
>
>
> On Fri, 14 Feb 2020 at 18:37, Carl Boettiger <cboettig@gmail.com> wrote:
>
>> Hi Franck,
>>
>> Thanks for the detailed reply and please let me know if we should move
>> this discussion over to a GitHub Issue?  Apologies I wasn't up to speed on
>> the more recent discussions than what is on the bioschemas website.
>>
>> I have reviewed the threads you link and I very much share the
>> sentiments and objectives you have all voiced there and in this thread
>> (avoid the debates, leverage existing schema.org vocab whenever
>> possible).  Unfortunately, I'm afraid the new proposals sound quite
>> confusing.  It seems the proposal to create a new `TaxonName` implicitly
>> means that `Taxon` is supposed to effectively mean "TaxonConcept"?  I agree
>> TaxonConcept is not an area of consensus, and its main purpose is to allow
>> for discussion in a world where different authorities have
>> conflicting/overlapping notions of TaxonConcept, and I'm really not sure we
>> want to go that route.
>>
>> If Taxon is not meant as "the concept of taxon" then I don't see how it
>> is different from a TaxonName.  (This is made even more confusing by the
>> fact that "name" is also a Property of a taxon).   I think this new
>> proposal is much more confusing than the original!  I acknowledge that the
>> "Concept" of a Taxon is different than a name, but I think we would be
>> better off not attempting to define a class/Type for "TaxonConcept" (since
>> afaik the experts haven't done that), and we should let the proposal of
>> "@type": "schema:Taxon" mean a name, which is how most people see it.  (At
>> its simplest, we should think of "Taxon" as merely a name/label we apply to
>> an individual specimen, and not worry about defining the 'class of all such
>> specimens'.)
>>
>>
>>
>> Defining the inverse pair `hasSynonym` & `synonymOf` sounds reasonable,
>> though I do worry a bit about the complexity.  That is, taxonomically,
>> `hasSynonym` implies it is property of an "accepted name", while
>> `synonymOf` sounds like a property of "the synonym", but in English
>> "synonyms" are symmetric, there's no "accepted" one.  I wonder if
>> (paralleling the darwin core terms) it would be better to use the optional
>> property "acceptedName" (and not define an inverse property).
>>
>>     "@type" : "Taxon",
>>     "name" : "Rollandia micropterum",
>>     "@id": "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=1000254",
>>     "acceptedName": {
>>                       "@type": "Taxon",
>>                       "name": "Rollandia microptera",
>>                       "@id": "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=562791"
>>                     }
>>
>> Does that make sense?
>>
>> Apologies, not trying to open a can of worms here, just aspiring to the
>> same goals of avoiding debate and re-using existing terms!
>>
>> ---
>> Carl Boettiger
>> http://carlboettiger.info/
>>
>>
>> On Fri, Feb 14, 2020 at 6:32 AM Franck Michel <franck.michel@cnrs.fr>
>> wrote:
>>
>>> Dear Carl, Leyla (+ Quentin who shall certainly be interested in this),
>>>
>>> I agree that we should make an effort to better explain how the current
>>> recommendation aligns with existing vocabularies, specifically Darwin Core.
>>>
>>> I'll try to describe how we can solve that. I'm sorry this email is
>>> pretty long, but I don't know how to be clear and short at the same time ;)
>>>
>>> There have been quite some discussions in the beginning wrt. what the
>>> Taxon term shall refer to: a taxon concept? A taxon name usage? etc. Even
>>> experts do not always agree on the definition of those terms. So we agreed
>>> on two principles:
>>> - Bioschemas should not get into experts' debates, but instead remain at
>>> a general level where there is consensus.
>>> - we should create as few new terms as possible, that is: rely on
>>> existing schema.org terms when relevant, and "import" existing terms
>>> from other vocabularies when necessary (this is the Taxon *profile*
>>> part).
>>>
>>> A taxon (instance of type Taxon) is associated with an accepted (or
>>> valid) name (schema:name), 0 to any number of synonyms
>>> (schema:alternateName), and identifiers from other DBs:
>>>
>>>     "@type" : "Taxon",
>>>     "additionalType": [ "dwc:Taxon",
>>>         "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonConcept" ],
>>>     "name": "Delphinapterus leucas (Pallas, 1776)",
>>>     "alternateName": [ "Balaena albicans Muller, 1776",
>>>         "Beluga catodon Gray, 1846" ],
>>>     "identifier": [
>>>         {   "@type": "PropertyValue",
>>>             "name": "WoRMS id",
>>>             "propertyID": "https://www.wikidata.org/entity/P850",
>>>             "value": "137115"
>>>         }
>>>     ]
>>>
>>> In further discussions
>>> <https://github.com/BioSchemas/specifications/issues/309>, we agreed
>>> that modelling only taxa was not sufficient as some databases/portals
>>> describe scientific names, not taxa. So we started defining the TaxonName
>>> term
>>> <https://docs.google.com/spreadsheets/d/1ZZxL6_9VvlDJCXMf_0JnIzyBHExxA6eFIiEDKr6gFqY/edit#gid=1261485211>
>>> (which is not yet published on the web site, but I'm on it...). This term
>>> makes it possible to give more specific information about a name.
>>> Hence the creation of two new properties schema:scientificName and
>>> schema:alternateScientificName which are the counterparts of schema:name
>>> and schema:alternateName, but with an object of type TaxonName instead of a
>>> string. One would typically use either one pair of properties or the
>>> other, although they may also be used together:
>>>
>>>     "name": "Delphinapterus leucas (Pallas, 1776)",
>>>     "alternateName": [ "Balaena albicans Muller, 1776" ]
>>>
>>>     "scientificName": {
>>>         "@type" : "TaxonName",
>>>         "name": "Delphinapterus leucas",
>>>         "author": "(Pallas, 1776)"
>>>     },
>>>     "alternateScientificName": [
>>>         {   "@type" : "TaxonName",
>>>             "name": "Balaena albicans",
>>>             "author": "Muller, 1776"
>>>         }
>>>     ]
>>>
>>> Now, how does this compare with Darwin Core? The problem is that Darwin Core
>>> RDF terms describe names and names usages, not taxa. In the example you
>>> provide:
>>> { "taxonID": "ITIS:1000254",
>>>   "scientificName": "Rollandia micropterum",
>>>   "acceptedNameUsageID": "ITIS:562791",
>>>   "taxonomicStatus": "synonym",
>>>   "vernacularName": "Titicaca Grebe"
>>> }
>>>
>>> "ITIS:1000254" actually represents a taxon's name which happens to be a
>>> synonym of "ITIS:562791", therefore the need for acceptedNameUsageID and
>>> taxonomicStatus.
>>> With the Taxon and TaxonName terms, we could write the same thing by
>>> first denoting a Taxon with an accepted name (scientificName) and a synonym
>>> (alternateScientificName), like this:
>>>
>>>     "@type" : "Taxon",
>>>     "scientificName": {
>>>         "@type" : "TaxonName",
>>>         "identifier": {
>>>             "@type": "PropertyValue",
>>>             "name": "ITIS id",
>>>             "value": "562791"
>>>         }
>>>     },
>>>     "alternateScientificName": [
>>>         {   "@type" : "TaxonName",
>>>             "name" : "Rollandia micropterum",
>>>             "identifier": {
>>>                 "@type": "PropertyValue",
>>>                 "name": "ITIS id",
>>>                 "value": "1000254"
>>>             }
>>>         }
>>>     ]
>>>
>>> Still, this seems a bit cumbersome since you just want to represent
>>> names but you have to denote a Taxon.
>>> So, one option could be to have a new pair of properties,
>>> hasSynonym/synonymOf, that only denote relationships between TaxonName
>>> instances:
>>>
>>>     "@type" : "TaxonName",
>>>     "name" : "Rollandia micropterum",
>>>     "identifier": {
>>>         "@type": "PropertyValue",
>>>         "name": "ITIS id",
>>>         "value": "1000254"
>>>     },
>>>     "synonymOf": {
>>>         "@type" : "TaxonName",
>>>         "identifier": {
>>>             "@type": "PropertyValue",
>>>             "name": "ITIS id",
>>>             "value": "562791"
>>>         }
>>>     }
>>>
>>> What do you think? Would that work for you?
>>>
>>> Franck.
>>>
>>> Le 13/02/2020 à 19:49, Carl Boettiger a écrit :
>>>
>>> Thanks!
>>>
>>> Yes, identifiers are of course the solution, the point is that you need
>>> two different identifiers and you need to know which is which.  Here's a
>>> quick DarwinCore example:
>>>
>>>  {
>>>    "taxonID": "ITIS:1000254",
>>>    "scientificName": "Rollandia micropterum",
>>>    "acceptedNameUsageID": "ITIS:562791",
>>>    "taxonomicStatus": "synonym",
>>>    "vernacularName": "Titicaca Grebe"
>>>  }
>>>
>>>
>>> We don't need `taxonomicStatus` explicitly here, since it is implied by
>>> seeing that the accepted ID (acceptedNameUsageID) is not the same thing as
>>> the taxonID for this name.  But we do need two identifiers, and we need to
>>> know which one is which.  It's not clear to me how the above would be
>>> represented in the schema.org proposal.  (Of course one could say
>>> "don't use synonyms!", but we may as well then say "don't use scientific
>>> names, just use accepted identifiers"; we live in a world that uses
>>> scientific names, so we need mechanisms that can acknowledge that some
>>> names are synonyms.)
>>>
>>> ---
>>> Carl Boettiger
>>> http://carlboettiger.info/
>>>
>>>
>>> On Thu, Feb 13, 2020 at 9:58 AM LJ.Garcia <lj.garcia.co@gmail.com>
>>> wrote:
>>>
>>>> Hi Carl, Franck, all,
>>>>
>>>> @Carl, Franck is probably the best person to point you to
>>>> discussions/reasons regarding the property names. I am not much aware of
>>>> how synonyms are handled in Darwin Core so my question could be naïve
>>>> but... would having different identifiers not help there? Identifiers in
>>>> Bioschemas should be FAIR, so, even if the label is the same, the
>>>> identifier should tell you better, wouldn't it? Regarding taxonomic
>>>> concepts, again, Franck is the one that can answer better.
>>>> @Franck, if necessary, further properties could be included at this
>>>> point as the submission to schema.org will still take a bit. Also, if
>>>> not done already, I would suggest adding examples per property so people
>>>> understand better how to use them.
>>>>
>>>> Kind regards,
>>>>
>>>> On Wed, Feb 12, 2020 at 5:18 PM Carl Boettiger <cboettig@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Alasdair,
>>>>>
>>>>> Thanks for the update and your work on this.  In the spirit of
>>>>> demonstrating adoption, I think it would be great if the recommendation
>>>>> reflected greater alignment with existing namespaces that are widely used
>>>>> in taxonomy, such as Darwin Core, https://dwc.tdwg.org/terms/#taxon
>>>>>  .
>>>>>
>>>>> I think this would greatly facilitate adoption.  For instance, the
>>>>> current specification provides no mechanism to disambiguate synonyms (
>>>>> https://dwc.tdwg.org/terms/#dwc:taxonomicStatus,
>>>>> https://dwc.tdwg.org/terms/#dwc:acceptedNameUsageID) or taxonomic
>>>>> concepts.  I'm also unclear on the utility of `childTaxon` and
>>>>> `hasDefinedTerm` in the current bioschemas spec.  Apologies if I've missed
>>>>> the boat on these discussions already, but these are certainly barriers to
>>>>> me in using bioschemas over an existing namespace like Darwin Core.  (Also
>>>>> cc'ing Rob Guralnick on this who has far more expertise than I in this area
>>>>> and could speak more broadly to the potential for adoption of
>>>>> https://bioschemas.org/types/Taxon/0.3-RELEASE-2019_11_18/)
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Carl
>>>>>
>>>>>
>>>>>
>>>>> ---
>>>>> Carl Boettiger
>>>>> http://carlboettiger.info/
>>>>>
>>>>>
>>>>> On Wed, Feb 12, 2020 at 4:04 AM Gray, Alasdair J G <
>>>>> A.J.G.Gray@hw.ac.uk> wrote:
>>>>>
>>>>>> Hi Franck,
>>>>>>
>>>>>> Sorry for the slowness of my response, I have been off work for most
>>>>>> of January and am now catching up with things.
>>>>>>
>>>>>> The status of getting things added to Schema.org is that we need to
>>>>>> demonstrate usage of the deployed markup rather than just deployments of
>>>>>> it. This is the focus of the latest ELIXIR sponsored project which will be
>>>>>> aiming to demonstrate benefit of the markup within specific areas: rare
>>>>>> disease, plants, intrinsically disordered proteins, and toxicology. This
>>>>>> work will be running over the next 23 months.
>>>>>>
>>>>>> As such, we should not delay work on other types. So yes, we should
>>>>>> progress the work on Taxon and TaxonName.
>>>>>>
>>>>>> The restructuring of the website that we conducted at the tail end of
>>>>>> last year was motivated by making it clearer as to which profiles and types
>>>>>> are released for general use and which are still under development.
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> Alasdair
>>>>>>
>>>>>> On 11 Feb 2020, at 17:04, LJ.Garcia <lj.garcia.co@gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am away this week so please allow me some extra days to have a look
>>>>>> at this.
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> On Saturday, February 8, 2020, Franck Michel <franck.michel@cnrs.fr>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear Alasdair and Leyla,
>>>>>>>
>>>>>>> I was wondering if you had time to check my last reply in issue 309
>>>>>>> <https://github.com/BioSchemas/specifications/issues/309#issuecomment-576247584>.
>>>>>>> I was suggesting that, if endorsement of the Taxon term by schema.org
>>>>>>> is still gonna take some time, we could try to move directly to the
>>>>>>> new pair (Taxon, TaxonName) that we have discussed since mid-2019.
>>>>>>>
>>>>>>> Any thoughts on this?
>>>>>>>
>>>>>>> Thx,
>>>>>>>     Franck.
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> Franck MICHEL - CNRS research engineer
>>>>>>> Université Côte d’Azur, CNRS, Inria
>>>>>>> I3S laboratory (UMR 7271)
>>>>>>> franck.michel@cnrs.fr - +33 (0)4 8915 4277
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>

Received on Sunday, 16 February 2020 02:13:48 UTC