Re: Next step for biodiversity terms

Dear all,

To keep track of the full discussion on Taxon vs. TaxonName, I took the 
liberty of copying the last exchanges to issue 
https://github.com/BioSchemas/specifications/issues/309.

*Please continue interacting on this GitHub issue rather than by 
replying to this email.*

Franck.

On 15/02/2020 at 16:37, Matt Yoder wrote:
>
> Hi all,
>
> Just diving into this discussion so my apologies if I'm rehashing 
> things that have been worked out (I'm certain I am), please ignore if so.
>
> What I see from the outset are needs that conflict, sometimes 
> significantly. These fall into two categories, as Quentin and others 
> noted: 1) compatibility, i.e. things need to work with concepts that 
> have existed and been implemented, and 2) clarification, i.e. the 
> ability to use terms consistently, and therefore comparably in a 
> meaningful way.  I suggest that anything that emerges from this effort 
> be (strongly) biased to 2, even at a partial or significant cost to 
> 1.  I fear that terms that support the confusion between name and 
> concept (which isn't that difficult if you step back) are going to 
> keep our efforts blurred, and interoperability unresolved. I'm seeing 
> precisely this happen in large ongoing efforts that I won't name.  
> Users of terms, importantly (but far from exclusively) the technical 
> teams that implement databases, tools etc., need to work to stop 
> blurring the lines. Getting there is going to be a long, slow 
> educational process, but decisions by parties like this one can help 
> get us there.
>
> I know of no system that currently handles the semantics perfectly 
> (this may be impossible), but I do know several ideas are 
> emerging/have emerged:
>
> 1) If your data model does not distinguish names from concepts, your 
> system is going to whir OK for a while, then see serious problems that 
> frustrate everybody, internal and external. These can be problems as 
> simple as trying to keep track of what software code in your system 
> does what (in fact this is our prime reason for keeping the two 
> separate in our group's efforts).
> 2) There is "synonym" and there is nomenclatural synonymy.  Trying to 
> dance between the two is going to cause problems as in 1).  We've 
> created NOMEN (https://github.com/SpeciesFileGroup/nomen) to let us 
> isolate and handle the latter. It is *OK* for only taxonomists to know 
> about nomenclatural synonymy and its nuances, not everybody has to 
> know everything. We've buried the complexities of using NOMEN in 
> interfaces that taxonomists understand.
> 3) Systems that require nomenclature before concepts can be 
> instantiated are going to fail. For example, users need to capture 
> data about undescribed taxa, and not everyone wants/needs to 
> understand nomenclature.
> 4) Using new terms, even if foreign, can help people begin to 
> understand the distinction between names and concepts. We use "Otu" 
> for taxon concept and "TaxonName" for taxon name. The term "Otu" has 
> historical baggage, but curators/scientists get our new use with very 
> little explanation. Do not fear injecting new terms into the world!!!
> 5) Practically, when describing the difference between TaxonName and 
> Otu we ask people to run through a little test:
>   - Is your data about the biology (in the broadest sense) or 
> distribution (etc) of an organism?  Then it should be linked first to 
> an Otu.
>   - Is your data about the name of the organism, specifically as it 
> pertains to the application of code of biological nomenclature? Then 
> your data is linked/added first to a TaxonName.  Note that this data 
> is always objective, i.e. the intent is to capture assertions that 
> have been cited in the literature regardless of their biological 
> interpretation.
> This distinction has immediate consequences, e.g. for where in the 
> application or data to start looking in order to make changes or 
> retrieve information.
> 6) There need to be edges between Otus, edges between TaxonNames 
> and Otus, and edges between TaxonNames and TaxonNames (these defined in 
> NOMEN in our case). If you have a table with both "otu_id" and 
> "taxon_name_id" in it you're going to have a certain set of things you 
> can't do (I know, we do), yet this is the simple way to get started 
> that most people take. There are at *least* 5 core relationships 
> between/within the use of TaxonNames and Otus, and numerous 
> relationships that are "subclasses" of these types. Conceptually we've 
> started to tease these out as we want to implement them in our 
> software, see TaxonConceptRelationships.pdf (download to zoom) here 
> https://github.com/SpeciesFileGroup/taxonworks_doc/tree/master/concepts. 
> This is obviously work in progress.
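
[Editorial sketch] Points 3 and 6 above can be illustrated with a minimal in-memory model in which concepts (Otu) and names (TaxonName) are separate records and every relationship lives in its own edge list. This is only a sketch under stated assumptions: the class names `Edge` and `TaxonGraph` and the edge labels `synonym_of` and `applies_to` are hypothetical, and are not taken from NOMEN or TaxonWorks.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaxonName:
    id: int
    name: str  # the nomenclatural string only, no biology attached


@dataclass(frozen=True)
class Otu:
    id: int
    label: str  # a concept; it may exist without any TaxonName


@dataclass(frozen=True)
class Edge:
    subject_id: int
    object_id: int
    kind: str  # hypothetical label, not a NOMEN term


@dataclass
class TaxonGraph:
    # Three separate edge sets instead of one table holding
    # both otu_id and taxon_name_id.
    otu_to_otu: list = field(default_factory=list)
    name_to_name: list = field(default_factory=list)
    name_to_otu: list = field(default_factory=list)

    def names_for_otu(self, otu_id):
        """Return the ids of all TaxonNames linked to the given Otu."""
        return [e.subject_id for e in self.name_to_otu if e.object_id == otu_id]


accepted = TaxonName(1, "Rollandia microptera")
synonym = TaxonName(2, "Rollandia micropterum")
grebe = Otu(10, "Titicaca Grebe")
undescribed = Otu(11, "undescribed sp. A")  # concept with no name yet

graph = TaxonGraph()
graph.name_to_name.append(Edge(synonym.id, accepted.id, "synonym_of"))
graph.name_to_otu.append(Edge(accepted.id, grebe.id, "applies_to"))
graph.name_to_otu.append(Edge(synonym.id, grebe.id, "applies_to"))

print(graph.names_for_otu(grebe.id))
print(graph.names_for_otu(undescribed.id))
```

Because the undescribed Otu is created without touching any name record, data about undescribed taxa can be captured before nomenclature exists.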
>
> Just my 2c as well!
>
> Cheers,
> Matt
>
>
>
>
> On Sat, Feb 15, 2020 at 4:15 AM Quentin Groom 
> <quentin.groom@plantentuinmeise.be 
> <mailto:quentin.groom@plantentuinmeise.be>> wrote:
>
>     Hi Carl, Franck, Alasdair and all,
>     at least for me, the taxonName term was created to support
>     findability for taxonomic names registries, such as Zoobank,
>     Mycobank and IPNI. As these databases do not keep track of taxa
>     they would be poorly supported by the use of a taxon term in place
>     of a taxonName term. Having said that, I would avoid modelling
>     biological taxonomy and nomenclature in bioschemas, because it's
>     quite a minefield. Therefore, I would keep the relationship
>     between taxon and taxonName as simple as possible. It should be
>     simple enough to support findability of resources on the internet,
>     but it is never going to be rich enough to support an
>     understanding of the nuances of taxonomic concepts and their
>     interrelationships with taxonNames.
>     For me, one would use taxonName when your data relates to the
>     publication and typification of a name, but use taxon when your
>     data is primarily about the traits of the taxon and other
>     biological features. Clearly, there are overlaps. I particularly
>     see either option being useful for specimens, but again it depends
>     on the use case.
>     I'm not sure if this helps the discussion, but that's my 2 cents worth.
>     Quentin
>
>
>
>
>     On Fri, 14 Feb 2020 at 18:37, Carl Boettiger <cboettig@gmail.com
>     <mailto:cboettig@gmail.com>> wrote:
>
>         Hi Franck,
>
>         Thanks for the detailed reply and please let me know if we
>         should move this discussion over to a GitHub Issue?  Apologies
>         I wasn't up to speed on the more recent discussions than what
>         is on the bioschemas website.
>
>         I have reviewed the threads you link and I very much share
>         the sentiments and objectives you have all voiced there and in
>         this thread (avoid the debates, leverage existing schema.org
>         <http://schema.org> vocab whenever possible).  Unfortunately,
>         I'm afraid the new proposals sound quite confusing.  It seems
>         the proposal to create a new `TaxonName` implicitly means that
>         `Taxon` is supposed to effectively mean "TaxonConcept"?  I
>         agree TaxonConcept is not an area of consensus, and its main
>         purpose is to allow for discussion in a world where different
>         authorities have conflicting/overlapping notions of
>         TaxonConcept, and I'm really not sure we want to go that route.
>
>         If Taxon is not meant as "the concept of taxon" then I don't
>         see how it is different from a TaxonName.  (This is made even
>         more confusing by the fact that "name" is also a Property of a
>         taxon).   I think this new proposal is much more confusing
>         than the original!  I acknowledge that the "Concept" of a
>         Taxon is different than a name, but I think we would be better
>         off not attempting to define a class/Type for "TaxonConcept"
>         (since afaik the experts haven't done that), and we should let
>         the proposal of "@type": "schema:Taxon" mean a name, which is
>         how most people see it.  (At its simplest, we should think of
>         "Taxon" as merely a name/label we apply to an individual
>         specimen, and not worry about defining the 'class of all such
>         specimens'.)
>
>
>
>         Defining the inverse pair `hasSynonym` & `synonymOf` sounds
>         reasonable, though I do worry a bit about the complexity. 
>         That is, taxonomically, `hasSynonym` implies it is property of
>         an "accepted name", while `synonymOf` sounds like a property
>         of "the synonym", but in English "synonyms" are symmetric,
>         there's no "accepted" one.  I wonder if (paralleling the
>         darwin core terms) it would be better to use the optional
>         property "acceptedName" (and not define an inverse property).
>
>           {
>             "@type": "Taxon",
>             "name": "Rollandia micropterum",
>             "@id": "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=1000254",
>             "acceptedName": {
>               "@type": "Taxon",
>               "name": "Rollandia microptera",
>               "@id": "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=562791"
>             }
>           }
>
>         Does that make sense?
>
>         Apologies, not trying to open a can of worms here, just
>         aspiring to the same goals of avoiding debate and re-using
>         existing terms!
>
>         ---
>         Carl Boettiger
>         http://carlboettiger.info/
>
>
>         On Fri, Feb 14, 2020 at 6:32 AM Franck Michel
>         <franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr>> wrote:
>
>             Dear Carl, Leyla (+ Quentin who shall certainly be
>             interested in this),
>
>             I agree that we should make an effort to better explain how
>             the current recommendation aligns with existing
>             vocabularies, specifically Darwin Core.
>
>             I'll try to describe how we can solve that. I'm sorry this
>             email is pretty long, but I don't know how to be clear and
>             short at the same time ;)
>
>             There have been quite some discussions in the beginning
>             wrt. what the Taxon term shall refer to: a taxon concept?
>             A taxon name usage? etc. Even experts do not always agree
>             on the definition of those terms. So we agreed on two
>             principles:
>             - Bioschemas should not get into experts' debates, but
>             instead remain at a general level where there is consensus.
>             - we should create as few new terms as possible, that
>             is: rely on existing schema.org <http://schema.org> terms
>             when relevant, and "import" existing terms from other
>             vocabularies when necessary (this is the Taxon _profile_
>             part).
>
>             A taxon (instance of type Taxon) is associated with an
>             accepted (or valid) name (schema:name), 0 to any number of
>             synonyms (schema:alternateName), and identifiers from
>             other DBs:
>
>                 "@type" : "Taxon",
>                 "additionalType": [ "dwc:Taxon",
>                     "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonConcept" ],
>                 "name": "Delphinapterus leucas (Pallas, 1776)",
>                 "alternateName": [ "Balaena albicans Muller, 1776",
>                     "Beluga catodon Gray, 1846" ],
>                 "identifier": [
>                     {   "@type": "PropertyValue",
>                         "name": "WoRMS id",
>                         "propertyID": "https://www.wikidata.org/entity/P850",
>                         "value": "137115"
>                     }
>                 ]
>
>             In further discussions
>             <https://github.com/BioSchemas/specifications/issues/309>,
>             we agreed that modelling only taxa was not sufficient as
>             some databases/portals describe scientific names, not
>             taxa. So we started defining the TaxonName term
>             <https://docs.google.com/spreadsheets/d/1ZZxL6_9VvlDJCXMf_0JnIzyBHExxA6eFIiEDKr6gFqY/edit#gid=1261485211>
>             (which is not yet published on the web site, but I'm on
>             it...). This term makes it possible to give more specific
>             information about a name.
>             Hence the creation of two new properties,
>             schema:scientificName and schema:alternateScientificName,
>             which are the counterparts of schema:name and
>             schema:alternateName, but with an object of type TaxonName
>             instead of a string. One would typically use either one
>             pair of properties or the other, but they might be
>             used simultaneously:
>
>                 "name": "Delphinapterus leucas (Pallas, 1776)",
>                 "alternateName": [ "Balaena albicans Muller, 1776" ],
>
>                 "scientificName": {
>                     "@type" : "TaxonName",
>                     "name": "Delphinapterus leucas",
>                     "author": "(Pallas, 1776)"
>                 },
>                 "alternateScientificName": [
>                     {   "@type" : "TaxonName",
>                         "name": "Balaena albicans",
>                         "author": "Muller, 1776"
>                     }
>                 ]
>
>             Now, how does this compare with Darwin Core? The problem is
>             that Darwin Core RDF terms describe names and name
>             usages, not taxa. In the example you provide:
>             {"taxonID": "ITIS:1000254",
>               "scientificName": "Rollandia micropterum",
>               "acceptedNameUsageID": "ITIS:562791",
>               "taxonomicStatus": "synonym",
>               "vernacularName": "Titicaca Grebe"
>             }
>
>             "ITIS:1000254" actually represents a taxon's name which
>             happens to be a synonym of "ITIS:562791", therefore the
>             need for acceptedNameUsageID and taxonomicStatus.
>             With the Taxon and TaxonName terms, we could write the
>             same thing by first denoting a Taxon with an accepted name
>             (scientificName) and a synonym (alternateScientificName),
>             like this:
>
>                 "@type" : "Taxon",
>                 "scientificName": {
>                     "@type" : "TaxonName",
>                     "identifier": {
>                         "@type": "PropertyValue",
>                         "name": "ITIS id",
>                         "value": "562791"
>                     }
>                 },
>                 "alternateScientificName": [
>                     {   "@type" : "TaxonName",
>                         "name" : "Rollandia micropterum",
>                         "identifier": {
>                             "@type": "PropertyValue",
>                             "name": "ITIS id",
>                             "value": "1000254"
>                         }
>                     }
>                 ]
>
>             Still, this seems a bit cumbersome since you just want to
>             represent names but you have to denote a Taxon.
>             So, one option could be to have a new pair of properties,
>             *hasSynonym/synonymOf*, to only denote relationships
>             between TaxonName instances:
>
>                 "@type" : "TaxonName",
>                 "name" : "Rollandia micropterum",
>                 "identifier": {
>                     "@type": "PropertyValue",
>                     "name": "ITIS id",
>                     "value": "1000254"
>                 },
>                 "synonymOf": {
>                     "@type" : "TaxonName",
>                     "identifier": {
>                         "@type": "PropertyValue",
>                         "name": "ITIS id",
>                         "value": "562791"
>                     }
>                 }
>
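
[Editorial sketch] The mapping from the Darwin Core record to the proposed TaxonName/synonymOf form can be written as a small conversion function. This is a sketch only: `synonymOf` is the property proposed in this thread, not a published schema.org term, and the helper names `curie_to_property_value` and `dwc_to_taxon_name` are hypothetical. It assumes identifiers of the form "PREFIX:value", as in the ITIS example.

```python
import json


def curie_to_property_value(curie):
    """Turn 'ITIS:562791' into a schema.org-style PropertyValue dict."""
    prefix, _, value = curie.partition(":")
    return {"@type": "PropertyValue", "name": f"{prefix} id", "value": value}


def dwc_to_taxon_name(record):
    """Map a flat Darwin Core record to the proposed TaxonName structure."""
    doc = {
        "@type": "TaxonName",
        "name": record["scientificName"],
        "identifier": curie_to_property_value(record["taxonID"]),
    }
    accepted_id = record.get("acceptedNameUsageID")
    # Only emit synonymOf when the accepted name is a different record.
    if accepted_id and accepted_id != record["taxonID"]:
        doc["synonymOf"] = {
            "@type": "TaxonName",
            "identifier": curie_to_property_value(accepted_id),
        }
    return doc


record = {
    "taxonID": "ITIS:1000254",
    "scientificName": "Rollandia micropterum",
    "acceptedNameUsageID": "ITIS:562791",
    "taxonomicStatus": "synonym",
    "vernacularName": "Titicaca Grebe",
}

taxon_name = dwc_to_taxon_name(record)
print(json.dumps(taxon_name, indent=2))
```

The vernacular name is dropped here because the thread does not say where it would go in the TaxonName proposal.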
>             What do you think? Would that work for you?
>
>             Franck.
>
>             On 13/02/2020 at 19:49, Carl Boettiger wrote:
>>             Thanks!
>>
>>             Yes, identifiers are of course the solution, the point is
>>             that you need two different identifiers and you need to
>>             know which is which.  Here's a quick DarwinCore example:
>>             {
>>             "taxonID": "ITIS:1000254",
>>             "scientificName": "Rollandia micropterum",
>>             "acceptedNameUsageID": "ITIS:562791",
>>             "taxonomicStatus": "synonym",
>>             "vernacularName": "Titicaca Grebe"
>>             }
>>
>>             We don't need `taxonomicStatus` explicitly here, since it
>>             is implied by seeing that the accepted ID
>>             (acceptedNameUsageID) is not the same thing as the
>>             taxonID for this name.  But we do need two identifiers,
>>             and we need to know which one is which.  It's not clear
>>             to me how the above would be represented in the
>>             schema.org <http://schema.org> proposal.  (Of course one
>>             could say "don't use synonyms!", but we may as well then
>>             say "don't use scientific names, just use accepted
>>             identifiers". We live in a world that uses scientific
>>             names, so we need mechanisms that can acknowledge that
>>             some names are synonyms.)
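
[Editorial sketch] The observation that `taxonomicStatus` is implied by comparing the two identifiers can be expressed in a few lines. The function name `inferred_status` is hypothetical, and treating a record with no `acceptedNameUsageID` as accepted is an assumption made for this illustration, not a Darwin Core rule.

```python
def inferred_status(record):
    """Infer 'synonym' or 'accepted' from the two Darwin Core identifiers."""
    accepted_id = record.get("acceptedNameUsageID")
    # Assumption: a missing accepted ID means the record is itself accepted.
    if accepted_id is None or accepted_id == record["taxonID"]:
        return "accepted"
    return "synonym"


synonym_record = {
    "taxonID": "ITIS:1000254",
    "scientificName": "Rollandia micropterum",
    "acceptedNameUsageID": "ITIS:562791",
}
accepted_record = {
    "taxonID": "ITIS:562791",
    "scientificName": "Rollandia microptera",
    "acceptedNameUsageID": "ITIS:562791",
}

print(inferred_status(synonym_record))
print(inferred_status(accepted_record))
```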
>>
>>             ---
>>             Carl Boettiger
>>             http://carlboettiger.info/
>>
>>
>>             On Thu, Feb 13, 2020 at 9:58 AM LJ.Garcia
>>             <lj.garcia.co@gmail.com <mailto:lj.garcia.co@gmail.com>>
>>             wrote:
>>
>>                 Hi Carl, Franck, all,
>>
>>                 @Carl, Franck is probably the best person to point
>>                 you to discussions/reasons regarding the property
>>                 names. I am not very familiar with how synonyms are
>>                 handled in Darwin Core so my question could be naïve
>>                 but... would having different identifiers not help
>>                 there? Identifiers in Bioschemas should be FAIR, so,
>>                 even if the label is the same, the identifier should
>>                 tell you more, shouldn't it? Regarding taxonomic
>>                 concepts, again, Franck is the one who can answer
>>                 best.
>>                 @Franck, if necessary, further properties could be
>>                 included at this point as the submission to
>>                 schema.org <http://schema.org> will still take a while.
>>                 Also, if not done already, I would suggest adding
>>                 examples per property so people understand better how
>>                 to use them.
>>
>>                 Kind regards,
>>
>>                 On Wed, Feb 12, 2020 at 5:18 PM Carl Boettiger
>>                 <cboettig@gmail.com <mailto:cboettig@gmail.com>> wrote:
>>
>>                     Hi Alasdair,
>>
>>                     Thanks for the update and your work on this.  In
>>                     the spirit of demonstrating adoption, I think it
>>                     would be great if the recommendation reflected
>>                     greater alignment with existing namespaces that
>>                     are widely used in taxonomy, such as Darwin Core,
>>                     https://dwc.tdwg.org/terms/#taxon .
>>
>>                     I think this would greatly facilitate adoption. 
>>                     For instance, the current specification provides
>>                     no mechanism to disambiguate synonyms
>>                     (https://dwc.tdwg.org/terms/#dwc:taxonomicStatus,
>>                     https://dwc.tdwg.org/terms/#dwc:acceptedNameUsageID)
>>                     or taxonomic concepts.  I'm also unclear on the
>>                     utility of `childTaxon` and `hasDefinedTerm` in
>>                     the current bioschemas spec.  Apologies if I've
>>                     missed the boat on these discussions already, but
>>                     these are certainly barriers to me in using
>>                     bioschemas over an existing namespace like Darwin
>>                     Core.  (Also cc'ing Rob Guralnick on this who has
>>                     far more expertise than I in this area and could
>>                     speak more broadly to the potential for adoption
>>                     of
>>                     https://bioschemas.org/types/Taxon/0.3-RELEASE-2019_11_18/)
>>
>>
>>                     Cheers,
>>
>>                     Carl
>>
>>
>>
>>                     ---
>>                     Carl Boettiger
>>                     http://carlboettiger.info/
>>
>>
>>                     On Wed, Feb 12, 2020 at 4:04 AM Gray, Alasdair J
>>                     G <A.J.G.Gray@hw.ac.uk
>>                     <mailto:A.J.G.Gray@hw.ac.uk>> wrote:
>>
>>                         Hi Franck,
>>
>>                         Sorry for the slowness of my response, I have
>>                         been off work for most of January and am now
>>                         catching up with things.
>>
>>                         The status of getting things added to
>>                         Schema.org <http://Schema.org> is that we
>>                         need to demonstrate usage of the deployed
>>                         markup rather than just deployments of it.
>>                         This is the focus of the latest ELIXIR
>>                         sponsored project which will be aiming to
>>                         demonstrate benefit of the markup within
>>                         specific areas: rare disease, plants,
>>                         intrinsically disordered proteins, and
>>                         toxicology. This work will be running over
>>                         the next 23 months.
>>
>>                         As such, we should not delay work on other
>>                         types. So yes, we should progress the work on
>>                         Taxon and TaxonName.
>>
>>                         The restructuring of the website that we
>>                         conducted at the tail end of last year was
>>                         motivated by making it clearer as to which
>>                         profiles and types are released for general
>>                         use and which are still under development.
>>
>>                         Best regards
>>
>>                         Alasdair
>>
>>>                         On 11 Feb 2020, at 17:04, LJ.Garcia
>>>                         <lj.garcia.co@gmail.com
>>>                         <mailto:lj.garcia.co@gmail.com>> wrote:
>>>
>>>                         Hi,
>>>
>>>                         I am away this week so please allow me some
>>>                         extra days to have a look at this.
>>>
>>>                         Kind regards,
>>>
>>>                         On Saturday, February 8, 2020, Franck Michel
>>>                         <franck.michel@cnrs.fr
>>>                         <mailto:franck.michel@cnrs.fr>> wrote:
>>>
>>>                             Dear Alasdair and Leyla,
>>>
>>>                             I was wondering if you had time to check
>>>                             my last reply in issue 309
>>>                             <https://github.com/BioSchemas/specifications/issues/309#issuecomment-576247584>.
>>>                             I was suggesting that, if endorsement of
>>>                             the Taxon term by schema.org
>>>                             <http://schema.org/> is still going to take
>>>                             some time, we could try to move
>>>                             directly to the new pair (Taxon,
>>>                             TaxonName) that we have discussed since
>>>                             mid-2019.
>>>
>>>                             Any thoughts on this?
>>>
>>>                             Thx,
>>>                                 Franck.
>>>
>>>                             -- 
>>>
>>>                              Franck MICHEL - CNRS research engineer
>>>                             Université Côte d’Azur, CNRS, Inria
>>>                             I3S laboratory (UMR 7271)
>>>                             franck.michel@cnrs.fr
>>>                             <mailto:franck.michel@cnrs.fr> - +33
>>>                             (0)4 8915 4277  
>>>
>>

Received on Monday, 17 February 2020 10:28:41 UTC