- From: Franck Michel <franck.michel@cnrs.fr>
- Date: Mon, 17 Feb 2020 11:27:21 +0100
- To: Matt Yoder <diapriid@gmail.com>, Quentin Groom <quentin.groom@plantentuinmeise.be>
- Cc: Carl Boettiger <cboettig@gmail.com>, "LJ.Garcia" <lj.garcia.co@gmail.com>, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>, Robert Guralnick <robgur@gmail.com>
- Message-ID: <199e1fd1-f689-513f-b13d-3244b9cab661@cnrs.fr>
Dear all,
To keep track of the full discussion on Taxon vs. TaxonName, I took the
liberty of copying the last exchanges to issue
https://github.com/BioSchemas/specifications/issues/309.
*Please continue interacting on this GitHub issue rather than by
replying to this email.*
Franck.
On 15/02/2020 at 16:37, Matt Yoder wrote:
>
> Hi all,
>
> Just diving into this discussion so my apologies if I'm rehashing
> things that have been worked out (I'm certain I am), please ignore if so.
>
> What I see from the outset are needs that conflict, sometimes
> significantly. These fall into two categories as Quentin and others
> noted: 1) compatibility, i.e. things need to work with concepts that
> have existed and been implemented, and 2) clarification, i.e. the
> ability to use terms consistently, and therefore comparably, in a
> meaningful way. I suggest that anything that emerges from this effort
> be (strongly) biased to 2, even at the partial or significant cost to
> 1. I fear that terms that support the confusion between name and
> concept (which isn't that difficult if you step back) are going to
> keep our efforts blurred, and interoperability unresolved. I'm seeing
> precisely this happen in large ongoing efforts that I won't name.
> Users of terms, importantly (but far from exclusively) the technical
> teams that implement databases, tools etc., need to work to stop
> blurring the lines. Getting there is going to be a long, slow
> educational process, but decisions by parties like this one can help
> get us there.
>
> I know of no system that currently handles the semantics perfectly
> (this may be impossible), but I do know several ideas are
> emerging/have emerged:
>
> 1) If your data model does not distinguish names from concepts, your
> system is going to whir OK for a while, then see serious problems that
> frustrate everybody, internal and external. These can be problems as
> simple as trying to keep track of what software code in your system
> does what (in fact this is our prime reason for keeping the two
> separate in our group's efforts).
> 2) There is "synonym" and there is nomenclatural synonymy. Trying to
> dance between the two is going to cause problems as in 1). We've
> created NOMEN (https://github.com/SpeciesFileGroup/nomen) to let us
> isolate and handle the latter. It is *OK* for only taxonomists to know
> about nomenclatural synonymy and its nuances; not everybody has to
> know everything. We've buried the complexities of using NOMEN in
> interfaces that taxonomists understand.
> 3) Systems that require nomenclature before concepts can be
> instantiated are going to fail. For example, users need to capture
> data about undescribed taxa, and not everyone wants/needs to
> understand nomenclature.
> 4) Using new terms, even if foreign, can help people begin to
> understand the distinction between names and concepts. We use "Otu"
> for taxon concept and "TaxonName" for taxon name. This term has
> historical baggage, but curators/scientists get our new use with very
> little explanation. Do not fear injecting new terms into the world!!!
> 5) Practically, when describing the difference between TaxonName and
> Otu we ask people to run through a little test:
> - Is your data about the biology (in the broadest sense) or
> distribution (etc) of an organism? Then it should be linked first to
> an Otu.
> - Is your data about the name of the organism, specifically as it
> pertains to the application of code of biological nomenclature? Then
> your data is linked/added first to a TaxonName. Note that this data
> is always objective, i.e. the intent is to capture assertions that
> have been cited in the literature regardless of their biological
> interpretation.
> This distinction has immediate consequences, e.g. for where in the
> application or data to start looking in order to make changes or
> retrieve information.
> 6) There need to be edges between Otus, edges between TaxonNames
> and Otus, and edges between TaxonNames and TaxonNames (these defined in
> NOMEN in our case). If you have a table with both "otu_id" and
> "taxon_name_id" in it you're going to have a certain set of things you
> can't do (I know, we do), yet this is the simple way to get started
> that most people take. There are at *least* 5 core relationships
> between/within the use of TaxonNames and Otus, and numerous
> relationships that are "subclasses" of these types. Conceptually we've
> started to tease these out as we want to implement them in our
> software, see TaxonConceptRelationships.pdf (download to zoom) here
> https://github.com/SpeciesFileGroup/taxonworks_doc/tree/master/concepts.
> This is obviously work in progress.
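
The name/concept separation in points 1) to 6) above could be sketched roughly as follows. This is only an illustration, not TaxonWorks code: the class names `Otu` and `TaxonName` come from the message, while `Edge`, `Graph` and the relationship labels (`"labelled_by"`, `"synonym_of"`, `"related_to"`) are hypothetical placeholders, not the relationships defined in NOMEN.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaxonName:
    """A name governed by a code of nomenclature (no biology attached)."""
    id: int
    name: str


@dataclass(frozen=True)
class Otu:
    """A taxon concept; it may exist without any name (undescribed taxa)."""
    id: int
    label: str = ""


@dataclass(frozen=True)
class Edge:
    """A typed, directed relationship between two Otus and/or TaxonNames."""
    subject: object
    kind: str  # hypothetical label, not a NOMEN term
    target: object


@dataclass
class Graph:
    edges: list = field(default_factory=list)

    def link(self, subject, kind, target):
        """Record one relationship as its own edge."""
        self.edges.append(Edge(subject, kind, target))

    def targets(self, subject, kind):
        """Return everything `subject` points to via edges of type `kind`."""
        return [e.target for e in self.edges
                if e.subject == subject and e.kind == kind]


g = Graph()
accepted = TaxonName(1, "Rollandia microptera")
synonym = TaxonName(2, "Rollandia micropterum")
grebe = Otu(1, "Titicaca Grebe")
undescribed = Otu(2)  # concept captured before any name exists

g.link(grebe, "labelled_by", accepted)     # Otu -> TaxonName
g.link(synonym, "synonym_of", accepted)    # TaxonName -> TaxonName
g.link(undescribed, "related_to", grebe)   # Otu -> Otu
```

Because each relationship is a separate edge rather than an `otu_id`/`taxon_name_id` pair in one table, an Otu can exist with no name at all, and names can relate to each other without involving any Otu.
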
>
> Just my 2c as well!
>
> Cheers,
> Matt
>
>
>
>
> On Sat, Feb 15, 2020 at 4:15 AM Quentin Groom
> <quentin.groom@plantentuinmeise.be
> <mailto:quentin.groom@plantentuinmeise.be>> wrote:
>
> Hi Carl, Franck, Alasdair and all,
> at least for me, the taxonName term was created to support
> findability for taxonomic names registries, such as Zoobank,
> Mycobank and IPNI. As these databases do not keep track of taxa
> they would be poorly supported by the use of a taxon term in place
> of a taxonName term. Having said that, I would avoid modelling
> biological taxonomy and nomenclature in bioschemas, because it's
> quite a minefield. Therefore, I would keep the relationship
> between taxon and taxonName as simple as possible. It should be
> simple enough to support findability of resources on the internet,
> but it is never going to be rich enough to support an
> understanding of the nuances of taxonomic concepts and their
> interrelationships with taxonNames.
> For me, one would use taxonName when your data relates to the
> publication and typification of a name, but use taxon when your
> data is primarily about the traits of the taxon and other
> biological features. Clearly, there are overlaps. I particularly
> see either option being useful for specimens, but again it depends
> on the use case.
> I'm not sure if this helps the discussion, but that's my 2 cents worth.
> Quentin
>
>
>
>
> On Fri, 14 Feb 2020 at 18:37, Carl Boettiger <cboettig@gmail.com
> <mailto:cboettig@gmail.com>> wrote:
>
> Hi Franck,
>
> Thanks for the detailed reply and please let me know if we
> should move this discussion over to a GitHub Issue? Apologies
> I wasn't up to speed on the more recent discussions than what
> is on the bioschemas website.
>
> I have reviewed the threads you link and I very much share
> the sentiments and objectives you have all voiced there and in
> this thread (avoid the debates, leverage existing schema.org
> vocab whenever possible). Unfortunately,
> I'm afraid the new proposals sound quite confusing. It seems
> the proposal to create a new `TaxonName` implicitly means that
> `Taxon` is supposed to effectively mean "TaxonConcept"? I
> agree TaxonConcept is not an area of consensus, and its main
> purpose is to allow for discussion in a world where different
> authorities have conflicting/overlapping notions of
> TaxonConcept, and I'm really not sure we want to go that route.
>
> If Taxon is not meant as "the concept of taxon" then I don't
> see how it is different from a TaxonName. (This is made even
> more confusing by the fact that "name" is also a Property of a
> taxon). I think this new proposal is much more confusing
> than the original! I acknowledge that the "Concept" of a
> Taxon is different than a name, but I think we would be better
> off not attempting to define a class/Type for "TaxonConcept"
> (since afaik the experts haven't done that), and we should let
> the proposal of "@type": "schema:Taxon" mean a name, which is
> how most people see it. (At its simplest, we should think of
> "Taxon" as merely a name/label we apply to an individual
> specimen, and not worry about defining the 'class of all such
> specimens').
>
>
>
> Defining the inverse pair `hasSynonym` & `synonymOf` sounds
> reasonable, though I do worry a bit about the complexity.
> That is, taxonomically, `hasSynonym` implies it is a property of
> an "accepted name", while `synonymOf` sounds like a property
> of "the synonym", but in English "synonyms" are symmetric,
> there's no "accepted" one. I wonder if (paralleling the
> darwin core terms) it would be better to use the optional
> property "acceptedName" (and not define an inverse property).
>
> "@type" : "Taxon",
> "name" : "Rollandia micropterum",
> "@id":
> "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=1000254",
> "acceptedName": {
> "@type": "Taxon",
> "name": "Rollandia microptera",
> "@id":
> "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=562791"
> }
>
> Does that make sense?
>
> Apologies, not trying to open a can of worms here, just
> aspiring to the same goals of avoiding debate and re-using
> existing terms!
>
> ---
> Carl Boettiger
> http://carlboettiger.info/
>
>
> On Fri, Feb 14, 2020 at 6:32 AM Franck Michel
> <franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr>> wrote:
>
> Dear Carl, Leyla (+ Quentin who shall certainly be
> interested in this),
>
> I agree that we should make an effort to better explain how
> the current recommendation aligns with existing
> vocabularies, specifically Darwin Core.
>
> I'll try to describe how we can solve that. I'm sorry this
> email is pretty long, but I don't know how to be clear and
> short at the same time ;)
>
> There have been quite some discussions in the beginning
> wrt. what the Taxon term shall refer to: a taxon concept?
> A taxon name usage? etc. Even experts do not always agree
> on the definition of those terms. So we agreed on two
> principles:
> - Bioschemas should not get into experts' debates, but
> instead remain at a general level where there is consensus.
> - we should create as few new terms as possible, that
> is: rely on existing schema.org terms
> when relevant, and "import" existing terms from other
> vocabularies when necessary (this is the Taxon _profile_
> part).
>
> A taxon (instance of type Taxon) is associated with an
> accepted (or valid) name (schema:name), 0 to any number of
> synonyms (schema:alternateName), and identifiers from
> other DBs:
>
> "@type" : "Taxon",
> "additionalType": [ "dwc:Taxon",
> "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonConcept" ],
> "name": "Delphinapterus leucas (Pallas, 1776)",
> "alternateName": [ "Balaena albicans Muller, 1776",
> "Beluga catodon Gray, 1846" ],
> "identifier": [
> { "@type": "PropertyValue",
> "name": "WoRMS id",
> "propertyID":
> "https://www.wikidata.org/entity/P850",
> "value": "137115"
> }
> ]
>
> In further discussions
> <https://github.com/BioSchemas/specifications/issues/309>,
> we agreed that modelling only taxa was not sufficient as
> some databases/portals describe scientific names, not
> taxa. So we started defining the TaxonName term
> <https://docs.google.com/spreadsheets/d/1ZZxL6_9VvlDJCXMf_0JnIzyBHExxA6eFIiEDKr6gFqY/edit#gid=1261485211>
> (which is not yet published on the web site, but I'm on
> it...). This term makes it possible to give more specific
> information about a name.
> Hence the creation of two new properties
> schema:scientificName and schema:alternateScientificName
> which are the counterparts of schema:name and
> schema:alternateName, but with an object of type TaxonName
> instead of a string. One would typically use either one
> pair of properties or the other, but they may also be
> used simultaneously:
>
> "name": "Delphinapterus leucas (Pallas, 1776)",
> "alternateName": [ "Balaena albicans Muller, 1776" ],
>
> "scientificName": {
> "@type" : "TaxonName",
> "name": "Delphinapterus leucas",
> "author": "(Pallas, 1776)"
> },
> "alternateScientificName": [
> { "@type" : "TaxonName",
> "name": "Balaena albicans",
> "author": "Muller, 1776"
> }
> ]
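
The two representations above carry the same information, so the string form can be derived from the structured one. A minimal sketch, assuming the `name` string is simply the name and author joined by a space; the helpers `to_flat` and `flatten_taxon` are hypothetical and not part of any Bioschemas tooling:

```python
def to_flat(taxon_name: dict) -> str:
    """Join a TaxonName's name and author into a single display string."""
    parts = [taxon_name["name"], taxon_name.get("author", "")]
    return " ".join(p for p in parts if p)


def flatten_taxon(taxon: dict) -> dict:
    """Derive name/alternateName from scientificName/alternateScientificName."""
    flat = {"@type": "Taxon", "name": to_flat(taxon["scientificName"])}
    synonyms = taxon.get("alternateScientificName", [])
    if synonyms:
        flat["alternateName"] = [to_flat(s) for s in synonyms]
    return flat


taxon = {
    "@type": "Taxon",
    "scientificName": {
        "@type": "TaxonName",
        "name": "Delphinapterus leucas",
        "author": "(Pallas, 1776)",
    },
    "alternateScientificName": [
        {"@type": "TaxonName", "name": "Balaena albicans", "author": "Muller, 1776"}
    ],
}
flat = flatten_taxon(taxon)
```

Going the other way (splitting a flat string into name and author) is not attempted here, since it would need nomenclature-aware parsing.
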
>
> Now, how does this compare with Darwin Core? The problem is
> that Darwin Core RDF terms describe names and name
> usages, not taxa. In the example you provide:
> {"taxonID": "ITIS:1000254",
> "scientificName": "Rollandia micropterum",
> "acceptedNameUsageID": "ITIS:562791",
> "taxonomicStatus": "synonym",
> "vernacularName": "Titicaca Grebe"
> }
>
> "ITIS:1000254" actually represents a taxon's name which
> happens to be a synonym of "ITIS:562791", therefore the
> need for acceptedNameUsageID and taxonomicStatus.
> With the Taxon and TaxonName terms, we could write the
> same thing by first denoting a Taxon with an accepted name
> (scientificName) and a synonym (alternateScientificName),
> like this:
>
> "@type" : "Taxon",
> "scientificName": {
> "@type" : "TaxonName",
> "identifier": {
> "@type": "PropertyValue",
> "name": "ITIS id",
> "value": "562791"
> }
> },
> "alternateScientificName": [
> { "@type" : "TaxonName",
> "name" : "Rollandia micropterum",
> "identifier": {
> "@type": "PropertyValue",
> "name": "ITIS id",
> "value": "1000254"
> }
> }
> ]
>
> Still, this seems a bit cumbersome since you just want to
> represent names but you have to denote a Taxon.
> So, one option could be to have a new pair of properties,
> hasSynonym/synonymOf, that only denote relationships
> between TaxonName instances:
>
> "@type" : "TaxonName",
> "name" : "Rollandia micropterum",
> "identifier": {
> "@type": "PropertyValue",
> "name": "ITIS id",
> "value": "1000254"
> },
> "synonymOf": {
> "@type" : "TaxonName",
> "identifier": {
> "@type": "PropertyValue",
> "name": "ITIS id",
> "value": "562791"
> }
> }
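
The mapping from the Darwin Core record to this TaxonName form could be sketched as below. Note that `synonymOf` is only a proposal in this message, and the helper names (`itis_identifier`, `dwc_to_taxon_name`) and the splitting of `"ITIS:<number>"` on the colon are assumptions made for illustration:

```python
def itis_identifier(curie: str) -> dict:
    """Turn an 'ITIS:<number>' string into a PropertyValue-style identifier."""
    prefix, _, value = curie.partition(":")
    return {"@type": "PropertyValue", "name": f"{prefix} id", "value": value}


def dwc_to_taxon_name(record: dict) -> dict:
    """Map a Darwin Core-style record to the proposed TaxonName markup."""
    taxon_name = {
        "@type": "TaxonName",
        "name": record["scientificName"],
        "identifier": itis_identifier(record["taxonID"]),
    }
    accepted = record.get("acceptedNameUsageID")
    # Only emit synonymOf when the accepted ID points at a different name.
    if accepted and accepted != record["taxonID"]:
        taxon_name["synonymOf"] = {
            "@type": "TaxonName",
            "identifier": itis_identifier(accepted),
        }
    return taxon_name


record = {
    "taxonID": "ITIS:1000254",
    "scientificName": "Rollandia micropterum",
    "acceptedNameUsageID": "ITIS:562791",
    "taxonomicStatus": "synonym",
    "vernacularName": "Titicaca Grebe",
}
markup = dwc_to_taxon_name(record)
```
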
>
> What do you think? Would that work for you?
>
> Franck.
>
> On 13/02/2020 at 19:49, Carl Boettiger wrote:
>> Thanks!
>>
>> Yes, identifiers are of course the solution, the point is
>> that you need two different identifiers and you need to
>> know which is which. Here's a quick DarwinCore example:
>> {
>> "taxonID": "ITIS:1000254",
>> "scientificName": "Rollandia micropterum",
>> "acceptedNameUsageID": "ITIS:562791",
>> "taxonomicStatus": "synonym",
>> "vernacularName": "Titicaca Grebe"
>> }
>>
>> We don't need `taxonomicStatus` explicitly here, since it
>> is implied by seeing that the accepted ID
>> (acceptedNameUsageID) is not the same thing as the
>> taxonID for this name. But we do need two identifiers,
>> and we need to know which one is which. It's not clear
>> to me how the above would be represented in the
>> schema.org proposal. (Of course one
>> could say "don't use synonyms!", but we may as well then
>> say "don't use scientific names, just use accepted
>> identifiers". But we live in a world that uses scientific
>> names, so we need a mechanism that can acknowledge that
>> some names are synonyms.)
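
The point that `taxonomicStatus` is implied by the two identifiers can be illustrated with a small check. This is a sketch only: `infer_status` is a hypothetical helper, and treating a missing `acceptedNameUsageID` as "accepted" is an assumption, not something Darwin Core prescribes.

```python
def infer_status(record: dict) -> str:
    """Derive a taxonomicStatus-like value from the two identifiers."""
    accepted = record.get("acceptedNameUsageID")
    # Assumption: a missing accepted ID is treated like "points to itself".
    if accepted is None or accepted == record["taxonID"]:
        return "accepted"
    return "synonym"


record = {
    "taxonID": "ITIS:1000254",
    "scientificName": "Rollandia micropterum",
    "acceptedNameUsageID": "ITIS:562791",
    "vernacularName": "Titicaca Grebe",
}
status = infer_status(record)
```
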
>>
>> ---
>> Carl Boettiger
>> http://carlboettiger.info/
>>
>>
>> On Thu, Feb 13, 2020 at 9:58 AM LJ.Garcia
>> <lj.garcia.co@gmail.com <mailto:lj.garcia.co@gmail.com>>
>> wrote:
>>
>> Hi Carl, Franck, all,
>>
>> @Carl, Franck is probably the best person to point
>> you to discussions/reasons regarding the property
>> names. I am not much aware of how synonyms are
>> handled in Darwin Core so my question could be naïve
>> but... would having different identifiers not help
>> there? Identifiers in Bioschemas should be FAIR, so,
>> even if the label is the same, the identifier should
>> tell you more, shouldn't it? Regarding taxonomic
>> concepts, again, Franck is the one who can answer
>> better.
>> @Franck, if necessary, further properties could be
>> included at this point as the submission to
>> schema.org will still take a while.
>> Also, if not done already, I would suggest adding
>> examples per property so people better understand how
>> to use them.
>>
>> Kind regards,
>>
>> On Wed, Feb 12, 2020 at 5:18 PM Carl Boettiger
>> <cboettig@gmail.com <mailto:cboettig@gmail.com>> wrote:
>>
>> Hi Alasdair,
>>
>> Thanks for the update and your work on this. In
>> the spirit of demonstrating adoption, I think it
>> would be great if the recommendation reflected
>> greater alignment with existing namespaces that
>> are widely used in taxonomy, such as Darwin Core,
>> https://dwc.tdwg.org/terms/#taxon .
>>
>> I think this would greatly facilitate adoption.
>> For instance, the current specification provides
>> no mechanism to disambiguate synonyms
>> (https://dwc.tdwg.org/terms/#dwc:taxonomicStatus,
>> https://dwc.tdwg.org/terms/#dwc:acceptedNameUsageID)
>> or taxonomic concepts. I'm also unclear on the
>> utility of `childTaxon` and `hasDefinedTerm` in
>> the current bioschemas spec. Apologies if I've
>> missed the boat on these discussions already, but
>> these are certainly barriers to me in using
>> bioschemas over an existing namespace like Darwin
>> Core. (Also cc'ing Rob Guralnick on this who has
>> far more expertise than I in this area and could
>> speak more broadly to the potential for adoption
>> of
>> https://bioschemas.org/types/Taxon/0.3-RELEASE-2019_11_18/)
>>
>>
>> Cheers,
>>
>> Carl
>>
>>
>>
>> ---
>> Carl Boettiger
>> http://carlboettiger.info/
>>
>>
>> On Wed, Feb 12, 2020 at 4:04 AM Gray, Alasdair J
>> G <A.J.G.Gray@hw.ac.uk
>> <mailto:A.J.G.Gray@hw.ac.uk>> wrote:
>>
>> Hi Franck,
>>
>> Sorry for the slowness of my response, I have
>> been off work for most of January and am now
>> catching up with things.
>>
>> The status of getting things added to
>> Schema.org is that we
>> need to demonstrate usage of the deployed
>> markup rather than just deployments of it.
>> This is the focus of the latest ELIXIR
>> sponsored project which will be aiming to
>> demonstrate benefit of the markup within
>> specific areas: rare disease, plants,
>> intrinsically disordered proteins, and
>> toxicology. This work will be running over
>> the next 23 months.
>>
>> As such, we should not delay work on other
>> types. So yes, we should progress the work on
>> Taxon and TaxonName.
>>
>> The restructuring of the website that we
>> conducted at the tail end of last year was
>> motivated by making it clearer as to which
>> profiles and types are released for general
>> use and which are still under development.
>>
>> Best regards
>>
>> Alasdair
>>
>>> On 11 Feb 2020, at 17:04, LJ.Garcia
>>> <lj.garcia.co@gmail.com
>>> <mailto:lj.garcia.co@gmail.com>> wrote:
>>>
>>> Hi,
>>>
>>> I am away this week so please allow me some
>>> extra days to have a look at this.
>>>
>>> Kind regards,
>>>
>>> On Saturday, February 8, 2020, Franck Michel
>>> <franck.michel@cnrs.fr
>>> <mailto:franck.michel@cnrs.fr>> wrote:
>>>
>>> Dear Alasdair and Leyla,
>>>
>>> I was wondering if you had time to check
>>> my last reply in issue 309
>>> <https://github.com/BioSchemas/specifications/issues/309#issuecomment-576247584>.
>>> I was suggesting that, if endorsement of
>>> the Taxon term by schema.org
>>> is still going to take
>>> some time, we could try to move
>>> directly to the new pair (Taxon,
>>> TaxonName) that we have discussed since
>>> mid-2019.
>>>
>>> Any thoughts on this?
>>>
>>> Thx,
>>> Franck.
>>>
>>> --
>>>
>>> Franck MICHEL - CNRS research engineer
>>> Université Côte d’Azur, CNRS, Inria
>>> I3S laboratory (UMR 7271)
>>> franck.michel@cnrs.fr
>>> <mailto:franck.michel@cnrs.fr> - +33
>>> (0)4 8915 4277
>>>
>>
Received on Monday, 17 February 2020 10:28:41 UTC