- From: Franck Michel <franck.michel@cnrs.fr>
- Date: Mon, 17 Feb 2020 11:27:21 +0100
- To: Matt Yoder <diapriid@gmail.com>, Quentin Groom <quentin.groom@plantentuinmeise.be>
- Cc: Carl Boettiger <cboettig@gmail.com>, "LJ.Garcia" <lj.garcia.co@gmail.com>, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>, Robert Guralnick <robgur@gmail.com>
- Message-ID: <199e1fd1-f689-513f-b13d-3244b9cab661@cnrs.fr>
Dear all,

To keep track of the full discussion on Taxon vs. TaxonName, I took the liberty of copying the last exchanges to issue https://github.com/BioSchemas/specifications/issues/309.

*Please continue interacting on this GitHub issue rather than by replying to this email.*

Franck.

On 15/02/2020 at 16:37, Matt Yoder wrote:
> Hi all,
>
> Just diving into this discussion, so my apologies if I'm rehashing things that have been worked out (I'm certain I am); please ignore if so.
>
> What I see from the outset are needs that conflict, sometimes significantly. These fall into two categories, as Quentin and others noted: 1) compatibility, i.e. things need to work with concepts that have existed and been implemented, and 2) clarification, i.e. the ability to use terms consistently, and therefore comparably in a meaningful way. I suggest that anything that emerges from this effort be (strongly) biased to 2, even at partial or significant cost to 1. I fear that terms that support the confusion between name and concept (which isn't that difficult if you step back) are going to keep our efforts blurred, and interoperability unresolved. I'm seeing precisely this happen in large ongoing efforts that I won't name. Users of terms, importantly (but far from exclusively) the technical teams that implement databases, tools etc., need to work to stop blurring the lines. Getting there is going to be a long, slow educational process, but decisions by parties like this one can help get us there.
>
> I know of no system that currently handles the semantics perfectly (this may be impossible), but I do know several ideas are emerging/have emerged:
>
> 1) If your data model does not distinguish names from concepts, your system is going to whir OK for a while, then see serious problems that frustrate everybody, internal and external.
> These can be problems as simple as trying to keep track of what software code in your system does what (in fact this is our prime reason for keeping the two separate in our group's efforts).
> 2) There is "synonym" and there is nomenclatural synonymy. Trying to dance between the two is going to cause problems as in 1). We've created NOMEN (https://github.com/SpeciesFileGroup/nomen) to let us isolate and handle the latter. It is *OK* for only taxonomists to know about nomenclatural synonymy and its nuances; not everybody has to know everything. We've buried the complexities of using NOMEN in interfaces that taxonomists understand.
> 3) Systems that require nomenclature before concepts can be instantiated are going to fail. For example, users need to capture data about undescribed taxa, and not everyone wants/needs to understand nomenclature.
> 4) Using new terms, even if foreign, can help people begin to understand the distinction between names and concepts. We use "Otu" for taxon concept and "TaxonName" for taxon name. This term has historical baggage, but curators/scientists get our new use with very little explanation. Do not fear injecting new terms into the world!!!
> 5) Practically, when describing the difference between TaxonName and Otu we ask people to run through a little test:
> - Is your data about the biology (in the broadest sense) or distribution (etc.) of an organism? Then it should be linked first to an Otu.
> - Is your data about the name of the organism, specifically as it pertains to the application of a code of biological nomenclature? Then your data is linked/added first to a TaxonName. Note that this data is always objective, i.e. the intent is to capture assertions that have been cited in the literature regardless of their biological interpretation.
> This distinction has immediate consequences, i.e. where in the application or data to start looking to make changes or retrieve information.
> 6) There need to be edges between Otus, edges between TaxonNames and Otus, and edges between TaxonNames and TaxonNames (these are defined in NOMEN in our case). If you have a table with both "otu_id and taxon_name_id" in it, you're going to have a certain set of things you can't do (I know, we do), yet this is the simple way to get started that most people take. There are at *least* 5 core relationships between/within the use of TaxonNames and Otus, and numerous relationships that are "subclasses" of these types. Conceptually we've started to tease these out as we want to implement them in our software; see TaxonConceptRelationships.pdf (download to zoom) here: https://github.com/SpeciesFileGroup/taxonworks_doc/tree/master/concepts. This is obviously work in progress.
>
> Just my 2c as well!
>
> Cheers,
> Matt
>
> On Sat, Feb 15, 2020 at 4:15 AM Quentin Groom <quentin.groom@plantentuinmeise.be> wrote:
>
> Hi Carl, Franck, Alasdair and all,
> at least for me, the taxonName term was created to support findability for taxonomic name registries, such as Zoobank, Mycobank and IPNI. As these databases do not keep track of taxa, they would be poorly supported by the use of a taxon term in place of a taxonName term. Having said that, I would avoid modelling biological taxonomy and nomenclature in bioschemas, because it's quite a minefield. Therefore, I would keep the relationship between taxon and taxonName as simple as possible. It should be simple enough to support findability of resources on the internet, but it is never going to be rich enough to support an understanding of the nuances of taxonomic concepts and their interrelationships with taxonNames.
> For me, one would use taxonName when your data relates to the publication and typification of a name, but use taxon when your data is primarily about the traits of the taxon and other biological features.
> Clearly, there are overlaps. I particularly see either option being useful for specimens, but again it depends on the use case.
> I'm not sure if this helps the discussion, but that's my 2 cents' worth.
> Quentin
>
> On Fri, 14 Feb 2020 at 18:37, Carl Boettiger <cboettig@gmail.com> wrote:
>
> Hi Franck,
>
> Thanks for the detailed reply, and please let me know if we should move this discussion over to a GitHub issue. Apologies, I wasn't up to speed on the discussions more recent than what is on the bioschemas website.
>
> I have reviewed the threads you link and I very much share the sentiments and objectives you have all voiced there and in this thread (avoid the debates, leverage existing schema.org vocab whenever possible). Unfortunately, I'm afraid the new proposals sound quite confusing. It seems the proposal to create a new `TaxonName` implicitly means that `Taxon` is supposed to effectively mean "TaxonConcept"? I agree TaxonConcept is not an area of consensus, and its main purpose is to allow for discussion in a world where different authorities have conflicting/overlapping notions of TaxonConcept, and I'm really not sure we want to go that route.
>
> If Taxon is not meant as "the concept of taxon" then I don't see how it is different from a TaxonName. (This is made even more confusing by the fact that "name" is also a Property of a taxon.) I think this new proposal is much more confusing than the original! I acknowledge that the "Concept" of a Taxon is different from a name, but I think we would be better off not attempting to define a class/Type for "TaxonConcept" (since afaik the experts haven't done that), and we should let the proposal of "@type": "schema:Taxon" mean a name, which is how most people see it.
> (At its simplest, we should think of "Taxon" as merely a name/label we apply to an individual specimen, and not worry about defining the 'class' of all such specimens.)
>
> Defining the inverse pair `hasSynonym` & `synonymOf` sounds reasonable, though I do worry a bit about the complexity. That is, taxonomically, `hasSynonym` implies it is a property of an "accepted name", while `synonymOf` sounds like a property of "the synonym", but in English "synonyms" are symmetric; there's no "accepted" one. I wonder if (paralleling the Darwin Core terms) it would be better to use the optional property "acceptedName" (and not define an inverse property).
>
> "@type" : "Taxon",
> "name" : "Rollandia micropterum",
> "@id": "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=1000254",
> "acceptedName": {
>   "@type": "Taxon",
>   "name": "Rollandia microptera",
>   "@id": "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=562791"
> }
>
> Does that make sense?
>
> Apologies, not trying to open a can of worms here, just aspiring to the same goals of avoiding debate and re-using existing terms!
>
> ---
> Carl Boettiger
> http://carlboettiger.info/
>
> On Fri, Feb 14, 2020 at 6:32 AM Franck Michel <franck.michel@cnrs.fr> wrote:
>
> Dear Carl, Leyla (+ Quentin who shall certainly be interested in this),
>
> I agree that we should make an effort to better explain how the current recommendation aligns with existing vocabularies, specifically Darwin Core.
>
> I'll try to describe how we can solve that. I'm sorry this email is pretty long, but I don't know how to be clear and short at the same time ;)
>
> There have been quite some discussions in the beginning wrt. what the Taxon term shall refer to: a taxon concept? A taxon name usage? etc. Even experts do not always agree on the definition of those terms.
> So we agreed on two principles:
> - Bioschemas should not get into experts' debates, but instead remain at a general level where there is consensus.
> - We should create as few new terms as possible, that is: rely on existing schema.org terms when relevant, and "import" existing terms from other vocabularies when necessary (this is the Taxon _profile_ part).
>
> A taxon (instance of type Taxon) is associated with an accepted (or valid) name (schema:name), 0 to any number of synonyms (schema:alternateName), and identifiers from other DBs:
>
> "@type" : "Taxon",
> "additionalType": [ "dwc:Taxon", "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonConcept" ],
> "name": "Delphinapterus leucas (Pallas, 1776)",
> "alternateName": [ "Balaena albicans Muller, 1776", "Beluga catodon Gray, 1846" ],
> "identifier": [
>   { "@type": "PropertyValue",
>     "name": "WoRMS id",
>     "propertyID": "https://www.wikidata.org/entity/P850",
>     "value": "137115"
>   }
> ]
>
> In further discussions <https://github.com/BioSchemas/specifications/issues/309>, we agreed that modelling only taxa was not sufficient, as some databases/portals describe scientific names, not taxa. So we started defining the TaxonName term <https://docs.google.com/spreadsheets/d/1ZZxL6_9VvlDJCXMf_0JnIzyBHExxA6eFIiEDKr6gFqY/edit#gid=1261485211> (which is not yet published on the web site, but I'm on it...). This term allows giving more specific information about a name.
> Hence the creation of two new properties, schema:scientificName and schema:alternateScientificName, which are the counterparts of schema:name and schema:alternateName, but with an object of type TaxonName instead of a string.
> One would typically use either one pair of properties or the other, but they may also be used simultaneously:
>
> "name": "Delphinapterus leucas (Pallas, 1776)",
> "alternateName": [ "Balaena albicans Muller, 1776" ]
>
> "scientificName": {
>   "@type" : "TaxonName",
>   "name": "Delphinapterus leucas",
>   "author": "(Pallas, 1776)"
> },
> "alternateScientificName": [
>   { "@type" : "TaxonName",
>     "name": "Balaena albicans",
>     "author": "Muller, 1776"
>   }
> ]
>
> Now, how does this compare with Darwin Core? The problem is that Darwin Core RDF terms describe names and name usages, not taxa. In the example you provide:
> {"taxonID": "ITIS:1000254",
>  "scientificName": "Rollandia micropterum",
>  "acceptedNameUsageID": "ITIS:562791",
>  "taxonomicStatus": "synonym",
>  "vernacularName": "Titicaca Grebe"
> }
>
> "ITIS:1000254" actually represents a taxon's name which happens to be a synonym of "ITIS:562791", hence the need for acceptedNameUsageID and taxonomicStatus.
> With the Taxon and TaxonName terms, we could write the same thing by first denoting a Taxon with an accepted name (scientificName) and a synonym (alternateScientificName), like this:
>
> "@type" : "Taxon",
> "scientificName": {
>   "@type" : "TaxonName",
>   "identifier": {
>     "@type": "PropertyValue",
>     "name": "ITIS id",
>     "value": "562791"
>   }
> },
> "alternateScientificName": [
>   { "@type" : "TaxonName",
>     "name" : "Rollandia micropterum",
>     "identifier": {
>       "@type": "PropertyValue",
>       "name": "ITIS id",
>       "value": "1000254"
>     }
>   }
> ]
>
> Still, this seems a bit cumbersome since you just want to represent names but you have to denote a Taxon.
> So, one option could be to have a new set of properties *hasSynonym/synonymOf* to only denote relationships between TaxonName instances:
>
> "@type" : "TaxonName",
> "name" : "Rollandia micropterum",
> "identifier": {
>   "@type": "PropertyValue",
>   "name": "ITIS id",
>   "value": "1000254"
> },
> "synonymOf": {
>   "@type" : "TaxonName",
>   "identifier": {
>     "@type": "PropertyValue",
>     "name": "ITIS id",
>     "value": "562791"
>   }
> }
>
> What do you think? Would that work for you?
>
> Franck.
>
> On 13/02/2020 at 19:49, Carl Boettiger wrote:
>> Thanks!
>>
>> Yes, identifiers are of course the solution; the point is that you need two different identifiers and you need to know which is which. Here's a quick Darwin Core example:
>> {
>>   "taxonID": "ITIS:1000254",
>>   "scientificName": "Rollandia micropterum",
>>   "acceptedNameUsageID": "ITIS:562791",
>>   "taxonomicStatus": "synonym",
>>   "vernacularName": "Titicaca Grebe"
>> }
>>
>> We don't need `taxonomicStatus` explicitly here, since it is implied by seeing that the accepted ID (acceptedNameUsageID) is not the same thing as the taxonID for this name. But we do need two identifiers, and we need to know which one is which. It's not clear to me how the above would be represented in the schema.org proposal. (Of course one could say "don't use synonyms!", but we may as well then say "don't use scientific names, just use accepted identifiers"; we live in a world that uses scientific names, so we need mechanisms that can acknowledge that some names are synonyms.)
>>
>> ---
>> Carl Boettiger
>> http://carlboettiger.info/
>>
>> On Thu, Feb 13, 2020 at 9:58 AM LJ.Garcia <lj.garcia.co@gmail.com> wrote:
>>
>> Hi Carl, Franck, all,
>>
>> @Carl, Franck is probably the best person to point you to discussions/reasons regarding the property names.
>> I am not very familiar with how synonyms are handled in Darwin Core, so my question could be naïve, but... wouldn't having different identifiers help there? Identifiers in Bioschemas should be FAIR, so even if the label is the same, the identifier should tell you more, shouldn't it? Regarding taxonomic concepts, again, Franck is the one who can answer best.
>> @Franck, if necessary, further properties could be included at this point, as the submission to schema.org will still take a bit. Also, if not done already, I would suggest adding examples per property so people understand better how to use them.
>>
>> Kind regards,
>>
>> On Wed, Feb 12, 2020 at 5:18 PM Carl Boettiger <cboettig@gmail.com> wrote:
>>
>> Hi Alasdair,
>>
>> Thanks for the update and your work on this. In the spirit of demonstrating adoption, I think it would be great if the recommendation reflected greater alignment with existing namespaces that are widely used in taxonomy, such as Darwin Core, https://dwc.tdwg.org/terms/#taxon .
>>
>> I think this would greatly facilitate adoption. For instance, the current specification provides no mechanism to disambiguate synonyms (https://dwc.tdwg.org/terms/#dwc:taxonomicStatus, https://dwc.tdwg.org/terms/#dwc:acceptedNameUsageID) or taxonomic concepts. I'm also unclear on the utility of `childTaxon` and `hasDefinedTerm` in the current bioschemas spec. Apologies if I've missed the boat on these discussions already, but these are certainly barriers to me in using bioschemas over an existing namespace like Darwin Core.
>> (Also cc'ing Rob Guralnick on this, who has far more expertise than I in this area and could speak more broadly to the potential for adoption of https://bioschemas.org/types/Taxon/0.3-RELEASE-2019_11_18/)
>>
>> Cheers,
>>
>> Carl
>>
>> ---
>> Carl Boettiger
>> http://carlboettiger.info/
>>
>> On Wed, Feb 12, 2020 at 4:04 AM Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:
>>
>> Hi Franck,
>>
>> Sorry for the slowness of my response; I have been off work for most of January and am now catching up with things.
>>
>> The status of getting things added to Schema.org is that we need to demonstrate usage of the deployed markup rather than just deployments of it. This is the focus of the latest ELIXIR-sponsored project, which will be aiming to demonstrate the benefit of the markup within specific areas: rare disease, plants, intrinsically disordered proteins, and toxicology. This work will be running over the next 23 months.
>>
>> As such, we should not delay work on other types. So yes, we should progress the work on Taxon and TaxonName.
>>
>> The restructuring of the website that we conducted at the tail end of last year was motivated by making it clearer which profiles and types are released for general use and which are still under development.
>>
>> Best regards
>>
>> Alasdair
>>
>>> On 11 Feb 2020, at 17:04, LJ.Garcia <lj.garcia.co@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I am away this week so please allow me some extra days to have a look at this.
>>>
>>> Kind regards,
>>>
>>> On Saturday, February 8, 2020, Franck Michel <franck.michel@cnrs.fr> wrote:
>>>
>>> Dear Alasdair and Leyla,
>>>
>>> I was wondering if you had time to check my last reply in issue 309 <https://github.com/BioSchemas/specifications/issues/309#issuecomment-576247584>.
>>> I was suggesting that, if the endorsement of the Taxon term by schema.org is still going to take some time, we could try to move directly to the new pair (Taxon, TaxonName) that we have discussed since mid-2019.
>>>
>>> Any thoughts on this?
>>>
>>> Thx,
>>> Franck.
>>>
>>> --
>>>
>>> Franck MICHEL - CNRS research engineer
>>> Université Côte d’Azur, CNRS, Inria
>>> I3S laboratory (UMR 7271)
>>> franck.michel@cnrs.fr - +33 (0)4 8915 4277
Received on Monday, 17 February 2020 10:28:41 UTC
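
Editorial sketch (not part of the original message). The thread compares a Darwin Core name record with the proposed TaxonName/synonymOf markup, so a small conversion may help readers see how the two line up. The Python below is only an illustration under stated assumptions: `TaxonName` and `synonymOf` were draft proposals at the time of this thread and may differ from any published Bioschemas or schema.org terms, and the helper names `to_property_value` and `dwc_to_taxon_name` are hypothetical. The rule that a name is a synonym when its accepted ID differs from its own taxonID follows Carl's remark in the thread.

```python
import json


def to_property_value(curie):
    """Turn an identifier such as 'ITIS:562791' into the PropertyValue
    shape used in the thread's examples (hypothetical helper)."""
    prefix, _, value = curie.partition(":")
    return {"@type": "PropertyValue", "name": f"{prefix} id", "value": value}


def dwc_to_taxon_name(record):
    """Map a Darwin Core name record to the draft TaxonName markup.

    Assumption: 'TaxonName' and 'synonymOf' are the draft terms
    discussed in the thread, not confirmed published vocabulary.
    """
    taxon_name = {
        "@type": "TaxonName",
        "name": record["scientificName"],
        "identifier": to_property_value(record["taxonID"]),
    }
    accepted = record.get("acceptedNameUsageID")
    # Treat the name as a synonym only when the accepted ID differs
    # from the record's own ID, so taxonomicStatus stays implicit.
    if accepted and accepted != record["taxonID"]:
        taxon_name["synonymOf"] = {
            "@type": "TaxonName",
            "identifier": to_property_value(accepted),
        }
    return taxon_name


# The Darwin Core record quoted in the thread.
record = {
    "taxonID": "ITIS:1000254",
    "scientificName": "Rollandia micropterum",
    "acceptedNameUsageID": "ITIS:562791",
    "taxonomicStatus": "synonym",
    "vernacularName": "Titicaca Grebe",
}

result = dwc_to_taxon_name(record)
print(json.dumps(result, indent=2))
```

The sketch drops `vernacularName` and `taxonomicStatus` because the thread does not say where they would go in the draft markup; a real mapping would need a decision on both.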