Re: Bioschemas.org to define biodiversity-related markup

From: Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
Date: Fri, 15 Jun 2018 12:10:26 +0000
To: Ricardo Arcila <arcila@ebi.ac.uk>
CC: Franck Michel <franck.michel@cnrs.fr>, LJ Garcia Castro <ljgarcia@ebi.ac.uk>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
Message-ID: <8623B334-13E9-4DA8-A15E-F7A489D89AA8@hw.ac.uk>
Hi All

I’m happy for the taxon group to be created with Franck as the initial group lead. Is there someone willing to support Franck in this role?


On 15 Jun 2018, at 12:58, Ricardo Arcila <arcila@ebi.ac.uk> wrote:

Hello Franck,

I have taken the liberty of creating a branch<https://github.com/BioSchemas/bioschemas.github.io/tree/ric/feat/taxons-group> with a draft of the Taxons group; please feel free to adjust it as you see fit.

Kind regards,

On 12 Jun 2018, at 10:02, Franck Michel <franck.michel@cnrs.fr> wrote:

Dear Ricardo and Leyla,

I just made a pull request, and I created a Biodiversity specification folder on Google Drive. Let me know if anything is not right. I've set myself as the group leader, but I would feel more comfortable if someone from the community would join me in this role. And obviously, you are most welcome to join the group!

> Will Taxon be a BioChemEntity? I am asking because in UniProt we have proteins linked to what is defined as an "unknown" taxon in the NCBI taxonomy/UniProt taxonomy. I guess that even if we have this "unknown" case, we could still use BioChemEntity and suppose any "unknown" will eventually be resolved to an actual entity. Happy to chat about it.
I agree, the broad definition of BioChemEntity makes it appropriate as the root of Taxon. So far, I think of Taxon as a profile more than a type of its own. I'll read the wiki and start drafting something. I'll let you know if (most probably when) I have any questions. ;)


On 11/06/2018 at 15:46, LJ Garcia Castro wrote:

Hello Franck,

The taxon profile has been mentioned as one we need before but there was no group for it. Wonderful you are starting one now! Please ask whenever you have a doubt about the process or the different approaches (third-party vocabs or additionalProperty) to deal with properties not covered by BioChemEntity.

By the way, will Taxon be a BioChemEntity? I am asking because in UniProt we have proteins linked to what is defined as an "unknown" taxon in the NCBI taxonomy/UniProt taxonomy. I guess that even if we have this "unknown" case, we could still use BioChemEntity and suppose any "unknown" will eventually be resolved to an actual entity. Happy to chat about it.


On 11/06/2018 14:39, Ricardo Arcila wrote:
Hello Franck,

It is a good idea to start by creating the group. You can do it by creating a pull request on the bioschemas groups repository<https://github.com/BioSchemas/bioschemas.github.io/tree/master/_groups>. Then you can add yourself on the people repository<https://github.com/BioSchemas/bioschemas.github.io/tree/master/_people>. I will be happy to help you in this process and if you'd like I could be part of the group as well.

In order to start a draft specification for Taxon you should create a folder with the profile name on the specifications drive folder<https://drive.google.com/drive/folders/0Bw_p-HKWUjHoNThZOWNKbGhOODg?usp=sharing>. This process is detailed on the bioschemas github wiki<https://github.com/BioSchemas/specifications/wiki/Bioschemas-Specification-Process>.

Please let me know if you have any questions or doubts about the process; I will be most happy to help.

Best regards,
Ricardo Arcila

On Thu, Jun 7, 2018 at 9:54 AM Franck Michel <franck.michel@cnrs.fr> wrote:
Hi all,

I'm catching up with the discussions on the list, and I'm happy to see that things are moving on with the submission of new types to schema.org<http://schema.org/>.

At the same time, I realize that we did not really move forward on the biodiversity topic. As I will present a poster about Bioschemas.org<http://bioschemas.org/> at the Biodiversity Information Standards (TDWG) conference in August, it would be good to initiate this work by then. How do we proceed? I suggested the creation of a Taxon profile, but we may have to start with the creation of a group?
Could you please guide me/us in this process?


On 23/01/2018 at 11:09, Leyla Garcia wrote:
Hello Bioschemas governance team,

What do you think about going ahead with the Biodiversity schemas? Do we have the go-ahead?

@Franck, I am not really aware of those organizations but I am happy to guide you through the work we have done for Bioschemas so far. I worked a bit on a biodiversity project but that was some years ago. Still, I like the subject!

Let's wait to see what Carole, Rafael and Alasdair suggest.


On 23/01/2018 08:47, Franck Michel wrote:
Dear Leyla and all,

I understand that your response stands for a GO. Right?

I've not been involved yet in the specification of the Bioschemas.org<http://bioschemas.org/> profiles. So indeed, I shall need help and guidance as to how things are going on, the tools, the process, the expected outcomes, etc.

As I proposed, we could start by contacting people who would potentially be interested in taking part in this. I'm thinking of Encyclopedia of Life, Catalogue of Life, GBIF. If you already know contacts in these organizations, that would certainly be helpful.


On 22/01/2018 at 11:37, Leyla Garcia wrote:
Hi Franck,

Great news!

Do you need any help/guides for the start-up?


On 17/01/2018 15:24, Franck Michel wrote:
Dear all,

I'm following up on this suggestion about creating a biodiversity-related group in Bioschemas.org<http://bioschemas.org/>.

The proposal received four +1's. I'm not sure if there is a "minimum score" to attest to sufficient consensus.

As we discussed, if we go for the creation of this group, it would be beneficial to involve at least EoL folks, possibly other people from the biodiversity community. I can try to initiate this, yet before I would like to have an official GO from our community.

Let me know how this usually works, and what you think about this.


On 17/11/2017 at 16:40, Franck Michel wrote:
Hi Mélanie, hi all,

To go a bit further, I've tried to extend the example I started. Here it is: https://github.com/frmichel/taxref-ld/tree/master/bioschemas-org
The README gives details as to how the example file is organized, and more importantly it lists some of the issues and questions that we shall have to tackle if we officially start the group.

@Alasdair, Carole, Rafael: as discussed in the thread, at some point it will be beneficial to invite people from EoL and TDWG. Is there some sort of "official" channel for the community to do that?

Have a nice week-end,

On 17/11/2017 at 10:19, Melanie Courtot wrote:
Hi Franck, all,

On 16/11/2017 09:37, Franck Michel wrote:
Hi Mélanie, hi all,

EoL provides an API that returns species descriptions as JSON-LD based on schema.org<http://schema.org/>. Beluga example: http://eol.org/api/traits/328541

It is unclear who consumes this data, but at least, as you already saw, they embed it at the end of their own web pages such as http://eol.org/pages/328541/data.

BioSamples does the same - an API to retrieve JSON and we embed it in our webpages for crawler as well.

As you also noticed, the JSON-LD they provide is not valid. I didn't know about that EOL Github issue, but I recently discussed it with Rod Page from the Biodiversity Information Standards (aka TDWG), who replied on the Github issue. The Google structured data testing tool gives more details on that: https://frama.link/xJm0AAto
Besides, other errors are not reported (well, I think these are errors): the property scientificName without any namespace is invalid; it should be dwc:scientificName, since this property does not exist in schema.org<http://schema.org/>. Same issue for vernacularName, traits, units...
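
To illustrate the namespace issue, here is a minimal sketch in Python that flags bare property names in a JSON-LD node. The sample node, the helper name suspicious_keys and the tiny SCHEMA_ORG_TERMS allow-list are all assumptions made up for this illustration; they are not taken from the EoL output or from any validator.

```python
# Darwin Core terms namespace; everything else below is illustrative.
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical, deliberately tiny subset of schema.org property names.
SCHEMA_ORG_TERMS = {"name", "url", "description"}


def suspicious_keys(node, context, vocab_terms):
    """Return the bare keys that are neither JSON-LD keywords, compact IRIs
    using a prefix declared in the context, nor known vocabulary terms."""
    prefixes = {name for name in context if not name.startswith("@")}
    flagged = []
    for key in node:
        if key.startswith("@"):
            continue  # JSON-LD keyword such as @type or @context
        prefix, sep, _ = key.partition(":")
        if sep and prefix in prefixes:
            continue  # compact IRI such as dwc:taxonRank
        if key in vocab_terms:
            continue  # term that the default vocabulary actually defines
        flagged.append(key)
    return sorted(flagged)


context = {"@vocab": "http://schema.org/", "dwc": DWC}
node = {
    "@context": context,
    "@type": "Thing",
    "name": "Beluga",
    "scientificName": "Delphinapterus leucas",  # bare: should be dwc:scientificName
    "vernacularName": "Beluga",                 # bare: should be dwc:vernacularName
    "dwc:taxonRank": "species",
}

print(suspicious_keys(node, context, SCHEMA_ORG_TERMS))
```

With an @vocab of schema.org, a bare scientificName would expand to a schema.org IRI that does not exist, which is why the sketch reports it.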

Anyway, this JSON-LD has lots of issues, but it's a start.

Yes. Only mentioned the tweaks in case someone wanted to give it a try as well.

The assumption is that there is some sort of specific (one-to-one) agreement between EoL and Google, and that Google harvests this data despite the invalid JSON-LD. But I have no confirmation of that.

It'd be interesting to clarify this. It seems a little counterintuitive that EoL would mark up their pages with JSON-LD for Google to read, but that Google couldn't do so without a special adapter. We're probably missing a piece of the story.

> - the measurement type points to http://purl.obolibrary.org/obo/VT_0001256, which is body length. The schema.org/predicate<http://schema.org/predicate> value is also "body length (VT)". How is this understood and displayed as Length on the Google result?
- Similar question for the actual value and units, which are "4249.83" and "mm" respectively. Is Google doing some sort of unit conversion/roundup for display?

Good question. Take the unit "mm", for example:
- "units": "mm" => there is no such thing as http://schema.org/units
- "dwc:measurementUnit": "http://purl.obolibrary.org/obo/UO_0000016" => this seems to be the only reliable property, but it would mean that Google knows the Darwin Core vocabulary and interprets it.
My assumption is that Google performs some processing on the values. Possibly, they developed a specific connector to cope with the EoL JSON-LD and translate this body length to "4.2 m".
Besides, the snippet mentions "4.2 m (Adult)", so they presumably also consider this property:
    eol:traitUri "http://eol.org/resources/704/measurements/adultheadbodylen27"
to know that this is the size of an adult.
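
As a purely speculative sketch of what such processing could look like, the Python snippet below converts the EoL value to metres and rounds it for display. Nothing here describes what Google actually does: the lookup table, the function name display_length and the use of dwc:measurementValue as the key holding the number are assumptions for illustration.

```python
# Unit IRI quoted in the thread for "mm".
MILLIMETRE = "http://purl.obolibrary.org/obo/UO_0000016"

# Hypothetical lookup table: unit IRI -> factor to convert to metres.
TO_METRES = {MILLIMETRE: 0.001}


def display_length(trait):
    """Convert a trait value to metres and format it with one decimal."""
    factor = TO_METRES[trait["dwc:measurementUnit"]]
    metres = float(trait["dwc:measurementValue"]) * factor
    return f"{metres:.1f} m"


# Values quoted in the thread; the key names are assumed.
trait = {
    "dwc:measurementValue": "4249.83",
    "dwc:measurementUnit": MILLIMETRE,
}

print(display_length(trait))  # 4249.83 mm is 4.24983 m, shown as "4.2 m"
```

The "(Adult)" qualifier would still have to come from another property, such as the eol:traitUri above.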

With proper Bioschemas.org<http://bioschemas.org/> profiles, I think we could annotate pages from many other institutions, such as the Beluga page<https://inpn.mnhn.fr/espece/cd_nom/60932?lg%3Den> of the French National Museum of Natural History, and in turn enable search engines to harvest data from complementary pages and produce mashups of related pages, etc.
That sounds like a great idea and entirely within the scope of Bioschemas.

At this point, I think we should involve people from EoL, and from the TDWG community (Rod Page would certainly be of great added value in this respect). What do you think? Is there a procedure for inviting people "officially"?
I think we could benefit from their experience indeed; it seems they were able to deploy markup, add additional properties and then get this to be interpreted by Google which seems to match our use case pretty well!
I +1'd the issue at https://github.com/BioSchemas/specifications/issues/115



On 15/11/2017 at 17:57, Melanie Courtot wrote:
Hi Franck,

This looks really interesting, thanks for bringing it up. I was trying to find out how the interaction between EoL and schema.org<http://schema.org/> was working and am wondering if you (or someone else!) could shed some light on this?

As you suggested below, I checked the google beluga<https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.> search result and do see the line "Length: 4.2 m (Adult) Encyclopedia of Life"

If I try to find where that info comes from, and head to EoL, I can reach the page http://eol.org/pages/328541/overview, and follow the "see all traits" link to http://eol.org/pages/328541/data which contains the JSON-LD.

I trimmed it down to extract the relevant bit, updated the id to be a string as per https://github.com/EOL/tramea/issues/352, and pasted it in the JSON playground mostly to make sure it was working as expected: http://tinyurl.com/yadam6nj

I am missing the link of how the following happens:
- the measurement type points to http://purl.obolibrary.org/obo/VT_0001256, which is body length. The schema.org/predicate<http://schema.org/predicate> value is also "body length (VT)". How is this understood and displayed as Length on the Google result?
- Similar question for the actual value and units, which are "4249.83" and "mm" respectively. Is Google doing some sort of unit conversion/roundup for display?
- Trophic level on EoL is "carnivore", but Google displays "Carnivorous"

Or am I looking at the wrong source for the markup?


On 10/11/2017 15:17, Franck Michel wrote:
Dear all,

I've just joined the Bioschemas.org<http://bioschemas.org/> community following some discussions I had with Alasdair Gray whom I met at ISWC in Vienna, and I'd like to start a new discussion thread.

So, just to start, a few words about me. I'm a CNRS research engineer, I work at the I3S laboratory in France, in particular with the Wimmics research team led by Fabien Gandon. I'm currently involved in some activities related to the publication of taxonomic information as Linked Data [1]. In this context, I've met the Biodiversity Information Standards community (TDWG) that is increasingly considering SW standards, LD publication and web pages markup. This is a domain where, I think, it would be relevant for Bioschemas.org<http://bioschemas.org/> to get involved.

There are lots of web portals reporting observations, traits and other data about all sorts of living organisms. Encyclopedia of Life<http://eol.org/> (EoL) and the Global Biodiversity Information Facility<https://www.gbif.org/> (GBIF) are some of the best known. Markup questions are actively considered in this field; for instance, EoL web pages embed schema.org<http://schema.org/>-based JSON-LD descriptions that Google leverages to enrich their snippets: e.g. if you google beluga<https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.> you will see 'Encyclopedia of Life' mentions in the snippet providing average weight and size data. For now, this seems to be an "individual" initiative between EoL and Google/schema.org<http://schema.org/>, but it would make sense if this was part of a broader reflection led by Bioschemas.org<http://bioschemas.org/>.

My opinion is that fostering the use of common markup by these portals could be very effective in helping the biodiversity community to discover information and figure out new data integration scenarios. Within Bioschemas.org<http://bioschemas.org/>, we could define profiles to account for biodiversity-related information. Taxonomic registers are used as the backbone of many web portals, apps and databases related to biodiversity, agronomy and agriculture. For instance, EoL and GBIF both rely on the Catalogue of Life<http://www.catalogueoflife.org/> taxonomy. Therefore, we could start with the definition of a profile to describe a taxon and its related scientific and vernacular names. Then, this could be extended with the representation of traits (characteristics of biological organisms), observations, occurrence data, conservation status (e.g. endangered) etc. Vocabularies already exist for such data, such as the well-adopted Darwin Core terms.

As a quick example, consider the web page describing the common dolphin on the web site of the French Museum of Natural History: https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en. This page could come with a JSON-LD description looking like this: https://github.com/frmichel/taxref-ld/blob/master/bioschemas-org-example.json
This example is naive and very succinct, and there are lots of things to discuss and decide. Besides, I only registered on the mailing list yesterday, so it may not fit with good practices that you have already agreed upon. Sorry if this is the case. Nevertheless, my point is basically to bootstrap the discussion and see if the community is willing to endorse this initiative. If this is the case, we should probably involve people from the biodiversity community: Darwin Core experts, EoL/GBIF representatives etc. But that will come in time.
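
For readers who cannot open the linked file, the Python sketch below builds one possible shape for such a description. It is not a copy of the linked example: the helper name taxon_markup, the choice of schema.org Thing as a placeholder type and the selection of Darwin Core terms are assumptions, and a real Taxon profile would have to settle all of them.

```python
import json


def taxon_markup(page_url, scientific_name, vernacular_names, rank):
    """Build a JSON-LD description of a taxon page.

    schema.org is the default vocabulary; Darwin Core terms carry the
    dwc: prefix because schema.org has no equivalent properties.
    """
    return {
        "@context": {
            "@vocab": "http://schema.org/",
            "dwc": "http://rs.tdwg.org/dwc/terms/",
        },
        "@type": "Thing",  # placeholder type, to be replaced by the profile's choice
        "@id": page_url,
        "url": page_url,
        "name": scientific_name,
        "dwc:scientificName": scientific_name,
        "dwc:vernacularName": vernacular_names,
        "dwc:taxonRank": rank,
    }


markup = taxon_markup(
    "https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en",
    "Delphinus delphis",
    ["common dolphin"],
    "species",
)

# The serialized form is what a page would embed in a
# <script type="application/ld+json"> element.
print(json.dumps(markup, indent=2))
```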

I look forward to further discussions.

[1] Michel F., Gargominy O., Tercerie S. & Faron-Zucker C. (2017). A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. Application to the French Taxonomic Register, TAXREF. In Proceedings of the 2nd International Workshop on Semantics for Biodiversity (S4BioDiv) co-located with ISWC 2017 vol. 1933. Vienna, Austria. CEUR.


Alasdair J G Gray

Fellow of the Higher Education Academy
Assistant Professor in Computer Science,
School of Mathematical and Computer Sciences
(Athena SWAN Bronze Award)
Heriot-Watt University, Edinburgh UK.

Email: A.J.G.Gray@hw.ac.uk<mailto:A.J.G.Gray@hw.ac.uk>
Web: http://www.macs.hw.ac.uk/~ajg33
ORCID: http://orcid.org/0000-0002-5711-4872
Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair


Received on Friday, 15 June 2018 12:10:56 UTC
