Re: Bioschemas.org to define biodiversity-related markup

From: Franck Michel <franck.michel@cnrs.fr>
Date: Thu, 7 Jun 2018 10:52:43 +0200
To: Leyla Garcia <ljgarcia@ebi.ac.uk>, public-bioschemas@w3.org, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, "Rafael C. Jimenez" <rafael.jimenez@elixir-europe.org>, "Carole Goble (carole.goble@manchester.ac.uk)" <carole.goble@manchester.ac.uk>
Message-ID: <1e94f033-bac1-3052-56e2-8803174558fa@cnrs.fr>
Hi all,

I'm catching up with the discussions on the list, and I'm happy to see 
that things are moving on with the submission of new types to schema.org.

At the same time, I realize that we have not really moved forward on the 
biodiversity topic. As I will present a poster about Bioschemas.org at 
the Biodiversity Information Standards (TDWG) conference in August, it 
would be good to have started work on this by then. How do we proceed? I 
suggested the creation of a Taxon profile, but we may have to start 
with the creation of a group?
Could you please guide me/us through this process?

Thx,
     Franck.

On 23/01/2018 at 11:09, Leyla Garcia wrote:
> Hello Bioschemas governance team,
>
> What do you think about going ahead with the Biodiversity schemas? Do 
> we have the go-ahead?
>
> @Franck, I am not really aware of those organizations but I am happy 
> to guide you through the work we have done for Bioschemas so far. I 
> worked a bit on a biodiversity project but that was some years ago. 
> Still, I like the subject!
>
> Let's wait to see what Carole, Rafael and Alasdair suggest.
>
> Regards,
>
> On 23/01/2018 08:47, Franck Michel wrote:
>> Dear Leyla and all,
>>
>> I understand that your response stands for a GO. Right?
>>
>> I've not been involved yet in the specification of the Bioschemas.org 
>> profiles. So indeed, I shall need help and guidance as to how things 
>> are going on, the tools, the process, the expected outcomes, etc.
>>
>> As I proposed, we could start by contacting people who would 
>> potentially be interested in taking part in this. I'm thinking 
>> about Encyclopedia of Life, Catalogue of Life, GBIF. If you already 
>> know contacts in these organizations, that would certainly be helpful.
>>
>> Franck.
>>
>> On 22/01/2018 at 11:37, Leyla Garcia wrote:
>>> Hi Franck,
>>>
>>> Great news!
>>>
>>> Do you need any help/guides for the start-up?
>>>
>>> Cheers,
>>>
>>>
>>> On 17/01/2018 15:24, Franck Michel wrote:
>>>> Dear all,
>>>>
>>>> I'm following up on this suggestion about creating a 
>>>> biodiversity-related group in Bioschemas.org.
>>>>
>>>> The proposal received four +1's. I'm not sure if there is a 
>>>> "minimum score" to attest to sufficient consensus.
>>>>
>>>> As we discussed, if we go for the creation of this group, it would 
>>>> be beneficial to involve at least EoL folks, possibly other people 
>>>> from the biodiversity community. I can try to initiate this, but 
>>>> first I would like to have an official GO from our community.
>>>>
>>>> Let me know how this usually works, and what you think about this.
>>>>
>>>> Regards,
>>>>     Franck.
>>>>
>>>> On 17/11/2017 at 16:40, Franck Michel wrote:
>>>>> Hi Mélanie, hi all,
>>>>>
>>>>> To go a bit further, I've tried to extend the example I 
>>>>> started. Here it is: 
>>>>> https://github.com/frmichel/taxref-ld/tree/master/bioschemas-org
>>>>> The README gives details as to how the example file is organized, 
>>>>> and more importantly it lists some of the issues and questions 
>>>>> that we shall have to tackle if we officially start the group.
>>>>>
>>>>> @Alasdair, Carole, Rafael: as discussed in the thread, at some 
>>>>> point it will be beneficial to invite people from EoL and 
>>>>> TDWG. Is there some sort of "official" channel for the community 
>>>>> to do that?
>>>>>
>>>>> Have a nice week-end,
>>>>>     Franck.
>>>>>
>>>>> On 17/11/2017 at 10:19, Melanie Courtot wrote:
>>>>>> Hi Franck, all,
>>>>>>
>>>>>> On 16/11/2017 09:37, Franck Michel wrote:
>>>>>>> Hi Melanie, hi all,
>>>>>>>
>>>>>>> EoL provides an API that returns species descriptions as JSON-LD 
>>>>>>> based on schema.org. Beluga example: 
>>>>>>> http://eol.org/api/traits/328541
>>>>>>> It is unclear who consumes this data, but at least, as you 
>>>>>>> already saw, they embed it at the end of their own web pages 
>>>>>>> such as http://eol.org/pages/328541/data.
>>>>>> BioSamples does the same - an API to retrieve JSON and we embed 
>>>>>> it in our webpages for crawler as well.
>>>>>>>
>>>>>>> As you also noticed, the JSON-LD they provide is not valid. I 
>>>>>>> didn't know about that EoL GitHub issue, but I recently 
>>>>>>> discussed it with Rod Page from the Biodiversity Information 
>>>>>>> Standards (aka TDWG), who replied on the GitHub issue. The 
>>>>>>> Google structured data testing tool gives more details on that: 
>>>>>>> https://frama.link/xJm0AAto
>>>>>>> Besides, other errors are not reported (well, I think these are 
>>>>>>> errors): the property scientificName without any namespace is 
>>>>>>> invalid; it should be dwc:scientificName, since this property 
>>>>>>> does not exist in schema.org. Same issue for vernacularName, 
>>>>>>> traits, units...
>>>>>>>
>>>>>>> In any case, this JSON-LD has lots of issues, but it's a start. 
>>>>>>
>>>>>> Yes. Only mentioned the tweaks in case someone wanted to give it 
>>>>>> a try as well.
>>>>>>
>>>>>>> The assumption is that there is some sort of specific 
>>>>>>> (one-to-one) agreement between EoL and Google, and that Google 
>>>>>>> harvests this data despite the invalid JSON-LD. But I have no 
>>>>>>> confirmation of that.
>>>>>>
>>>>>> It'd be interesting to clarify this. It seems a little bit 
>>>>>> counterintuitive that EoL would mark their pages up with JSON-LD 
>>>>>> for Google to read, but that Google then couldn't do so without a 
>>>>>> special adapter. We're probably missing a piece of the story.
>>>>>>>
>>>>>>>> - the measurement type points to 
>>>>>>>> http://purl.obolibrary.org/obo/VT_0001256, which is body length. 
>>>>>>>> The schema.org/predicate value is also "body length (VT)". How 
>>>>>>>> is this understood and displayed as Length on the Google result?
>>>>>>>> - Similar question for the actual value and units, which are 
>>>>>>>> "4249.83" and "mm" respectively. Is Google doing some sort of 
>>>>>>>> unit conversion/roundup for display?
>>>>>>>
>>>>>>> Good question. Take the unit "mm", for example:
>>>>>>> - "units": "mm" => there is no such thing as http://schema.org/units
>>>>>>> - "dwc:measurementUnit": 
>>>>>>> "http://purl.obolibrary.org/obo/UO_0000016" => this seems to be 
>>>>>>> the only reliable property, but it would imply that Google knows 
>>>>>>> the Darwin Core vocabulary and interprets it.
>>>>>>> My assumption is that Google performs some processing on the 
>>>>>>> values. Possibly, they developed a specific connector to cope 
>>>>>>> with EoL JSON-LD and translate this body size to "4.2 m".
>>>>>>> Besides, the snippet mentions "4.2 m *(Adult)*", so they also 
>>>>>>> presumably consider this property:
>>>>>>> "eol:traitUri": "http://eol.org/resources/704/measurements/adultheadbodylen27"
>>>>>>> to know that this is the size of an adult.
>>>>>>>
>>>>>>> With proper Bioschemas.org profiles, I think we could annotate 
>>>>>>> pages from many other institutions, such as the Beluga page 
>>>>>>> <https://inpn.mnhn.fr/espece/cd_nom/60932?lg=en> on the website 
>>>>>>> of the French National Museum of Natural History, and in turn 
>>>>>>> enable search engines to harvest data from complementary pages, 
>>>>>>> produce mashups of related pages, etc.
>>>>>> That sounds like a great idea and entirely within the scope of 
>>>>>> Bioschemas.
>>>>>>>
>>>>>>> At this point, I think we should involve people from EoL, and 
>>>>>>> from the TDWG community (Rod Page would certainly be of great 
>>>>>>> added value in this respect). What do you think? Is there a 
>>>>>>> procedure for inviting people "officially"?
>>>>>> I think we could benefit from their experience indeed; it seems 
>>>>>> they were able to deploy markup, add additional properties and 
>>>>>> then get this to be interpreted by Google which seems to match 
>>>>>> our use case pretty well!
>>>>>> I +1'd the issue at 
>>>>>> https://github.com/BioSchemas/specifications/issues/115
>>>>>>
>>>>>> Cheers,
>>>>>> Melanie
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Franck.
>>>>>>>
>>>>>>>
>>>>>>> On 15/11/2017 at 17:57, Melanie Courtot wrote:
>>>>>>>> Hi Franck,
>>>>>>>>
>>>>>>>> This looks really interesting, thanks for bringing it up. I was 
>>>>>>>> trying to find out how the interaction between EoL and 
>>>>>>>> schema.org was working and am wondering if you (or someone 
>>>>>>>> else!) could shed some light on this?
>>>>>>>>
>>>>>>>> As you suggested below, I checked the google beluga 
>>>>>>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc> 
>>>>>>>> search result and do see the line "Length: 4.2 m (Adult) 
>>>>>>>> Encyclopedia of Life"
>>>>>>>>
>>>>>>>> If I try to find where that info comes from, and head to EoL, I 
>>>>>>>> can reach the page http://eol.org/pages/328541/overview, and 
>>>>>>>> follow the "see all traits" link to 
>>>>>>>> http://eol.org/pages/328541/data which contains the JSON-LD.
>>>>>>>>
>>>>>>>> I trimmed it down to extract the relevant bit, updated the id 
>>>>>>>> to be a string as per https://github.com/EOL/tramea/issues/352, 
>>>>>>>> and pasted it in the JSON playground mostly to make sure it was 
>>>>>>>> working as expected: http://tinyurl.com/yadam6nj
>>>>>>>>
>>>>>>>> I am missing the link of how the following happens:
>>>>>>>> - the measurement type points to 
>>>>>>>> http://purl.obolibrary.org/obo/VT_0001256, which is body 
>>>>>>>> length. The schema.org/predicate value is also "body length 
>>>>>>>> (VT)". How is this understood and displayed as Length on the 
>>>>>>>> Google result?
>>>>>>>> - Similar question for the actual value and units, which are 
>>>>>>>> "4249.83" and "mm" respectively. Is Google doing some sort of 
>>>>>>>> unit conversion/roundup for display?
>>>>>>>> - Trophic level on EoL is "carnivore", but Google displays 
>>>>>>>> "Carnivorous"
>>>>>>>> etc
>>>>>>>>
>>>>>>>> Or am I looking at the wrong source for the markup?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Melanie
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/11/2017 15:17, Franck Michel wrote:
>>>>>>>>> Dear all,
>>>>>>>>>
>>>>>>>>> I've just joined the Bioschemas.org community following some 
>>>>>>>>> discussions I had with Alasdair Gray whom I met at ISWC in 
>>>>>>>>> Vienna, and I'd like to start a new discussion thread.
>>>>>>>>>
>>>>>>>>> So, just to start, a few words about me. I'm a CNRS research 
>>>>>>>>> engineer, I work at the I3S laboratory in France, in 
>>>>>>>>> particular with the Wimmics research team led by Fabien 
>>>>>>>>> Gandon. I'm currently involved in some activities related to 
>>>>>>>>> the publication of taxonomic information as Linked Data [1]. 
>>>>>>>>> In this context, I've met the Biodiversity Information 
>>>>>>>>> Standards community (TDWG) that is increasingly considering SW 
>>>>>>>>> standards, LD publication and web page markup. This is a 
>>>>>>>>> domain where, I think, it would be relevant for 
>>>>>>>>> Bioschemas.org to get involved.
>>>>>>>>>
>>>>>>>>> There exist lots of web portals reporting observations, traits 
>>>>>>>>> and other data about all sorts of living organisms. 
>>>>>>>>> Encyclopedia of Life <http://eol.org/> (EoL) and the Global 
>>>>>>>>> Biodiversity Information Facility <https://www.gbif.org/> 
>>>>>>>>> (GBIF) are some of the best known. Markup questions are 
>>>>>>>>> actively considered in this field, for instance EoL web pages 
>>>>>>>>> embed schema.org-based JSON-LD descriptions that Google 
>>>>>>>>> leverages to enrich their snippets: e.g. if you google beluga 
>>>>>>>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc> 
>>>>>>>>> you will see 'Encyclopedia of Life' mentions in the snippet 
>>>>>>>>> providing average weight and size data. For now, this seems to 
>>>>>>>>> be an "individual" initiative between EoL and 
>>>>>>>>> Google/schema.org, but it would make sense if this was part 
>>>>>>>>> of a broader reflection led by Bioschemas.org.
>>>>>>>>>
>>>>>>>>> My opinion is that fostering the use of common markup by these 
>>>>>>>>> portals could be very effective in helping the biodiversity 
>>>>>>>>> community to discover information and figure out new data 
>>>>>>>>> integration scenarios. Within Bioschemas.org, we could define 
>>>>>>>>> profiles to account for biodiversity-related 
>>>>>>>>> information. Taxonomic registers are used as the backbone of 
>>>>>>>>> many web portals, apps and databases related to biodiversity, 
>>>>>>>>> agronomy and agriculture. For instance, EoL and GBIF both rely 
>>>>>>>>> on the Catalogue of Life <http://www.catalogueoflife.org/> 
>>>>>>>>> taxonomy. Therefore, we could start with the definition of a 
>>>>>>>>> profile to describe a taxon and the related scientific and 
>>>>>>>>> vernacular names thereof. Then, this could be extended with 
>>>>>>>>> the representation of traits (characteristics of biological 
>>>>>>>>> organisms), observations, occurrence data, conservation status 
>>>>>>>>> (e.g. endangered) etc. There already exist vocabularies for 
>>>>>>>>> such data such as the well-adopted Darwin Core terms.
>>>>>>>>>
>>>>>>>>> As a quick example, consider the web page describing the 
>>>>>>>>> common dolphin on the web site of the French Museum of Natural 
>>>>>>>>> History: https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en. This 
>>>>>>>>> page could come with a JSON-LD description looking like this: 
>>>>>>>>> https://github.com/frmichel/taxref-ld/blob/master/bioschemas-org-example.json
>>>>>>>>> This example is naive and very succinct, and there are lots of 
>>>>>>>>> things to discuss and decide. Besides, I only registered on 
>>>>>>>>> the mailing list yesterday, so it may not fit with good 
>>>>>>>>> practices that you have already agreed upon. Sorry if this is 
>>>>>>>>> the case. Nevertheless, my point is basically to bootstrap the 
>>>>>>>>> discussion and see if the community is willing to endorse this 
>>>>>>>>> initiative. If this is the case, we should probably involve 
>>>>>>>>> people from the biodiversity community: Darwin Core experts, 
>>>>>>>>> EoL/GBIF representatives etc. But that will come in time.
>>>>>>>>>
>>>>>>>>> I look forward to further discussions.
>>>>>>>>> Regards,
>>>>>>>>>    Franck.
>>>>>>>>>
>>>>>>>>> [1] Michel F., Gargominy O., Tercerie S. & Faron-Zucker C. 
>>>>>>>>> (2017). A Model to Represent Nomenclatural and Taxonomic 
>>>>>>>>> Information as Linked Data. Application to the French 
>>>>>>>>> Taxonomic Register, TAXREF. In Proceedings of the 2nd 
>>>>>>>>> International Workshop on Semantics for Biodiversity 
>>>>>>>>> (S4BioDiv) co-located with ISWC 2017 vol. 1933. Vienna, 
>>>>>>>>> Austria. CEUR.
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Franck MICHEL
>>>>>>>>> CNRS research engineer
>>>>>>>>> 	+33 (0)492 96 5004
>>>>>>>>> franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr>
>>>>>>>>>
>>>>>>>>> 	
>>>>>>>>>
>>>>>>>>> Université Côte d’Azur, CNRS, *Inria* - I3S - UMR 7271
>>>>>>>>> 930 route des Colles - Bât. Les Templiers
>>>>>>>>> BP 145 - 06903 Sophia Antipolis CEDEX - France
>>>>>>>>> Tel. +33 (0)4 9294 2680, Fax : +33 (0)4 9294 2898
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
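
To make the namespace and unit points from the quoted November 2017 messages concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not what EoL publishes or what Google is known to do: the helper names (`build_trait`, `display_length`), the `UNIT_TO_METRES` table and the choice of `dwc:measurementValue` to carry the number are all hypothetical. It shows (1) declaring the Darwin Core prefix in the JSON-LD `@context` so that `dwc:scientificName` and `dwc:measurementUnit` resolve to real IRIs, and (2) how a consumer could turn "4249.83" with unit UO_0000016 (mm) into the "4.2 m" seen in the snippet.

```python
import json

# Declaring the "dwc" prefix makes dwc:* terms expand to Darwin Core IRIs
# instead of being looked up (and not found) in schema.org.
CONTEXT = {
    "@vocab": "http://schema.org/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
}

# Assumed lookup table: Units Ontology IRI -> (symbol, factor to metres).
# Only the unit mentioned in the thread is listed.
UNIT_TO_METRES = {
    "http://purl.obolibrary.org/obo/UO_0000016": ("mm", 0.001),
}


def build_trait(value, unit_iri):
    """Build an illustrative trait record using namespaced Darwin Core terms."""
    return {
        "@context": CONTEXT,
        "dwc:scientificName": "Delphinapterus leucas",  # beluga
        "dwc:measurementType": "http://purl.obolibrary.org/obo/VT_0001256",
        "dwc:measurementValue": value,
        "dwc:measurementUnit": unit_iri,
    }


def display_length(trait):
    """Convert the measurement to metres, formatted with one decimal place."""
    _symbol, factor = UNIT_TO_METRES[trait["dwc:measurementUnit"]]
    metres = float(trait["dwc:measurementValue"]) * factor
    return f"{metres:.1f} m"


trait = build_trait("4249.83", "http://purl.obolibrary.org/obo/UO_0000016")
print(json.dumps(trait, indent=2))
print(display_length(trait))  # 4.2 m
```

Keying the conversion on the unit IRI rather than on the free-text "mm" string reflects the remark in the thread that `dwc:measurementUnit` is the only reliable property of the two.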

-- 
Franck MICHEL
CNRS research engineer
	+33 (0)4 8915 4277
franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr>

	

Université Côte d’Azur, CNRS- I3S - UMR 7271
930 route des Colles - Bât. Les Templiers
BP 145 - 06903 Sophia Antipolis CEDEX - France
Tel. +33 (0)4 9294 2680
Received on Thursday, 7 June 2018 08:53:16 UTC
