Re: Bioschemas.org to define biodiversity-related markup

Hello Bioschemas governance team,

What do you think about going ahead with the Biodiversity schemas? Do we 
have the go-ahead?

@Franck, I am not really familiar with those organizations, but I am happy to 
guide you through the work we have done for Bioschemas so far. I worked 
a bit on a biodiversity project but that was some years ago. Still, I 
like the subject!

Let's wait to see what Carole, Rafael and Alasdair suggest.

Regards,

On 23/01/2018 08:47, Franck Michel wrote:
> Dear Leyla and all,
>
> I understand that your response stands for a GO. Right?
>
> I've not yet been involved in the specification of the Bioschemas.org 
> profiles. So indeed, I shall need help and guidance as to how things 
> work: the tools, the process, the expected outcomes, etc.
>
> As I proposed, we could start by contacting people who would 
> potentially be interested in taking part in this. I'm thinking about 
> Encyclopedia of Life, Catalogue of Life, GBIF. If you already know 
> contacts in these organizations, that would certainly be helpful.
>
> Franck.
>
> Le 22/01/2018 à 11:37, Leyla Garcia a écrit :
>> Hi Franck,
>>
>> Great news!
>>
>> Do you need any help/guides for the start-up?
>>
>> Cheers,
>>
>>
>> On 17/01/2018 15:24, Franck Michel wrote:
>>> Dear all,
>>>
>>> I'm following up on this suggestion about creating a 
>>> biodiversity-related group in Bioschemas.org.
>>>
>>> The proposal received four +1's. I'm not sure if there is a 
>>> "minimum score" to attest to sufficient consensus.
>>>
>>> As we discussed, if we go for the creation of this group, it would 
>>> be beneficial to involve at least EoL folks, possibly other people 
>>> from the biodiversity community. I can try to initiate this, but 
>>> first I would like to have an official GO from our community.
>>>
>>> Let me know how this usually works, and what you think about this.
>>>
>>> Regards,
>>>     Franck.
>>>
>>> Le 17/11/2017 à 16:40, Franck Michel a écrit :
>>>> Hi Mélanie, hi all,
>>>>
>>>> To go a bit further, I've tried to extend the example I 
>>>> started. Here it is: 
>>>> https://github.com/frmichel/taxref-ld/tree/master/bioschemas-org
>>>> The README gives details as to how the example file is organized, 
>>>> and more importantly it lists some of the issues and questions that 
>>>> we shall have to tackle if we officially start the group.
>>>>
>>>> @Alasdair, Carole, Rafael: as discussed in the thread, at some 
>>>> point it shall be beneficial to invite people from EoL and TDWG. 
>>>> Is there some sort of "official" channel for the community to do that?
>>>>
>>>> Have a nice week-end,
>>>>     Franck.
>>>>
>>>> Le 17/11/2017 à 10:19, Melanie Courtot a écrit :
>>>>> Hi Franck, all,
>>>>>
>>>>> On 16/11/2017 09:37, Franck Michel wrote:
>>>>>> Hi Melanie, hi all,
>>>>>>
>>>>>> EoL provides an API that returns species descriptions as JSON-LD 
>>>>>> based on schema.org. Beluga example: 
>>>>>> http://eol.org/api/traits/328541
>>>>>> It is unclear who consumes this data, but at least, as you 
>>>>>> already saw, they embed it at the end of their own web pages such 
>>>>>> as http://eol.org/pages/328541/data.
>>>>> BioSamples does the same - an API to retrieve JSON and we embed it 
>>>>> in our webpages for crawler as well.
>>>>>>
>>>>>> As you also noticed, the JSON-LD they provide is not valid. I 
>>>>>> didn't know about that EoL GitHub issue, but I recently discussed 
>>>>>> it with Rod Page from the Biodiversity Information Standards (aka 
>>>>>> TDWG), who replied on the GitHub issue. The Google structured 
>>>>>> data testing tool gives more details on that: 
>>>>>> https://frama.link/xJm0AAto
>>>>>> Besides, other errors are not reported (well, I think these are 
>>>>>> errors): the property scientificName without any namespace is 
>>>>>> invalid; it should be dwc:scientificName, since this property does 
>>>>>> not exist in schema.org. Same issue for vernacularName, traits, units...
>>>>>>
>>>>>> Anyway, this JSON-LD has lots of issues, but it's a start. 
>>>>>
>>>>> Yes. Only mentioned the tweaks in case someone wanted to give it a 
>>>>> try as well.
>>>>>
>>>>>> The assumption is that there is some sort of specific 
>>>>>> (one-to-one) agreement between EoL and Google, and that Google 
>>>>>> harvests this data despite the invalid JSON-LD. But I have no 
>>>>>> confirmation of that.
>>>>>
>>>>> It'd be interesting to clarify this. It seems a little bit 
>>>>> counterintuitive that EoL would mark their pages up with JSON for Google 
>>>>> to read it but then Google couldn't do so without a special 
>>>>> adapter? We're probably missing a piece of the story.
>>>>>>
>>>>>> > - the measurement type points to 
>>>>>> http://purl.obolibrary.org/obo/VT_0001256, which is body length. 
>>>>>> The schema.org/predicate value is also "body length (VT)". How is 
>>>>>> this understood and displayed as Length on the Google result?
>>>>>> - Similar question for the actual value and units, which are 
>>>>>> "4249.83" and "mm" respectively. Is Google doing some sort of 
>>>>>> unit conversion/roundup for display?
>>>>>>
>>>>>> Good question. Take the unit "mm", for instance:
>>>>>> - "units": "mm" => there is no such thing as http://schema.org/units
>>>>>> - "dwc:measurementUnit": 
>>>>>> "http://purl.obolibrary.org/obo/UO_0000016" => this seems to be 
>>>>>> the only reliable property, but that would mean Google knows the 
>>>>>> Darwin Core vocabulary and interprets it.
>>>>>> My assumption is that Google performs some processing on the 
>>>>>> values. Possibly, they developed a specific connector to cope 
>>>>>> with EoL JSON-LD and translate this body size to "4.2 m".
>>>>>> Besides, the snippet mentions "4.2 m *(Adult)*", so they also 
>>>>>> presumably consider this property:
>>>>>> "eol:traitUri": "http://eol.org/resources/704/measurements/adultheadbodylen27"
>>>>>> to know that this is the size of an adult.
>>>>>>
>>>>>> With proper Bioschemas.org profiles, I think we could annotate 
>>>>>> pages from many other institutions, such as the Beluga page 
>>>>>> <https://inpn.mnhn.fr/espece/cd_nom/60932?lg=en> on the website of 
>>>>>> the French National Museum of Natural History, and in turn, enable 
>>>>>> search engines to harvest data from complementary pages and produce 
>>>>>> mashups of related pages, etc.
>>>>> That sounds like a great idea and entirely within the scope of 
>>>>> Bioschemas.
>>>>>>
>>>>>> At this point, I think we should involve people from EoL, and 
>>>>>> from the TDWG community (Rod Page would certainly be of great 
>>>>>> added value in this respect). What do you think? Is there a 
>>>>>> procedure for inviting people "officially"?
>>>>> I think we could benefit from their experience indeed; it seems 
>>>>> they were able to deploy markup, add additional properties and 
>>>>> then get this to be interpreted by Google which seems to match our 
>>>>> use case pretty well!
>>>>> I +1'd the issue at 
>>>>> https://github.com/BioSchemas/specifications/issues/115
>>>>>
>>>>> Cheers,
>>>>> Melanie
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Franck.
>>>>>>
>>>>>>
>>>>>> Le 15/11/2017 à 17:57, Melanie Courtot a écrit :
>>>>>>> Hi Franck,
>>>>>>>
>>>>>>> This looks really interesting, thanks for bringing it up. I was 
>>>>>>> trying to find out how the interaction between EoL and 
>>>>>>> schema.org was working and am wondering if you (or someone 
>>>>>>> else!) could shed some light on this?
>>>>>>>
>>>>>>> As you suggested below, I checked the Google beluga 
>>>>>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc> 
>>>>>>> search result and do see the line "Length: 4.2 m (Adult) 
>>>>>>> Encyclopedia of Life"
>>>>>>>
>>>>>>> If I try to find where that info comes from, and head to EoL, I 
>>>>>>> can reach the page http://eol.org/pages/328541/overview, and 
>>>>>>> follow the "see all traits" link to 
>>>>>>> http://eol.org/pages/328541/data which contains the JSON-LD.
>>>>>>>
>>>>>>> I trimmed it down to extract the relevant bit, updated the id to 
>>>>>>> be a string as per https://github.com/EOL/tramea/issues/352, and 
>>>>>>> pasted it in the JSON playground mostly to make sure it was 
>>>>>>> working as expected: http://tinyurl.com/yadam6nj
>>>>>>>
>>>>>>> I am missing how the following happens:
>>>>>>> - the measurement type points to 
>>>>>>> http://purl.obolibrary.org/obo/VT_0001256, which is body length. 
>>>>>>> The schema.org/predicate value is also "body length (VT)". How 
>>>>>>> is this understood and displayed as Length on the Google result?
>>>>>>> - Similar question for the actual value and units, which are 
>>>>>>> "4249.83" and "mm" respectively. Is Google doing some sort of 
>>>>>>> unit conversion/roundup for display?
>>>>>>> - Trophic level on EoL is "carnivore", but Google displays 
>>>>>>> "Carnivorous"
>>>>>>> etc
>>>>>>>
>>>>>>> Or am I looking at the wrong source for the markup?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Melanie
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/11/2017 15:17, Franck Michel wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> I've just joined the Bioschemas.org community following some 
>>>>>>>> discussions I had with Alasdair Gray whom I met at ISWC in 
>>>>>>>> Vienna, and I'd like to start a new discussion thread.
>>>>>>>>
>>>>>>>> So, just to start, a few words about me. I'm a CNRS research 
>>>>>>>> engineer, I work at the I3S laboratory in France, in particular 
>>>>>>>> with the Wimmics research team led by Fabien Gandon. I'm 
>>>>>>>> currently involved in some activities related to the 
>>>>>>>> publication of taxonomic information as Linked Data [1]. In 
>>>>>>>> this context, I've met the Biodiversity Information Standards 
>>>>>>>> community (TDWG) that is increasingly considering SW standards, 
>>>>>>>> LD publication and web page markup. This is a domain where, I 
>>>>>>>> think, it would be relevant for Bioschemas.org to get involved.
>>>>>>>>
>>>>>>>> There exist lots of web portals reporting observations, traits 
>>>>>>>> and other data about all sorts of living organisms. 
>>>>>>>> Encyclopedia of Life <http://eol.org/> (EoL) and the Global 
>>>>>>>> Biodiversity Information Facility <https://www.gbif.org/> 
>>>>>>>> (GBIF) are some of the best known. Markup questions are 
>>>>>>>> actively considered in this field; for instance, EoL web pages 
>>>>>>>> embed schema.org-based JSON-LD descriptions that Google 
>>>>>>>> leverages to enrich their snippets: e.g. if you google beluga 
>>>>>>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc> 
>>>>>>>> you should see 'Encyclopedia of Life' mentions in the snippet 
>>>>>>>> providing average weight and size data. For now, this seems to 
>>>>>>>> be an "individual" initiative between EoL and 
>>>>>>>> Google/schema.org, but it would make sense if this was part of 
>>>>>>>> a broader reflection led by Bioschemas.org.
>>>>>>>>
>>>>>>>> My opinion is that fostering the use of common markup by these 
>>>>>>>> portals could be very effective in helping the biodiversity 
>>>>>>>> community to discover information and figure out new data 
>>>>>>>> integration scenarios. Within Bioschemas.org, we could define 
>>>>>>>> profiles to account for biodiversity-related 
>>>>>>>> information. Taxonomic registers are used as the backbone of 
>>>>>>>> many web portals, apps and databases related to biodiversity, 
>>>>>>>> agronomy and agriculture. For instance, EoL and GBIF both rely 
>>>>>>>> on the Catalogue of Life <http://www.catalogueoflife.org/> 
>>>>>>>> taxonomy. Therefore, we could start with the definition of a 
>>>>>>>> profile to describe a taxon and its related scientific and 
>>>>>>>> vernacular names. Then, this could be extended with the 
>>>>>>>> representation of traits (characteristics of biological 
>>>>>>>> organisms), observations, occurrence data, conservation status 
>>>>>>>> (e.g. endangered) etc. Vocabularies for such data already 
>>>>>>>> exist, such as the well-adopted Darwin Core terms.
>>>>>>>>
>>>>>>>> As a quick example, consider the web page describing the common 
>>>>>>>> dolphin on the web site of the French Museum of Natural 
>>>>>>>> History: https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en. This 
>>>>>>>> page could come with a JSON-LD description looking like this: 
>>>>>>>> https://github.com/frmichel/taxref-ld/blob/master/bioschemas-org-example.json
>>>>>>>> This example is naive and very succinct, and there are lots of 
>>>>>>>> things to discuss and decide. Besides, I only registered on 
>>>>>>>> the mailing list yesterday, so it may not fit with good practices 
>>>>>>>> that you guys have already agreed upon. Sorry if this is the 
>>>>>>>> case. Nevertheless, my point is basically to bootstrap the 
>>>>>>>> discussion and see if the community is willing to endorse this 
>>>>>>>> initiative. If this is the case, we should probably involve 
>>>>>>>> people from the biodiversity community: Darwin Core experts, 
>>>>>>>> EoL/GBIF representatives etc. But that will come in time.
>>>>>>>>
>>>>>>>> I look forward to further discussions.
>>>>>>>> Regards,
>>>>>>>>    Franck.
>>>>>>>>
>>>>>>>> [1] Michel F., Gargominy O., Tercerie S. & Faron-Zucker C. 
>>>>>>>> (2017). A Model to Represent Nomenclatural and Taxonomic 
>>>>>>>> Information as Linked Data. Application to the French Taxonomic 
>>>>>>>> Register, TAXREF. In Proceedings of the 2nd International 
>>>>>>>> Workshop on Semantics for Biodiversity (S4BioDiv) co-located 
>>>>>>>> with ISWC 2017 vol. 1933. Vienna, Austria. CEUR.
>>>>>>>>
>>>>>>>> -- 
>>>>>>>>  
>>>>>>>> Franck MICHEL
>>>>>>>> CNRS research engineer
>>>>>>>>  +33 (0)492 96 5004
>>>>>>>> franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr>
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>> Université Côte d’Azur, CNRS, *Inria* - I3S - UMR 7271
>>>>>>>> 930 route des Colles - Bât. Les Templiers
>>>>>>>> BP 145 - 06903 Sophia Antipolis CEDEX - France
>>>>>>>> Tel. +33 (0)4 9294 2680, Fax : +33 (0)4 9294 2898
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
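
To make the namespace point from the thread concrete (an unprefixed scientificName resolves against schema.org, where no such property exists), here is a minimal Python sketch. It is only an illustration under stated assumptions: the helper names taxon_jsonld and unprefixed_keys are hypothetical, the use of dwc:Taxon as the type is one possible choice and not an agreed Bioschemas profile, the Darwin Core terms namespace is taken to be http://rs.tdwg.org/dwc/terms/, and the dolphin values are illustrative.

```python
import json

# Darwin Core terms namespace (assumed here for the "dwc" prefix).
DWC = "http://rs.tdwg.org/dwc/terms/"


def taxon_jsonld(page_url, scientific_name, vernacular_names):
    """Build a small JSON-LD description of a taxon (hypothetical helper).

    Unprefixed keys would resolve against schema.org via @vocab, so the
    Darwin Core properties carry an explicit "dwc:" prefix.
    """
    return {
        "@context": {"@vocab": "http://schema.org/", "dwc": DWC},
        "@id": page_url,
        "@type": "dwc:Taxon",
        "dwc:scientificName": scientific_name,
        "dwc:vernacularName": vernacular_names,
    }


def unprefixed_keys(doc):
    """List property keys that are neither JSON-LD keywords nor prefixed.

    Such keys fall back to @vocab, which is a problem when the property
    does not exist in schema.org (e.g. a bare "scientificName").
    """
    return [k for k in doc if not k.startswith("@") and ":" not in k]


doc = taxon_jsonld(
    "https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en",
    "Delphinus delphis",
    ["common dolphin"],
)
print(json.dumps(doc, indent=2))

# A description in the style criticized in the thread: bare property names.
bare = {"@id": "x", "scientificName": "y", "units": "mm"}
print(unprefixed_keys(bare))  # ['scientificName', 'units']
```

A real check would also have to look up each unprefixed key in the schema.org vocabulary, since properties such as name are legitimately unprefixed; the sketch only flags candidates.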

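On the unit question discussed in the thread (EoL publishes "4249.83" with unit "mm", while the snippet shows "4.2 m"), the sketch below shows the kind of conversion a consumer might apply. This is purely a guess at the processing involved, in line with the assumption made in the thread; nothing here is confirmed behaviour of Google or EoL. The function name display_length and the conversion table are hypothetical.

```python
# Unit Ontology IRI quoted in the thread for the "mm" unit.
UO_MILLIMETER = "http://purl.obolibrary.org/obo/UO_0000016"

# Hypothetical lookup table: conversion factor from each known unit to meters.
METERS_PER_UNIT = {UO_MILLIMETER: 0.001}


def display_length(value, unit_iri):
    """Convert a measurement to meters and format it with one decimal."""
    meters = float(value) * METERS_PER_UNIT[unit_iri]
    return f"{meters:.1f} m"


print(display_length("4249.83", UO_MILLIMETER))  # 4.2 m
```

Keying the table on the unit IRI rather than on the bare string "mm" reflects the observation in the thread that dwc:measurementUnit is the only reliable unit property in the EoL markup.
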
Received on Tuesday, 23 January 2018 10:09:37 UTC