Re: Bioschemas.org to define biodiversity-related markup

Hi Franck,

Great news!

Do you need any help/guides for the start-up?

Cheers,


On 17/01/2018 15:24, Franck Michel wrote:
> Dear all,
>
> I'm following up on this suggestion about creating a 
> biodiversity-related group in Bioschemas.org.
>
> The proposal received four +1s. I'm not sure whether there is a
> "minimum score" to establish sufficient consensus.
>
> As we discussed, if we go ahead with creating this group, it would be
> beneficial to involve at least EoL folks, and possibly other people from
> the biodiversity community. I can try to initiate this, but first I
> would like to have an official GO from our community.
>
> Let me know how this usually works, and what you think about this.
>
> Regards,
>     Franck.
>
> On 17/11/2017 16:40, Franck Michel wrote:
>> Hi Mélanie, hi all,
>>
>> To go a bit further, I've tried to extend the example I started.
>> Here it is:
>> https://github.com/frmichel/taxref-ld/tree/master/bioschemas-org
>> The README explains how the example file is organized and, more
>> importantly, lists some of the issues and questions that we will
>> have to tackle if we officially start the group.
>>
>> @Alasdair, Carole, Rafael: as discussed in the thread, at some point
>> it would be beneficial to invite people from EoL and TDWG. Is
>> there some sort of "official" channel for the community to do that?
>>
>> Have a nice weekend,
>>     Franck.
>>
>> On 17/11/2017 10:19, Melanie Courtot wrote:
>>> Hi Franck, all,
>>>
>>> On 16/11/2017 09:37, Franck Michel wrote:
>>>> Hi Melanie, hi all,
>>>>
>>>> EoL provides an API that returns species descriptions as JSON-LD
>>>> based on schema.org. Beluga example: http://eol.org/api/traits/328541
>>>> It is unclear who consumes this data, but at least, as you already
>>>> saw, they embed it at the end of their own web pages, such as
>>>> http://eol.org/pages/328541/data.
>>> BioSamples does the same: we have an API to retrieve JSON, and we
>>> also embed it in our web pages for crawlers.
>>>>
>>>> As you also noticed, the JSON-LD they provide is not valid. I
>>>> didn't know about that EoL GitHub issue, but I recently discussed
>>>> it with Rod Page from the Biodiversity Information Standards (aka
>>>> TDWG), who replied on the GitHub issue. The Google Structured Data
>>>> Testing Tool gives more details on that: https://frama.link/xJm0AAto
>>>> Besides, some other errors are not reported (at least, I think they
>>>> are errors): the property scientificName without any namespace is
>>>> invalid; it should be dwc:scientificName, since scientificName does
>>>> not exist in schema.org. The same issue applies to vernacularName,
>>>> traits, units...
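To illustrate the namespace problem, here is a minimal Python sketch of how an unprefixed key gets resolved. It is a heavily simplified version of JSON-LD term expansion, not the full JSON-LD algorithm, and it assumes a context whose @vocab is http://schema.org/ and which declares the prefix dwc as http://rs.tdwg.org/dwc/terms/. The function expand_term is hypothetical.

```python
# Simplified sketch of JSON-LD term expansion (NOT the full algorithm).
# Assumption: the context maps @vocab to schema.org and declares a "dwc" prefix.
SCHEMA = "http://schema.org/"
DWC = "http://rs.tdwg.org/dwc/terms/"

context = {"@vocab": SCHEMA, "dwc": DWC}


def expand_term(term: str, ctx: dict) -> str:
    """Expand a compact property name to an absolute IRI (hypothetical helper)."""
    if term.startswith(("http://", "https://")):
        return term  # already an absolute IRI
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in ctx:
        return ctx[prefix] + suffix  # compact IRI such as dwc:scientificName
    return ctx["@vocab"] + term  # no known prefix: fall back to the vocabulary


for key in ["name", "scientificName", "dwc:scientificName"]:
    print(f"{key:20} -> {expand_term(key, context)}")
```

With this assumed context, an unprefixed scientificName silently expands to http://schema.org/scientificName, a property that, as noted above, schema.org does not define, whereas dwc:scientificName lands in the Darwin Core namespace.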
>>>>
>>>> Anyway, this JSON-LD has lots of issues, but it's a start.
>>>
>>> Yes, I only mentioned the tweaks in case someone else wanted to give
>>> it a try.
>>>
>>>> The assumption is that there is some sort of specific (one-to-one) 
>>>> agreement between EoL and Google, and that Google harvests this 
>>>> data despite the invalid JSON-LD. But I have no confirmation of that.
>>>
>>> It'd be interesting to clarify this. It seems a little
>>> counterintuitive that EoL would mark up their pages with JSON for
>>> Google to read, but that Google could not do so without a special
>>> adapter. We're probably missing a piece of the story.
>>>>
>>>> > - the measurement type points to
>>>> > http://purl.obolibrary.org/obo/VT_0001256, which is body length.
>>>> > The schema.org/predicate value is also "body length (VT)". How is
>>>> > this understood and displayed as Length on the Google result?
>>>> > - Similar question for the actual value and units, which are
>>>> > "4249.83" and "mm" respectively. Is Google doing some sort of unit
>>>> > conversion/roundup for display?
>>>>
>>>> Good question. Take the unit "mm", for instance:
>>>> - "units": "mm" => there is no such thing as http://schema.org/units
>>>> - "dwc:measurementUnit":
>>>> "http://purl.obolibrary.org/obo/UO_0000016" => this seems to be the
>>>> only reliable property, but then Google would have to know the
>>>> Darwin Core vocabulary and interpret it.
>>>> My assumption is that Google performs some processing on the values.
>>>> Possibly, they developed a specific connector to handle EoL's
>>>> JSON-LD and convert this body size to "4.2 m".
>>>> Besides, the snippet mentions "4.2 m *(Adult)*", so they presumably
>>>> also consider this property:
>>>> "eol:traitUri": "http://eol.org/resources/704/measurements/adultheadbodylen27"
>>>> to know that this is the size of an adult.
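The kind of post-processing speculated about above can be sketched as follows. This is purely hypothetical: nothing here is known to be Google's or EoL's actual logic. The conversion table, the one-decimal rounding and the "adult" check on the trait URI are assumptions, chosen only to show how the values "4249.83" and "mm" could plausibly become the "4.2 m (Adult)" snippet.

```python
# Hypothetical post-processing sketch; NOT known to be Google's or EoL's logic.

# Assumed conversion factors to metres, keyed by the unit label used by EoL.
TO_METRES = {"mm": 0.001, "cm": 0.01, "m": 1.0}


def display_length(value: str, unit: str, trait_uri: str = "") -> str:
    """Convert a length to metres, round to one decimal, and tag adult traits."""
    metres = float(value) * TO_METRES[unit]
    label = f"{metres:.1f} m"
    # Assumption: the life stage is inferred from the trait URI, as speculated above.
    if "adult" in trait_uri.lower():
        label += " (Adult)"
    return label


print(display_length(
    "4249.83", "mm",
    "http://eol.org/resources/704/measurements/adultheadbodylen27",
))
```

Keying on the literal "mm" is fragile; a consumer relying on dwc:measurementUnit would instead map the Unit Ontology IRI (UO_0000016) to its conversion factor.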
>>>>
>>>> With proper Bioschemas.org profiles, I think we could annotate
>>>> pages from many other institutions, such as the Beluga page
>>>> <https://inpn.mnhn.fr/espece/cd_nom/60932?lg=en> on the French
>>>> National Museum of Natural History website, and in turn enable
>>>> search engines to harvest data from complementary pages, produce
>>>> mashups of related pages, etc.
>>> That sounds like a great idea and entirely within the scope of 
>>> Bioschemas.
>>>>
>>>> At this point, I think we should involve people from EoL and from
>>>> the TDWG community (Rod Page would certainly add great value in
>>>> this respect). What do you think? Is there a procedure for
>>>> inviting people "officially"?
>>> I think we could indeed benefit from their experience; it seems they
>>> were able to deploy markup, add extra properties, and then get
>>> Google to interpret them, which matches our use case
>>> pretty well!
>>> I +1'd the issue at 
>>> https://github.com/BioSchemas/specifications/issues/115
>>>
>>> Cheers,
>>> Melanie
>>>
>>>>
>>>> Franck.
>>>>
>>>>
>>>> On 15/11/2017 17:57, Melanie Courtot wrote:
>>>>> Hi Franck,
>>>>>
>>>>> This looks really interesting, thanks for bringing it up. I was
>>>>> trying to find out how the interaction between EoL and schema.org
>>>>> works, and am wondering whether you (or someone else!) could shed
>>>>> some light on it.
>>>>>
>>>>> As you suggested below, I checked the Google beluga
>>>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc> 
>>>>> search result and do see the line "Length: 4.2 m (Adult) 
>>>>> Encyclopedia of Life"
>>>>>
>>>>> If I try to find where that info comes from, and head to EoL, I 
>>>>> can reach the page http://eol.org/pages/328541/overview, and 
>>>>> follow the "see all traits" link to 
>>>>> http://eol.org/pages/328541/data which contains the JSON-LD.
>>>>>
>>>>> I trimmed it down to extract the relevant bit, updated the id to 
>>>>> be a string as per https://github.com/EOL/tramea/issues/352, and 
>>>>> pasted it in the JSON playground mostly to make sure it was 
>>>>> working as expected: http://tinyurl.com/yadam6nj
>>>>>
>>>>> What I am missing is how the following happens:
>>>>> - the measurement type points to 
>>>>> http://purl.obolibrary.org/obo/VT_0001256, which is body length. 
>>>>> The schema.org/predicate value is also "body length (VT)". How is 
>>>>> this understood and displayed as Length on the Google result?
>>>>> - Similar question for the actual value and units, which are 
>>>>> "4249.83" and "mm" respectively. Is Google doing some sort of unit 
>>>>> conversion/roundup for display?
>>>>> - Trophic level on EoL is "carnivore", but Google displays 
>>>>> "Carnivorous"
>>>>> etc
>>>>>
>>>>> Or am I looking at the wrong source for the markup?
>>>>>
>>>>> Cheers,
>>>>> Melanie
>>>>>
>>>>> On 10/11/2017 15:17, Franck Michel wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> I've just joined the Bioschemas.org community following some 
>>>>>> discussions I had with Alasdair Gray whom I met at ISWC in 
>>>>>> Vienna, and I'd like to start a new discussion thread.
>>>>>>
>>>>>> So, just to start, a few words about me. I'm a CNRS research
>>>>>> engineer working at the I3S laboratory in France, in particular
>>>>>> with the Wimmics research team led by Fabien Gandon. I'm
>>>>>> currently involved in activities related to the publication
>>>>>> of taxonomic information as Linked Data [1]. In this context,
>>>>>> I've met the Biodiversity Information Standards community (TDWG),
>>>>>> which is increasingly considering Semantic Web standards, Linked
>>>>>> Data publication and web page markup. I think this is a domain
>>>>>> where it would be relevant for Bioschemas.org to get involved.
>>>>>>
>>>>>> There are lots of web portals reporting observations, traits
>>>>>> and other data about all sorts of living organisms. Encyclopedia
>>>>>> of Life <http://eol.org/> (EoL) and the Global Biodiversity
>>>>>> Information Facility <https://www.gbif.org/> (GBIF) are among
>>>>>> the best known. Markup questions are actively considered in
>>>>>> this field: for instance, EoL web pages embed schema.org-based
>>>>>> JSON-LD descriptions that Google leverages to enrich its
>>>>>> snippets; e.g. if you google beluga
>>>>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc> 
>>>>>> you will see 'Encyclopedia of Life' mentions in the snippet
>>>>>> providing average weight and size data. For now, this seems to be
>>>>>> an "individual" initiative between EoL and Google/schema.org,
>>>>>> but it would make sense for it to be part of a broader effort
>>>>>> led by Bioschemas.org.
>>>>>>
>>>>>> My opinion is that fostering the use of common markup by these
>>>>>> portals could be very effective in helping the biodiversity
>>>>>> community to discover information and devise new data
>>>>>> integration scenarios. Within Bioschemas.org, we could define
>>>>>> profiles to account for biodiversity-related
>>>>>> information. Taxonomic registers are used as the backbone of many
>>>>>> web portals, apps and databases related to biodiversity, agronomy
>>>>>> and agriculture. For instance, EoL and GBIF both rely on the
>>>>>> Catalogue of Life <http://www.catalogueoflife.org/> taxonomy.
>>>>>> Therefore, we could start with the definition of a profile to
>>>>>> describe a taxon and its related scientific and vernacular
>>>>>> names. Then, this could be extended with the representation of
>>>>>> traits (characteristics of biological organisms), observations,
>>>>>> occurrence data, conservation status (e.g. endangered), etc.
>>>>>> Vocabularies already exist for such data, such as the widely
>>>>>> adopted Darwin Core terms.
>>>>>>
>>>>>> As a quick example, consider the web page describing the common
>>>>>> dolphin on the website of the French Museum of Natural History:
>>>>>> https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en. This page could
>>>>>> come with a JSON-LD description looking like this:
>>>>>> https://github.com/frmichel/taxref-ld/blob/master/bioschemas-org-example.json
>>>>>> This example is naive and very succinct, and there are lots of
>>>>>> things to discuss and decide. Besides, I only registered on
>>>>>> the mailing list yesterday, so it may not fit with good practices that
>>>>>> you guys have already agreed upon. Sorry if this is the case.
>>>>>> Nevertheless, my point is basically to bootstrap the discussion 
>>>>>> and see if the community is willing to endorse this initiative. 
>>>>>> If this is the case, we should probably involve people from the 
>>>>>> biodiversity community: Darwin Core experts, EoL/GBIF 
>>>>>> representatives etc. But that will come in time.
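To make the idea of a taxon profile more concrete, here is a hypothetical Python sketch that builds a minimal JSON-LD description for the common dolphin page mentioned above and wraps it in a script element for embedding in that page. It is not the content of the linked bioschemas-org-example.json, nor an agreed Bioschemas profile: typing the node as dwc:Taxon and using the Darwin Core terms dwc:scientificName, dwc:vernacularName and dwc:taxonRank are assumptions for illustration, and to_script_tag is a hypothetical helper.

```python
import json

# Hypothetical taxon description; type and property choices are assumptions,
# not an agreed Bioschemas profile.
doc = {
    "@context": {
        "@vocab": "http://schema.org/",          # unprefixed keys -> schema.org
        "dwc": "http://rs.tdwg.org/dwc/terms/",  # Darwin Core namespace
    },
    "@type": "dwc:Taxon",
    "url": "https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en",
    "dwc:scientificName": "Delphinus delphis Linnaeus, 1758",
    "dwc:taxonRank": "species",
    # Language-tagged vernacular names (JSON-LD value objects).
    "dwc:vernacularName": [
        {"@value": "Common dolphin", "@language": "en"},
        {"@value": "Dauphin commun", "@language": "fr"},
    ],
}


def to_script_tag(data: dict) -> str:
    """Serialize a JSON-LD document into an HTML <script> element."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(to_script_tag(doc))
```

Keeping every non-schema.org property behind the dwc: prefix avoids the unprefixed-term problem discussed for the EoL markup.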
>>>>>>
>>>>>> I look forward to further discussions.
>>>>>> Regards,
>>>>>>    Franck.
>>>>>>
>>>>>> [1] Michel F., Gargominy O., Tercerie S. & Faron-Zucker C. 
>>>>>> (2017). A Model to Represent Nomenclatural and Taxonomic 
>>>>>> Information as Linked Data. Application to the French Taxonomic 
>>>>>> Register, TAXREF. In Proceedings of the 2nd International 
>>>>>> Workshop on Semantics for Biodiversity (S4BioDiv) co-located with 
>>>>>> ISWC 2017 vol. 1933. Vienna, Austria. CEUR.
>>>>>>
>>>>>> -- 
>>>>>> Franck MICHEL
>>>>>> CNRS research engineer
>>>>>>  +33 (0)492 96 5004
>>>>>> franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr>
>>>>>>
>>>>>>  
>>>>>>
>>>>>> Université Côte d’Azur, CNRS, *Inria* - I3S - UMR 7271
>>>>>> 930 route des Colles - Bât. Les Templiers
>>>>>> BP 145 - 06903 Sophia Antipolis CEDEX - France
>>>>>> Tel. +33 (0)4 9294 2680, Fax : +33 (0)4 9294 2898
>>>>>>
>>>>>
>>>>
>>>
>>
>

Received on Monday, 22 January 2018 10:37:37 UTC