- From: Leyla Garcia <ljgarcia@ebi.ac.uk>
- Date: Tue, 23 Jan 2018 10:09:06 +0000
- To: Franck Michel <franck.michel@cnrs.fr>, public-bioschemas@w3.org, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, "Rafael C. Jimenez" <rafael.jimenez@elixir-europe.org>, "Carole Goble (carole.goble@manchester.ac.uk)" <carole.goble@manchester.ac.uk>
- Message-ID: <a65e4d45-7fae-5ed2-811e-928dfa4ef243@ebi.ac.uk>
Hello Bioschemas governance team,

What do you think about going ahead with the Biodiversity schemas? Do we have the go-ahead?

@Franck, I am not really aware of those organizations but I am happy to guide you through the work we have done for Bioschemas so far. I worked a bit on a biodiversity project but that was some years ago. Still, I like the subject!

Let's wait to see what Carole, Rafael and Alasdair suggest.

Regards,

On 23/01/2018 08:47, Franck Michel wrote:
> Dear Leyla and all,
>
> I understand that your response stands for a GO. Right?
>
> I've not been involved yet in the specification of the Bioschemas.org
> profiles. So indeed, I shall need help and guidance as to how things
> are going on, the tools, the process, the expected outcomes, etc.
>
> As I proposed, we could start with contacting people that would
> potentially be interested in taking part in this. I'm thinking about
> Encyclopedia of Life, Catalogue of Life, GBIF. If you already know
> contacts in these organizations, that would certainly be helpful.
>
> Franck.
>
> On 22/01/2018 at 11:37, Leyla Garcia wrote:
>> Hi Franck,
>>
>> Great news!
>>
>> Do you need any help/guides for the start-up?
>>
>> Cheers,
>>
>> On 17/01/2018 15:24, Franck Michel wrote:
>>> Dear all,
>>>
>>> I'm following up on this suggestion about creating a
>>> biodiversity-related group in Bioschemas.org.
>>>
>>> The proposition received four +1's. I'm not sure if there is a
>>> "minimum score" to attest to sufficient consensus.
>>>
>>> As we discussed, if we go for the creation of this group, it would
>>> be beneficial to involve at least EoL folks, possibly other people
>>> from the biodiversity community. I can try to initiate this, yet
>>> before I would like to have an official GO from our community.
>>>
>>> Let me know how this usually works, and what you think about this.
>>>
>>> Regards,
>>> Franck.
>>>
>>> On 17/11/2017 at 16:40, Franck Michel wrote:
>>>> Hi Mélanie, hi all,
>>>>
>>>> To go a bit further I've tried to somewhat extend the example I've
>>>> initiated. There it is:
>>>> https://github.com/frmichel/taxref-ld/tree/master/bioschemas-org
>>>> The README gives details as to how the example file is organized,
>>>> and more importantly it lists some of the issues and questions that
>>>> we shall have to tackle if we officially start the group.
>>>>
>>>> @Alasdair, Carole, Rafael: as discussed in the thread, at some
>>>> point it shall be beneficial to invite people from EoL and TDWG.
>>>> Is there some sort of "official" channel for the community to do that?
>>>>
>>>> Have a nice weekend,
>>>> Franck.
>>>>
>>>> On 17/11/2017 at 10:19, Melanie Courtot wrote:
>>>>> Hi Franck, all,
>>>>>
>>>>> On 16/11/2017 09:37, Franck Michel wrote:
>>>>>> Hi Melanie, hi all,
>>>>>>
>>>>>> EoL provides an API that returns species descriptions as JSON-LD
>>>>>> based on schema.org. Beluga example:
>>>>>> http://eol.org/api/traits/328541
>>>>>> It is unclear who consumes this data, but at least, as you
>>>>>> already saw, they embed it at the end of their own web pages such
>>>>>> as http://eol.org/pages/328541/data.
>>>>> BioSamples does the same - an API to retrieve JSON and we embed it
>>>>> in our webpages for crawlers as well.
>>>>>>
>>>>>> As you also noticed, the JSON-LD they provide is not valid. I
>>>>>> didn't know about that EOL GitHub issue, but I recently discussed
>>>>>> it with Rod Page from the Biodiversity Information Standards (aka
>>>>>> TDWG), who replied on the GitHub issue. The Google structured
>>>>>> data testing tool gives more details on that:
>>>>>> https://frama.link/xJm0AAto
>>>>>> Besides, other errors are not reported (well, I think these are
>>>>>> errors): property scientificName without any namespace is
>>>>>> invalid, that should be dwc:scientificName since this does not
>>>>>> exist in schema.org.
>>>>>> Same issue for vernacularName, traits, units...
>>>>>>
>>>>>> Anyway, this JSON-LD has lots of issues, but it's a start.
>>>>>
>>>>> Yes. Only mentioned the tweaks in case someone wanted to give it a
>>>>> try as well.
>>>>>
>>>>>> The assumption is that there is some sort of specific
>>>>>> (one-to-one) agreement between EoL and Google, and that Google
>>>>>> harvests this data despite the invalid JSON-LD. But I have no
>>>>>> confirmation of that.
>>>>>
>>>>> It'd be interesting to clarify this. It seems a little bit
>>>>> counterintuitive that EoL would mark their pages up with JSON for
>>>>> Google to read it but then Google couldn't do so without a special
>>>>> adapter? We're probably missing a piece of the story.
>>>>>>
>>>>>> > - the measurement type points to
>>>>>> > http://purl.obolibrary.org/obo/VT_0001256, which is body length.
>>>>>> > The schema.org/predicate value is also "body length (VT)". How is
>>>>>> > this understood and displayed as Length on the Google result?
>>>>>> > - Similar question for the actual value and units, which are
>>>>>> > "4249.83" and "mm" respectively. Is Google doing some sort of
>>>>>> > unit conversion/roundup for display?
>>>>>>
>>>>>> Good question. Typically about the unit "mm":
>>>>>> - "units": "mm" => there is no such thing as http://schema.org/units
>>>>>> - "dwc:measurementUnit":
>>>>>> "http://purl.obolibrary.org/obo/UO_0000016" => this seems to be
>>>>>> the only reliable property, but this would mean that Google knows
>>>>>> the Darwin Core vocabulary and interprets it.
>>>>>> My assumption is that Google performs some treatment on the
>>>>>> values. Possibly, they developed a specific connector to cope
>>>>>> with EoL JSON-LD and translate this body size to "4.2 m".
>>>>>> Besides, the snippet mentions "4.2 m *(Adult)*", so they also
>>>>>> presumably consider this property:
>>>>>> eol:traitUri "http://eol.org/resources/704/measurements/adultheadbodylen27"
>>>>>> to know that this is the size of an adult.
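
The unit question above can be made concrete with a small sketch. It shows a measurement whose properties are namespaced with Darwin Core terms (so that nothing is wrongly attributed to schema.org), and a display step that turns "4249.83" mm into "4.2 m". This is only an illustration: the function names and the lookup table are hypothetical, the VT and UO identifiers are the ones quoted in this thread, and nothing here is claimed to be what EoL or Google actually does.

```python
import json

# Darwin Core terms namespace; schema.org stays the default vocabulary.
CONTEXT = {
    "@vocab": "http://schema.org/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
}

# Identifiers quoted in the thread.
BODY_LENGTH = "http://purl.obolibrary.org/obo/VT_0001256"
MILLIMETRE = "http://purl.obolibrary.org/obo/UO_0000016"

# Hypothetical lookup table: unit URI -> conversion factor to metres.
UNIT_TO_METRES = {MILLIMETRE: 0.001}


def describe_measurement(value, unit_uri, type_uri):
    """Build a JSON-LD-style dict using explicitly namespaced properties."""
    return {
        "@context": CONTEXT,
        "dwc:measurementType": {"@id": type_uri},
        "dwc:measurementValue": value,
        "dwc:measurementUnit": {"@id": unit_uri},
    }


def display_length(doc):
    """Convert the measurement to metres and round to one decimal place."""
    factor = UNIT_TO_METRES[doc["dwc:measurementUnit"]["@id"]]
    metres = float(doc["dwc:measurementValue"]) * factor
    return f"{metres:.1f} m"


if __name__ == "__main__":
    doc = describe_measurement("4249.83", MILLIMETRE, BODY_LENGTH)
    print(json.dumps(doc, indent=2))
    print(display_length(doc))
```

A consumer relying only on the unit URI needs a table like `UNIT_TO_METRES`; a bare "units": "mm" string gives it nothing to resolve, which is the point made above.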
>>>>>>
>>>>>> With proper Bioschemas.org profiles, I think we could annotate
>>>>>> pages from many other institutions, such as the Beluga page
>>>>>> <https://inpn.mnhn.fr/espece/cd_nom/60932?lg%3Den> on the website
>>>>>> of the French National Museum of Natural History, and in turn,
>>>>>> enable search engines to harvest data from complementary pages
>>>>>> and produce mashups of related pages, etc.
>>>>> That sounds like a great idea and entirely within the scope of
>>>>> Bioschemas.
>>>>>>
>>>>>> At this point, I think we should involve people from EoL, and
>>>>>> from the TDWG community (Rod Page would certainly be of great
>>>>>> added value in this respect). What do you think? Is there a
>>>>>> procedure for inviting people "officially"?
>>>>> I think we could benefit from their experience indeed; it seems
>>>>> they were able to deploy markup, add additional properties and
>>>>> then get this to be interpreted by Google, which seems to match
>>>>> our use case pretty well!
>>>>> I +1'd the issue at
>>>>> https://github.com/BioSchemas/specifications/issues/115
>>>>>
>>>>> Cheers,
>>>>> Melanie
>>>>>
>>>>>>
>>>>>> Franck.
>>>>>>
>>>>>> On 15/11/2017 at 17:57, Melanie Courtot wrote:
>>>>>>> Hi Franck,
>>>>>>>
>>>>>>> This looks really interesting, thanks for bringing it up. I was
>>>>>>> trying to find out how the interaction between EoL and
>>>>>>> schema.org was working and am wondering if you (or someone
>>>>>>> else!) could shed some light on this?
>>>>>>>
>>>>>>> As you suggested below, I checked the google beluga
>>>>>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc>
>>>>>>> search result and do see the line "Length: 4.2 m (Adult)
>>>>>>> Encyclopedia of Life"
>>>>>>>
>>>>>>> If I try to find where that info comes from, and head to EoL, I
>>>>>>> can reach the page http://eol.org/pages/328541/overview, and
>>>>>>> follow the "see all traits" link to
>>>>>>> http://eol.org/pages/328541/data which contains the JSON-LD.
>>>>>>>
>>>>>>> I trimmed it down to extract the relevant bit, updated the id to
>>>>>>> be a string as per https://github.com/EOL/tramea/issues/352, and
>>>>>>> pasted it in the JSON playground mostly to make sure it was
>>>>>>> working as expected: http://tinyurl.com/yadam6nj
>>>>>>>
>>>>>>> I am missing the link of how the following happens:
>>>>>>> - the measurement type points to
>>>>>>> http://purl.obolibrary.org/obo/VT_0001256, which is body length.
>>>>>>> The schema.org/predicate value is also "body length (VT)". How
>>>>>>> is this understood and displayed as Length on the Google result?
>>>>>>> - Similar question for the actual value and units, which are
>>>>>>> "4249.83" and "mm" respectively. Is Google doing some sort of
>>>>>>> unit conversion/roundup for display?
>>>>>>> - Trophic level on EoL is "carnivore", but Google displays
>>>>>>> "Carnivorous"
>>>>>>> etc.
>>>>>>>
>>>>>>> Or am I looking at the wrong source for the markup?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Melanie
>>>>>>>
>>>>>>> On 10/11/2017 15:17, Franck Michel wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> I've just joined the Bioschemas.org community following some
>>>>>>>> discussions I had with Alasdair Gray whom I met at ISWC in
>>>>>>>> Vienna, and I'd like to start a new discussion thread.
>>>>>>>>
>>>>>>>> So, just to start, a few words about me. I'm a CNRS research
>>>>>>>> engineer, I work at the I3S laboratory in France, in particular
>>>>>>>> with the Wimmics research team led by Fabien Gandon. I'm
>>>>>>>> currently involved in some activities related to the
>>>>>>>> publication of taxonomic information as Linked Data [1]. In
>>>>>>>> this context, I've met the Biodiversity Information Standards
>>>>>>>> community (TDWG) that is increasingly considering SW standards,
>>>>>>>> LD publication and web pages markup. This is a domain where, I
>>>>>>>> think, it would be relevant for Bioschemas.org to get involved.
>>>>>>>>
>>>>>>>> There exist lots of web portals reporting observations, traits
>>>>>>>> and other data about all sorts of living organisms.
>>>>>>>> Encyclopedia of Life <http://eol.org/> (EoL) and the Global
>>>>>>>> Biodiversity Information Facility <https://www.gbif.org/>
>>>>>>>> (GBIF) are some of the best known. Markup questions are
>>>>>>>> actively considered in this field, for instance EoL web pages
>>>>>>>> embed schema.org-based JSON-LD descriptions that Google
>>>>>>>> leverages to enrich their snippets: e.g. if you google beluga
>>>>>>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc>
>>>>>>>> you shall see 'Encyclopedia of Life' mentions in the snippet
>>>>>>>> providing average weight and size data. For now, this seems to
>>>>>>>> be an "individual" initiative between EoL and
>>>>>>>> Google/schema.org, but it would make sense if this was part of
>>>>>>>> a broader reflection led by Bioschemas.org.
>>>>>>>>
>>>>>>>> My opinion is that fostering the use of common markup by these
>>>>>>>> portals could be very effective in helping the biodiversity
>>>>>>>> community to discover information and figure out new data
>>>>>>>> integration scenarios. Within Bioschemas.org, we could define
>>>>>>>> profiles to account for biodiversity-related
>>>>>>>> information. Taxonomic registers are used as the backbone of
>>>>>>>> many web portals, apps and databases related to biodiversity,
>>>>>>>> agronomy and agriculture. For instance, EoL and GBIF both rely
>>>>>>>> on the Catalogue of Life <http://www.catalogueoflife.org/>
>>>>>>>> taxonomy. Therefore, we could start with the definition of a
>>>>>>>> profile to describe a taxon and the related scientific and
>>>>>>>> vernacular names thereof. Then, this could be extended with the
>>>>>>>> representation of traits (characteristics of biological
>>>>>>>> organisms), observations, occurrence data, conservation status
>>>>>>>> (e.g. endangered) etc. There already exist vocabularies for
>>>>>>>> such data such as the widely adopted Darwin Core terms.
>>>>>>>>
>>>>>>>> As a quick example, consider the web page describing the common
>>>>>>>> dolphin on the web site of the French Museum of Natural
>>>>>>>> History: https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en. This
>>>>>>>> page could come with a JSON-LD description looking like this:
>>>>>>>> https://github.com/frmichel/taxref-ld/blob/master/bioschemas-org-example.json
>>>>>>>> This example is naive and very succinct, and there are lots of
>>>>>>>> things to discuss and decide. Besides, I've just registered on
>>>>>>>> the mailing list yesterday, so it may not fit with good
>>>>>>>> practices that you guys have already agreed upon. Sorry if this
>>>>>>>> is the case. Nevertheless, my point is basically to bootstrap
>>>>>>>> the discussion and see if the community is willing to endorse
>>>>>>>> this initiative.
>>>>>>>> If this is the case, we should probably involve
>>>>>>>> people from the biodiversity community: Darwin Core experts,
>>>>>>>> EoL/GBIF representatives etc. But that will come in time.
>>>>>>>>
>>>>>>>> I look forward to further discussions.
>>>>>>>> Regards,
>>>>>>>> Franck.
>>>>>>>>
>>>>>>>> [1] Michel F., Gargominy O., Tercerie S. & Faron-Zucker C.
>>>>>>>> (2017). A Model to Represent Nomenclatural and Taxonomic
>>>>>>>> Information as Linked Data. Application to the French Taxonomic
>>>>>>>> Register, TAXREF. In Proceedings of the 2nd International
>>>>>>>> Workshop on Semantics for Biodiversity (S4BioDiv) co-located
>>>>>>>> with ISWC 2017 vol. 1933. Vienna, Austria. CEUR.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Franck MICHEL
>>>>>>>> CNRS research engineer
>>>>>>>> +33 (0)492 96 5004
>>>>>>>> franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr>
>>>>>>>>
>>>>>>>> Université Côte d’Azur, CNRS, *Inria* - I3S - UMR 7271
>>>>>>>> 930 route des Colles - Bât. Les Templiers
>>>>>>>> BP 145 - 06903 Sophia Antipolis CEDEX - France
>>>>>>>> Tel. +33 (0)4 9294 2680, Fax : +33 (0)4 9294 2898
Received on Tuesday, 23 January 2018 10:09:37 UTC