- From: Leyla Garcia <ljgarcia@ebi.ac.uk>
- Date: Mon, 22 Jan 2018 10:37:10 +0000
- To: Franck Michel <franck.michel@cnrs.fr>, public-bioschemas@w3.org
- Message-ID: <2c3728ee-7550-7356-4606-52265c8b5060@ebi.ac.uk>
Hi Franck,

Great news! Do you need any help/guides for the start-up?

Cheers,

On 17/01/2018 15:24, Franck Michel wrote:
> Dear all,
>
> I'm following up on this suggestion about creating a
> biodiversity-related group in Bioschemas.org.
>
> The proposition received four +1's. I'm not sure if there is a
> "minimum score" to attest to sufficient consensus.
>
> As we discussed, if we go for the creation of this group, it would be
> beneficial to involve at least EoL folks, possibly other people from
> the biodiversity community. I can try to initiate this, but first I
> would like to have an official GO from our community.
>
> Let me know how this usually works, and what you think about this.
>
> Regards,
> Franck.
>
> On 17/11/2017 at 16:40, Franck Michel wrote:
>> Hi Mélanie, hi all,
>>
>> To go a bit further I've tried to somewhat extend the example I've
>> initiated. There it is:
>> https://github.com/frmichel/taxref-ld/tree/master/bioschemas-org
>> The README gives details as to how the example file is organized, and
>> more importantly it lists some of the issues and questions that we
>> shall have to tackle if we officially start the group.
>>
>> @Alasdair, Carole, Rafael: as discussed in the thread, at some point
>> it shall be beneficial to invite people from EoL and TDWG. Is
>> there some sort of "official" channel for the community to do that?
>>
>> Have a nice weekend,
>> Franck.
>>
>> On 17/11/2017 at 10:19, Melanie Courtot wrote:
>>> Hi Franck, all,
>>>
>>> On 16/11/2017 09:37, Franck Michel wrote:
>>>> Hi Melanie, hi all,
>>>>
>>>> EoL provides an API that returns species descriptions as JSON-LD
>>>> based on schema.org. Beluga example: http://eol.org/api/traits/328541
>>>> It is unclear who consumes this data, but at least, as you already
>>>> saw, they embed it at the end of their own web pages such as
>>>> http://eol.org/pages/328541/data.
>>> BioSamples does the same - an API to retrieve JSON and we embed it
>>> in our webpages for crawlers as well.
>>>>
>>>> As you also noticed, the JSON-LD they provide is not valid. I
>>>> didn't know about that EOL GitHub issue, but I recently discussed
>>>> it with Rod Page from the Biodiversity Information Standards (aka
>>>> TDWG), who replied on the GitHub issue. The Google structured data
>>>> testing tool gives more details on that: https://frama.link/xJm0AAto
>>>> Besides, other errors are not reported (well, I think these are
>>>> errors): property scientificName without any namespace is invalid,
>>>> that should be dwc:scientificName since this does not exist in
>>>> schema.org. Same issue for vernacularName, traits, units...
>>>>
>>>> Anyway, this JSON-LD has lots of issues, but it's a start.
>>>
>>> Yes. Only mentioned the tweaks in case someone wanted to give it a
>>> try as well.
>>>
>>>> The assumption is that there is some sort of specific (one-to-one)
>>>> agreement between EoL and Google, and that Google harvests this
>>>> data despite the invalid JSON-LD. But I have no confirmation of that.
>>>
>>> It'd be interesting to clarify this. It seems a little bit
>>> counterintuitive that EoL would mark their pages up with JSON for
>>> Google to read it but then Google couldn't do so without a special
>>> adapter? We're probably missing a piece of the story.
>>>>
>>>> > - the measurement type points to
>>>> > http://purl.obolibrary.org/obo/VT_0001256, which is body length.
>>>> > The schema.org/predicate value is also "body length (VT)". How is
>>>> > this understood and displayed as Length on the Google result?
>>>> > - Similar question for the actual value and units, which are
>>>> > "4249.83" and "mm" respectively. Is Google doing some sort of unit
>>>> > conversion/rounding for display?
>>>>
>>>> Good question.
>>>> Take the unit "mm", for example:
>>>> - "units": "mm" => there is no such thing as http://schema.org/units
>>>> - "dwc:measurementUnit":
>>>> "http://purl.obolibrary.org/obo/UO_0000016" => this seems to be the
>>>> only reliable property, but that implies Google knows the Darwin
>>>> Core vocabulary and interprets it.
>>>> My assumption is that Google performs some treatment on the values.
>>>> Possibly, they developed a specific connector to cope with EoL
>>>> JSON-LD and translate this body size to "4.2 m".
>>>> Besides, the snippet mentions "4.2 m *(Adult)*", so they also
>>>> presumably consider this property:
>>>> eol:traitUri "http://eol.org/resources/704/measurements/adultheadbodylen27"
>>>> to know that this is the size of an adult.
>>>>
>>>> With proper Bioschemas.org profiles, I think we could annotate
>>>> pages from many other institutions, such as the Beluga page
>>>> <https://inpn.mnhn.fr/espece/cd_nom/60932?lg%3Den> on the website
>>>> of the French National Museum of Natural History, and in turn,
>>>> enable search engines to harvest data from complementary pages and
>>>> produce mashups of related pages, etc.
>>> That sounds like a great idea and entirely within the scope of
>>> Bioschemas.
>>>>
>>>> At this point, I think we should involve people from EoL, and from
>>>> the TDWG community (Rod Page would certainly be of great added
>>>> value in this respect). What do you think? Is there a procedure for
>>>> inviting people "officially"?
>>> I think we could benefit from their experience indeed; it seems they
>>> were able to deploy markup, add additional properties and then get
>>> this to be interpreted by Google, which seems to match our use case
>>> pretty well!
>>> I +1'd the issue at
>>> https://github.com/BioSchemas/specifications/issues/115
>>>
>>> Cheers,
>>> Melanie
>>>
>>>>
>>>> Franck.
>>>>
>>>> On 15/11/2017 at 17:57, Melanie Courtot wrote:
>>>>> Hi Franck,
>>>>>
>>>>> This looks really interesting, thanks for bringing it up.
>>>>> I was trying to find out how the interaction between EoL and
>>>>> schema.org was working and am wondering if you (or someone else!)
>>>>> could shed some light on this?
>>>>>
>>>>> As you suggested below, I checked the Google beluga
>>>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc>
>>>>> search result and do see the line "Length: 4.2 m (Adult)
>>>>> Encyclopedia of Life"
>>>>>
>>>>> If I try to find where that info comes from, and head to EoL, I
>>>>> can reach the page http://eol.org/pages/328541/overview, and
>>>>> follow the "see all traits" link to
>>>>> http://eol.org/pages/328541/data which contains the JSON-LD.
>>>>>
>>>>> I trimmed it down to extract the relevant bit, updated the id to
>>>>> be a string as per https://github.com/EOL/tramea/issues/352, and
>>>>> pasted it in the JSON playground mostly to make sure it was
>>>>> working as expected: http://tinyurl.com/yadam6nj
>>>>>
>>>>> I am missing the link of how the following happens:
>>>>> - the measurement type points to
>>>>> http://purl.obolibrary.org/obo/VT_0001256, which is body length.
>>>>> The schema.org/predicate value is also "body length (VT)". How is
>>>>> this understood and displayed as Length on the Google result?
>>>>> - Similar question for the actual value and units, which are
>>>>> "4249.83" and "mm" respectively. Is Google doing some sort of unit
>>>>> conversion/rounding for display?
>>>>> - Trophic level on EoL is "carnivore", but Google displays
>>>>> "Carnivorous"
>>>>> etc
>>>>>
>>>>> Or am I looking at the wrong source for the markup?
>>>>>
>>>>> Cheers,
>>>>> Melanie
>>>>>
>>>>> On 10/11/2017 15:17, Franck Michel wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> I've just joined the Bioschemas.org community following some
>>>>>> discussions I had with Alasdair Gray whom I met at ISWC in
>>>>>> Vienna, and I'd like to start a new discussion thread.
>>>>>>
>>>>>> So, just to start, a few words about me. I'm a CNRS research
>>>>>> engineer, I work at the I3S laboratory in France, in particular
>>>>>> with the Wimmics research team led by Fabien Gandon. I'm
>>>>>> currently involved in some activities related to the publication
>>>>>> of taxonomic information as Linked Data [1]. In this context,
>>>>>> I've met the Biodiversity Information Standards community (TDWG)
>>>>>> that is increasingly considering SW standards, LD publication and
>>>>>> web page markup. This is a domain where, I think, it would be
>>>>>> relevant for Bioschemas.org to get involved.
>>>>>>
>>>>>> There exist lots of web portals reporting observations, traits
>>>>>> and other data about all sorts of living organisms. Encyclopedia
>>>>>> of Life <http://eol.org/> (EoL) and the Global Biodiversity
>>>>>> Information Facility <https://www.gbif.org/> (GBIF) are some of
>>>>>> the best known. Markup questions are actively considered in
>>>>>> this field, for instance EoL web pages embed schema.org-based
>>>>>> JSON-LD descriptions that Google leverages to enrich their
>>>>>> snippets: e.g. if you google beluga
>>>>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc>
>>>>>> you shall see 'Encyclopedia of Life' mentions in the snippet
>>>>>> providing average weight and size data.
>>>>>> For now, this seems to be an "individual" initiative between
>>>>>> EoL and Google/schema.org, but it would make sense if this was
>>>>>> part of a broader reflection led by Bioschemas.org.
>>>>>>
>>>>>> My opinion is that fostering the use of common markup by these
>>>>>> portals could be very effective in helping the biodiversity
>>>>>> community to discover information and figure out new data
>>>>>> integration scenarios. Within Bioschemas.org, we could define
>>>>>> profiles to account for biodiversity-related information.
>>>>>> Taxonomic registers are used as the backbone of many web
>>>>>> portals, apps and databases related to biodiversity, agronomy
>>>>>> and agriculture. For instance, EoL and GBIF both rely on the
>>>>>> Catalogue of Life <http://www.catalogueoflife.org/> taxonomy.
>>>>>> Therefore, we could start with the definition of a profile to
>>>>>> describe a taxon and the related scientific and vernacular names
>>>>>> thereof. Then, this could be extended with the representation of
>>>>>> traits (characteristics of biological organisms), observations,
>>>>>> occurrence data, conservation status (e.g. endangered) etc.
>>>>>> Vocabularies for such data already exist, such as the widely
>>>>>> adopted Darwin Core terms.
>>>>>>
>>>>>> As a quick example, consider the web page describing the common
>>>>>> dolphin on the web site of the French Museum of Natural History:
>>>>>> https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en. This page could
>>>>>> come with a JSON-LD description looking like this:
>>>>>> https://github.com/frmichel/taxref-ld/blob/master/bioschemas-org-example.json
>>>>>> This example is naive and very succinct, and there are lots of
>>>>>> things to discuss and decide. Besides, I've just registered on
>>>>>> the mailing list yesterday, so it may not fit with good practices
>>>>>> that you guys have already agreed upon. Sorry if this is the case.
>>>>>> Nevertheless, my point is basically to bootstrap the discussion
>>>>>> and see if the community is willing to endorse this initiative.
>>>>>> If this is the case, we should probably involve people from the
>>>>>> biodiversity community: Darwin Core experts, EoL/GBIF
>>>>>> representatives etc. But that will come in time.
>>>>>>
>>>>>> I look forward to further discussions.
>>>>>> Regards,
>>>>>> Franck.
>>>>>>
>>>>>> [1] Michel F., Gargominy O., Tercerie S. & Faron-Zucker C.
>>>>>> (2017). A Model to Represent Nomenclatural and Taxonomic
>>>>>> Information as Linked Data. Application to the French Taxonomic
>>>>>> Register, TAXREF. In Proceedings of the 2nd International
>>>>>> Workshop on Semantics for Biodiversity (S4BioDiv) co-located with
>>>>>> ISWC 2017 vol. 1933. Vienna, Austria. CEUR.
>>>>>>
>>>>>> --
>>>>>> Franck MICHEL
>>>>>> CNRS research engineer
>>>>>> +33 (0)492 96 5004
>>>>>> franck.michel@cnrs.fr
>>>>>>
>>>>>> Université Côte d’Azur, CNRS, *Inria* - I3S - UMR 7271
>>>>>> 930 route des Colles - Bât. Les Templiers
>>>>>> BP 145 - 06903 Sophia Antipolis CEDEX - France
>>>>>> Tel. +33 (0)4 9294 2680, Fax : +33 (0)4 9294 2898
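
The namespace and unit questions raised in this thread can be illustrated with a small Python sketch. It is not EoL's or Google's actual processing: the sample record is a hypothetical simplification of the markup discussed above, the function names are invented for illustration, and the list of schema.org terms is a tiny hand-picked subset rather than the full vocabulary. The check flags top-level keys that carry no prefix and are not known schema.org terms, and the formatter shows how "4249.83" mm could plausibly end up displayed as "4.2 m".

```python
# Minimal sketch, assuming a simplified record; not the real EoL payload.

# Tiny illustrative subset of real schema.org properties (assumption:
# a real check would load the full schema.org vocabulary).
SCHEMA_ORG_SUBSET = {"name", "url", "description"}


def undeclared_terms(record, known_terms=SCHEMA_ORG_SUBSET):
    """Return top-level keys that are neither JSON-LD keywords ("@..."),
    prefixed terms ("dwc:..."), nor known schema.org terms."""
    return sorted(
        key
        for key in record
        if not key.startswith("@") and ":" not in key and key not in known_terms
    )


def format_length_mm(value_mm):
    """Convert a length in millimetres to metres, rounded to one decimal."""
    return f"{value_mm / 1000:.1f} m"


# Hypothetical record mixing prefixed and unprefixed properties.
sample = {
    "@context": {
        "@vocab": "http://schema.org/",
        "dwc": "http://rs.tdwg.org/dwc/terms/",
    },
    "name": "Beluga",
    "scientificName": "Delphinapterus leucas",  # unprefixed: not in schema.org
    "units": "mm",                              # unprefixed: not in schema.org
    "dwc:measurementType": "http://purl.obolibrary.org/obo/VT_0001256",
    "dwc:measurementValue": "4249.83",
    "dwc:measurementUnit": "http://purl.obolibrary.org/obo/UO_0000016",
}

print(undeclared_terms(sample))
print(format_length_mm(float(sample["dwc:measurementValue"])))
```

Renaming the flagged keys to prefixed Darwin Core terms such as dwc:scientificName would make the check return an empty list for this record.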
Received on Monday, 22 January 2018 10:37:37 UTC