- From: Melanie Courtot <mcourtot@ebi.ac.uk>
- Date: Fri, 17 Nov 2017 09:19:09 +0000
- To: Franck Michel <franck.michel@cnrs.fr>, public-bioschemas@w3.org
- Message-ID: <b35f1ce0-416e-a1bb-2630-eb9bd97bd583@ebi.ac.uk>
Hi Franck, all,

On 16/11/2017 09:37, Franck Michel wrote:
> Hi Melanie, hi all,
>
> EoL provides an API that returns species descriptions as JSON-LD based
> on schema.org. Beluga example: http://eol.org/api/traits/328541
> It is unclear who consumes this data, but at least, as you already
> saw, they embed it at the end of their own web pages such as
> http://eol.org/pages/328541/data.

BioSamples does the same: an API to retrieve JSON, which we also embed
in our web pages for crawlers.

> As you also noticed, the JSON-LD they provide is not valid. I didn't
> know about that EoL GitHub issue, but I recently discussed it with Rod
> Page from the Biodiversity Information Standards (aka TDWG), who
> replied on the GitHub issue. The Google structured data testing tool
> gives more details on that: https://frama.link/xJm0AAto
> Besides, other errors are not reported (well, I think these are
> errors): the property scientificName without any namespace is invalid;
> it should be dwc:scientificName, since it does not exist in
> schema.org. Same issue for vernacularName, traits, units...
>
> But whatever, this JSON-LD has lots of issues, but it's a start.

Yes. I only mentioned the tweaks in case someone wanted to give it a
try as well.

> The assumption is that there is some sort of specific (one-to-one)
> agreement between EoL and Google, and that Google harvests this data
> despite the invalid JSON-LD. But I have no confirmation of that.

It would be interesting to clarify this. It seems a little
counterintuitive that EoL would mark up their pages with JSON for
Google to read, yet Google could not do so without a special adapter.
We're probably missing a piece of the story.

> - the measurement type points to
> http://purl.obolibrary.org/obo/VT_0001256, which is body length. The
> schema.org/predicate value is also "body length (VT)". How is this
> understood and displayed as Length on the Google result?
> - Similar question for the actual value and units, which are "4249.83"
> and "mm" respectively. Is Google doing some sort of unit
> conversion/roundup for display?
>
> Good question. Typically about the unit "mm":
> - "units": "mm" => there is no such thing as http://schema.org/units
> - "dwc:measurementUnit": "http://purl.obolibrary.org/obo/UO_0000016"
> => this seems to be the only reliable property, but that would imply
> that Google knows the Darwin Core vocabulary and interprets it.
> My assumption is that Google performs some processing on the values.
> Possibly, they developed a specific connector to cope with EoL JSON-LD
> and translate this body size to "4.2 m".
> Besides, the snippet mentions "4.2 m *(Adult)*", so they also
> presumably consider this property:
> eol:traitUri "http://eol.org/resources/704/measurements/adultheadbodylen27"
> to know that this is the size of an adult.
>
> With proper Bioschemas.org profiles, I think we could annotate pages
> from many other institutions, such as the Beluga page
> <https://inpn.mnhn.fr/espece/cd_nom/60932?lg%3Den> on the website of
> the French National Museum of Natural History, and in turn, enable
> search engines to harvest data from complementary pages and produce
> mashups of related pages, etc.

That sounds like a great idea and entirely within the scope of
Bioschemas.

> At this point, I think we should involve people from EoL, and from the
> TDWG community (Rod Page would certainly be of great added value in
> this respect). What do you think? Is there a procedure for inviting
> people "officially"?

I think we could indeed benefit from their experience; it seems they
were able to deploy markup, add additional properties and then get this
interpreted by Google, which seems to match our use case pretty well!
I +1'd the issue at
https://github.com/BioSchemas/specifications/issues/115

Cheers,
Melanie

> Franck.
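The unit question discussed above can be made concrete with a small
sketch. It assumes, purely for illustration, that a consumer resolves
the dwc:measurementUnit IRI to a scale factor, rounds to one decimal,
and reads the life stage from the last segment of the eol:traitUri.
Nothing in this thread confirms that Google works this way; the table
UNIT_TO_METRES and all function names are hypothetical.

```python
from typing import Optional

# Hypothetical lookup from Units Ontology IRIs to a factor converting to metres.
UNIT_TO_METRES = {
    "http://purl.obolibrary.org/obo/UO_0000016": 0.001,  # millimetre
}


def to_metres(value: str, unit_iri: str) -> float:
    """Convert a raw trait value (a string, as in the EoL JSON-LD) to metres."""
    return float(value) * UNIT_TO_METRES[unit_iri]


def life_stage(trait_uri: str) -> Optional[str]:
    """Guess the life stage from the last path segment of the trait URI.

    Assumption: a segment starting with "adult" means an adult measurement.
    """
    last_segment = trait_uri.rstrip("/").rsplit("/", 1)[-1]
    return "Adult" if last_segment.startswith("adult") else None


def display_length(value: str, unit_iri: str, trait_uri: str) -> str:
    """Render a value the way the search snippet shows it, e.g. '4.2 m (Adult)'."""
    text = f"{to_metres(value, unit_iri):.1f} m"
    stage = life_stage(trait_uri)
    return f"{text} ({stage})" if stage else text


snippet = display_length(
    "4249.83",
    "http://purl.obolibrary.org/obo/UO_0000016",
    "http://eol.org/resources/704/measurements/adultheadbodylen27",
)
print(snippet)
```

With the beluga values quoted in this thread, the sketch yields the
same text as the search result, which at least shows that the snippet
is derivable from the markup alone.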

> On 15/11/2017 at 17:57, Melanie Courtot wrote:
>> Hi Franck,
>>
>> This looks really interesting, thanks for bringing it up. I was
>> trying to find out how the interaction between EoL and schema.org was
>> working and am wondering if you (or someone else!) could shed some
>> light on this?
>>
>> As you suggested below, I checked the Google beluga
>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc>
>> search result and do see the line "Length: 4.2 m (Adult) Encyclopedia
>> of Life"
>>
>> If I try to find where that info comes from, and head to EoL, I can
>> reach the page http://eol.org/pages/328541/overview, and follow the
>> "see all traits" link to http://eol.org/pages/328541/data which
>> contains the JSON-LD.
>>
>> I trimmed it down to extract the relevant bit, updated the id to be a
>> string as per https://github.com/EOL/tramea/issues/352, and pasted it
>> in the JSON playground mostly to make sure it was working as
>> expected: http://tinyurl.com/yadam6nj
>>
>> I am missing the link of how the following happens:
>> - the measurement type points to
>> http://purl.obolibrary.org/obo/VT_0001256, which is body length. The
>> schema.org/predicate value is also "body length (VT)". How is this
>> understood and displayed as Length on the Google result?
>> - Similar question for the actual value and units, which are
>> "4249.83" and "mm" respectively. Is Google doing some sort of unit
>> conversion/roundup for display?
>> - Trophic level on EoL is "carnivore", but Google displays "Carnivorous"
>> etc
>>
>> Or am I looking at the wrong source for the markup?
>>
>> Cheers,
>> Melanie
>>
>> On 10/11/2017 15:17, Franck Michel wrote:
>>> Dear all,
>>>
>>> I've just joined the Bioschemas.org community following some
>>> discussions I had with Alasdair Gray whom I met at ISWC in Vienna,
>>> and I'd like to start a new discussion thread.
>>>
>>> So, just to start, a few words about me. I'm a CNRS research
>>> engineer, I work at the I3S laboratory in France, in particular with
>>> the Wimmics research team led by Fabien Gandon. I'm currently
>>> involved in some activities related to the publication of taxonomic
>>> information as Linked Data [1]. In this context, I've met the
>>> Biodiversity Information Standards community (TDWG) that is
>>> increasingly considering SW standards, LD publication and web page
>>> markup. This is a domain where, I think, it would be relevant for
>>> Bioschemas.org to get involved.
>>>
>>> There exist lots of web portals reporting observations, traits and
>>> other data about all sorts of living organisms. Encyclopedia of Life
>>> <http://eol.org/> (EoL) and the Global Biodiversity Information
>>> Facility <https://www.gbif.org/> (GBIF) are some of the most well
>>> known. Markup questions are actively considered in this field, for
>>> instance EoL web pages embed schema.org-based JSON-LD descriptions
>>> that Google leverages to enrich their snippets: e.g. if you google
>>> beluga
>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc>
>>> you should see 'Encyclopedia of Life' mentions in the snippet
>>> providing average weight and size data. For now, this seems to be an
>>> "individual" initiative between EoL and Google/schema.org, but it
>>> would make sense if this was part of a broader reflection led by
>>> Bioschemas.org.
>>>
>>> My opinion is that fostering the use of common markup by these
>>> portals could be very effective in helping the biodiversity
>>> community to discover information and figure out new data
>>> integration scenarios. Within Bioschemas.org, we could define
>>> profiles to account for biodiversity-related information. Taxonomic
>>> registers are used as the backbone of many web portals, apps and
>>> databases related to biodiversity, agronomy and agriculture. For
>>> instance, EoL and GBIF both rely on the Catalogue of Life
>>> <http://www.catalogueoflife.org/> taxonomy. Therefore, we could
>>> start with the definition of a profile to describe a taxon and its
>>> related scientific and vernacular names. Then, this could be
>>> extended with the representation of traits (characteristics of
>>> biological organisms), observations, occurrence data, conservation
>>> status (e.g. endangered) etc. There already exist vocabularies for
>>> such data, such as the well-adopted Darwin Core terms.
>>>
>>> As a quick example, consider the web page describing the common
>>> dolphin on the web site of the French Museum of Natural History:
>>> https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en. This page could come
>>> with a JSON-LD description looking like this:
>>> https://github.com/frmichel/taxref-ld/blob/master/bioschemas-org-example.json
>>> This example is naive and very succinct, and there are lots of
>>> things to discuss and decide. Besides, I only registered on the
>>> mailing list yesterday, so it may not fit with good practices that
>>> you guys have already agreed upon. Sorry if this is the case.
>>> Nevertheless, my point is basically to bootstrap the discussion and
>>> see if the community is willing to endorse this initiative. If this
>>> is the case, we should probably involve people from the biodiversity
>>> community: Darwin Core experts, EoL/GBIF representatives etc. But
>>> that will come in time.
>>>
>>> I look forward to further discussions.
>>> Regards,
>>> Franck.
>>>
>>> [1] Michel F., Gargominy O., Tercerie S. & Faron-Zucker C. (2017). A
>>> Model to Represent Nomenclatural and Taxonomic Information as Linked
>>> Data. Application to the French Taxonomic Register, TAXREF. In
>>> Proceedings of the 2nd International Workshop on Semantics for
>>> Biodiversity (S4BioDiv) co-located with ISWC 2017 vol. 1933. Vienna,
>>> Austria. CEUR.
>>>
>>> --
>>> Franck MICHEL
>>> CNRS research engineer
>>> +33 (0)492 96 5004
>>> franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr>
>>>
>>> Université Côte d’Azur, CNRS, *Inria* - I3S - UMR 7271
>>> 930 route des Colles - Bât. Les Templiers
>>> BP 145 - 06903 Sophia Antipolis CEDEX - France
>>> Tel. +33 (0)4 9294 2680, Fax : +33 (0)4 9294 2898
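
The taxon profile proposed in the original message, and the namespace
problem raised later in the thread, can be sketched as follows. This is
only an illustration under stated assumptions: the choice of dwc:Taxon
as the type, the property selection, and the helper names taxon_markup
and unprefixed_keys are hypothetical, and the example file linked above
may look different. The check is a simple heuristic over top-level
keys, not a JSON-LD validator.

```python
import json

# Prefixes declared in the (assumed) JSON-LD context.
CONTEXT = {
    "schema": "http://schema.org/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",  # Darwin Core terms namespace
}


def taxon_markup(page_url, scientific_name, vernacular_name):
    """Build a minimal JSON-LD description of a taxon page (illustrative only)."""
    return {
        "@context": CONTEXT,
        "@id": page_url,
        "@type": "dwc:Taxon",
        "schema:url": page_url,
        "dwc:scientificName": scientific_name,
        "dwc:vernacularName": vernacular_name,
    }


def unprefixed_keys(doc):
    """Return top-level keys that are neither JSON-LD keywords ("@...")
    nor compact IRIs using a prefix declared in a dict-valued @context."""
    context = doc.get("@context")
    prefixes = set(context) if isinstance(context, dict) else set()
    suspicious = []
    for key in doc:
        if key.startswith("@"):
            continue
        prefix, sep, _ = key.partition(":")
        if not sep or prefix not in prefixes:
            suspicious.append(key)
    return suspicious


doc = taxon_markup(
    "https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en",
    "Delphinus delphis",
    "common dolphin",
)
print(json.dumps(doc, indent=2))

# A bare "scientificName", as reported for the EoL markup, gets flagged.
flawed = dict(doc, scientificName="Delphinus delphis")
print(unprefixed_keys(flawed))
```

Declaring both prefixes in the context keeps every property resolvable
to a full IRI, which is the point made in the thread about
dwc:scientificName versus a bare scientificName.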
Received on Friday, 17 November 2017 09:19:38 UTC