Re: Bioschemas.org to define biodiversity-related markup

Hi Melanie, hi all,

EoL provides an API that returns species descriptions as JSON-LD based 
on schema.org. Beluga example: http://eol.org/api/traits/328541
It is unclear who consumes this data, but at least, as you already saw, 
they embed it at the end of their own web pages such as 
http://eol.org/pages/328541/data.

As you also noticed, the JSON-LD they provide is not valid. I didn't 
know about that EoL GitHub issue, but I recently discussed the problem 
with Rod Page from Biodiversity Information Standards (aka TDWG), who 
replied on the GitHub issue. The Google Structured Data Testing Tool 
gives more details: https://frama.link/xJm0AAto
Besides, other errors are not reported (well, I think these are errors): 
the property scientificName without any namespace is invalid; it should 
be dwc:scientificName, since this term does not exist in schema.org. 
Same issue for vernacularName, traits, units...
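
To make the namespace point concrete, here is a rough Python sketch 
that lists un-prefixed property names which would silently fall back 
to the default vocabulary. The sample record, the "@vocab" setting and 
the tiny list of known schema.org terms are all my own assumptions for 
illustration; this is not the actual EoL output, and a real check 
would use the full schema.org vocabulary.

```python
import json

# Hypothetical, trimmed-down record modelled on the properties discussed
# above. It is NOT a copy of the real EoL JSON-LD.
SAMPLE = """
{
  "@context": {
    "@vocab": "http://schema.org/",
    "dwc": "http://rs.tdwg.org/dwc/terms/"
  },
  "@type": "Dataset",
  "name": "Delphinapterus leucas",
  "scientificName": "Delphinapterus leucas",
  "vernacularName": "Beluga",
  "dwc:taxonRank": "species"
}
"""

# Tiny illustrative subset of schema.org property names (assumption:
# a real check would load the complete vocabulary instead).
KNOWN_SCHEMA_ORG_TERMS = {"name", "description", "url", "identifier"}


def unresolved_terms(node, known=KNOWN_SCHEMA_ORG_TERMS):
    """Return un-prefixed property names that are not in `known`."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # the context defines terms, it does not use them
            is_keyword = key.startswith("@")
            is_prefixed = ":" in key
            if not is_keyword and not is_prefixed and key not in known:
                found.add(key)
            found |= unresolved_terms(value, known)
    elif isinstance(node, list):
        for item in node:
            found |= unresolved_terms(item, known)
    return found


doc = json.loads(SAMPLE)
print(sorted(unresolved_terms(doc)))
```

With this sample, the two names flagged are scientificName and 
vernacularName, i.e. the ones that would need the dwc: prefix.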

Anyway, this JSON-LD has lots of issues, but it's a start. The 
assumption is that there is some sort of specific (one-to-one) agreement 
between EoL and Google, and that Google harvests this data despite the 
invalid JSON-LD. But I have no confirmation of that.

 > - the measurement type points to 
 > http://purl.obolibrary.org/obo/VT_0001256, which is body length. The 
 > schema.org/predicate value is also "body length (VT)". How is this 
 > understood and displayed as Length on the Google result?
 > - Similar question for the actual value and units, which are "4249.83" 
 > and "mm" respectively. Is Google doing some sort of unit 
 > conversion/roundup for display?

Good question. Take the unit "mm", for instance:
- "units": "mm" => there is no such thing as http://schema.org/units
- "dwc:measurementUnit": "http://purl.obolibrary.org/obo/UO_0000016" => 
this seems to be the only reliable property, but relying on it would 
mean that Google knows the Darwin Core vocabulary and interprets it.
My assumption is that Google performs some processing on the values. 
Possibly, they developed a specific connector to cope with the EoL 
JSON-LD and translate this body length to "4.2 m".
Besides, the snippet mentions "4.2 m *(Adult)*", so they also presumably 
consider this property:
"eol:traitUri": "http://eol.org/resources/704/measurements/adultheadbodylen27"
to know that this is the size of an adult.
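
Just to illustrate the kind of processing I have in mind, here is a 
small Python sketch that turns the value and unit IRI into the text 
shown in the snippet. This is pure guesswork about what a consumer 
could do, not a description of what Google actually does; the lookup 
table and the function name are hypothetical.

```python
# Unit IRI given in the EoL data for "mm" (Units Ontology term).
UO_MILLIMETER = "http://purl.obolibrary.org/obo/UO_0000016"

# Hypothetical lookup table: unit IRI -> factor to convert to metres.
TO_METRES = {UO_MILLIMETER: 0.001}


def display_length(value, unit_iri, life_stage=None):
    """Format a length in metres with one decimal, e.g. '4.2 m (Adult)'."""
    metres = float(value) * TO_METRES[unit_iri]
    text = f"{metres:.1f} m"
    if life_stage:
        text += f" ({life_stage})"
    return text


print(display_length("4249.83", UO_MILLIMETER, "Adult"))  # 4.2 m (Adult)
```

The life stage is passed in by hand here; deriving "Adult" from the 
eol:traitUri value would need yet another EoL-specific rule.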

With proper Bioschemas.org profiles, I think we could annotate pages 
from many other institutions, such as the Beluga page 
<https://inpn.mnhn.fr/espece/cd_nom/60932?lg%3Den> on the website of the 
French National Museum of Natural History, and in turn, enable search 
engines to harvest data from complementary pages and produce mashups of 
related pages, etc.
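
As a very rough idea of what such an annotation could look like, here 
is a Python sketch that builds a JSON-LD description for a taxon page. 
No Bioschemas.org profile exists for this yet, so the choice of 
properties, the use of dwc:Taxon as the type and the function name are 
only assumptions to feed the discussion.

```python
import json


def taxon_markup(page_url, scientific_name, vernacular_name, rank):
    """Build a minimal JSON-LD description of a taxon (hypothetical shape)."""
    return {
        "@context": {
            "@vocab": "http://schema.org/",
            "dwc": "http://rs.tdwg.org/dwc/terms/",
        },
        "@type": "dwc:Taxon",
        "url": page_url,                       # schema.org property
        "name": scientific_name,               # schema.org property
        "dwc:scientificName": scientific_name,  # Darwin Core terms below
        "dwc:vernacularName": vernacular_name,
        "dwc:taxonRank": rank,
    }


markup = taxon_markup(
    "https://inpn.mnhn.fr/espece/cd_nom/60932",
    "Delphinapterus leucas",
    "Beluga",
    "species",
)
print(json.dumps(markup, indent=2))
```

Every Darwin Core property carries the dwc: prefix, so nothing falls 
back to schema.org by accident.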

At this point, I think we should involve people from EoL, and from the 
TDWG community (Rod Page would certainly be of great added value in this 
respect). What do you think? Is there a procedure for inviting people 
"officially"?

Franck.


Le 15/11/2017 à 17:57, Melanie Courtot a écrit :
> Hi Franck,
>
> This looks really interesting, thanks for bringing it up. I was trying 
> to find out how the interaction between EoL and schema.org was working 
> and am wondering if you (or someone else!) could shed some light on this?
>
> As you suggested below, I checked the Google beluga 
> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc> 
> search result and do see the line "Length: 4.2 m (Adult) Encyclopedia 
> of Life"
>
> If I try to find where that info comes from, and head to EoL, I can 
> reach the page http://eol.org/pages/328541/overview, and follow the 
> "see all traits" link to http://eol.org/pages/328541/data which 
> contains the JSON-LD.
>
> I trimmed it down to extract the relevant bit, updated the id to be a 
> string as per https://github.com/EOL/tramea/issues/352, and pasted it 
> in the JSON playground mostly to make sure it was working as expected: 
> http://tinyurl.com/yadam6nj
>
> I am missing the link of how the following happens:
> - the measurement type points to 
> http://purl.obolibrary.org/obo/VT_0001256, which is body length. The 
> schema.org/predicate value is also "body length (VT)". How is this 
> understood and displayed as Length on the Google result?
> - Similar question for the actual value and units, which are "4249.83" 
> and "mm" respectively. Is Google doing some sort of unit 
> conversion/roundup for display?
> - Trophic level on EoL is "carnivore", but Google displays "Carnivorous"
> etc
>
> Or am I looking at the wrong source for the markup?
>
> Cheers,
> Melanie
>
> On 10/11/2017 15:17, Franck Michel wrote:
>> Dear all,
>>
>> I've just joined the Bioschemas.org community following some 
>> discussions I had with Alasdair Gray whom I met at ISWC in Vienna, 
>> and I'd like to start a new discussion thread.
>>
>> So, just to start, a few words about me. I'm a CNRS research 
>> engineer, I work at the I3S laboratory in France, in particular with 
>> the Wimmics research team led by Fabien Gandon. I'm currently 
>> involved in some activities related to the publication of taxonomic 
>> information as Linked Data [1]. In this context, I've met the 
>> Biodiversity Information Standards community (TDWG) that is 
>> increasingly considering SW standards, LD publication and web pages 
>> markup. This is a domain where, I think, it would be relevant for 
>> Bioschemas.org to get involved.
>>
>> There exist lots of web portals reporting observations, traits and 
>> other data about all sorts of living organisms. Encyclopedia of Life 
>> <http://eol.org/> (EoL) and the Global Biodiversity Information 
>> Facility <https://www.gbif.org/> (GBIF) are among the best 
>> known. Markup questions are actively considered in this field; for 
>> instance, EoL web pages embed schema.org-based JSON-LD descriptions 
>> that Google leverages to enrich their snippets: e.g. if you google 
>> beluga 
>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc> 
>> you will see 'Encyclopedia of Life' mentions in the snippet 
>> providing average weight and size data. For now, this seems to be an 
>> "individual" initiative between EoL and Google/schema.org, but it 
>> would make sense if this was part of a broader reflection led by 
>> Bioschemas.org.
>>
>> My opinion is that fostering the use of common markup by these 
>> portals could be very effective in helping the biodiversity community 
>> to discover information and figure out new data integration 
>> scenarios. Within Bioschemas.org, we could define profiles to account 
>> for biodiversity-related information. Taxonomic registers are used as 
>> the backbone of many web portals, apps and databases related to 
>> biodiversity, agronomy and agriculture. For instance, EoL and GBIF 
>> both rely on the Catalogue of Life <http://www.catalogueoflife.org/> 
>> taxonomy. Therefore, we could start with the definition of a profile 
>> to describe a taxon and its related scientific and vernacular 
>> names. Then, this could be extended with the representation of 
>> traits (characteristics of biological organisms), observations, 
>> occurrence data, conservation status (e.g. endangered) etc. There 
>> already exist vocabularies for such data such as the well-adopted 
>> Darwin Core terms.
>>
>> As a quick example, consider the web page describing the common 
>> dolphin on the web site of the French Museum of Natural History: 
>> https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en. This page could come 
>> with a JSON-LD description looking like this: 
>> https://github.com/frmichel/taxref-ld/blob/master/bioschemas-org-example.json
>> This example is naive and very succinct, and there are lots of things 
>> to discuss and decide. Besides, I only registered on the mailing list 
>> yesterday, so it may not fit with good practices that you guys have 
>> already agreed upon. Sorry if this is the case. Nevertheless, my 
>> point is basically to bootstrap the discussion and see if the 
>> community is willing to endorse this initiative. If this is the case, 
>> we should probably involve people from the biodiversity community: 
>> Darwin Core experts, EoL/GBIF representatives etc. But that will come 
>> in time.
>>
>> I look forward to further discussions.
>> Regards,
>>    Franck.
>>
>> [1] Michel F., Gargominy O., Tercerie S. & Faron-Zucker C. (2017). A 
>> Model to Represent Nomenclatural and Taxonomic Information as Linked 
>> Data. Application to the French Taxonomic Register, TAXREF. In 
>> Proceedings of the 2nd International Workshop on Semantics for 
>> Biodiversity (S4BioDiv) co-located with ISWC 2017 vol. 1933. Vienna, 
>> Austria. CEUR.
>>
>> -- 
>> Franck MICHEL
>> CNRS research engineer
>>  +33 (0)492 96 5004
>> franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr>
>>
>>  
>>
>> Université Côte d’Azur, CNRS, *Inria* - I3S - UMR 7271
>> 930 route des Colles - Bât. Les Templiers
>> BP 145 - 06903 Sophia Antipolis CEDEX - France
>> Tel. +33 (0)4 9294 2680, Fax : +33 (0)4 9294 2898
>>
>

Received on Thursday, 16 November 2017 09:38:24 UTC