Re: Bioschemas.org to define biodiversity-related markup

From: Melanie Courtot <mcourtot@ebi.ac.uk>
Date: Fri, 17 Nov 2017 09:19:09 +0000
To: Franck Michel <franck.michel@cnrs.fr>, public-bioschemas@w3.org
Message-ID: <b35f1ce0-416e-a1bb-2630-eb9bd97bd583@ebi.ac.uk>
Hi Franck, all,

On 16/11/2017 09:37, Franck Michel wrote:
> Hi Melanie, hi all,
>
> EoL provides an API that returns species descriptions as JSON-LD based 
> on schema.org. Beluga example: http://eol.org/api/traits/328541
> It is unclear who consumes this data, but at least, as you already 
> saw, they embed it at the end of their own web pages such as 
> http://eol.org/pages/328541/data.
BioSamples does the same: we provide an API to retrieve JSON, and we also 
embed it in our web pages for crawlers.
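A minimal sketch of that embedding step, assuming the API already returns the JSON-LD object as a Python dict. The function name `to_script_tag` and the example record are hypothetical; this is not BioSamples' or EoL's actual code.

```python
import json


def to_script_tag(data: dict) -> str:
    """Wrap a JSON-LD object in the <script> element that crawlers read."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a value containing "</script>" cannot end the element early;
    # "<\/" is still valid JSON and decodes back to "</".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical record, standing in for what an API might return.
record = {"@context": "http://schema.org", "@type": "Dataset", "name": "Example </script> sample"}
print(to_script_tag(record))
```

Escaping `</` keeps the page well-formed even when a data value happens to contain a closing tag.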
>
> As you also noticed, the JSON-LD they provide is not valid. I didn't 
> know about that EoL GitHub issue, but I recently discussed it with Rod 
> Page from the Biodiversity Information Standards (aka TDWG), who 
> replied on the GitHub issue. The Google Structured Data Testing Tool 
> gives more details on that: https://frama.link/xJm0AAto
> Besides, other errors are not reported (well, I think these are 
> errors): the property scientificName without any namespace is invalid; 
> it should be dwc:scientificName, since scientificName does not exist in 
> schema.org. The same issue applies to vernacularName, traits, units...
>
> Anyway, this JSON-LD has lots of issues, but it's a start. 

Yes. I only mentioned the tweaks in case someone wanted to give it a try 
too.
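To illustrate why a bare `scientificName` is a problem, here is a simplified sketch of how compact property names expand to full IRIs, assuming a hypothetical `@context` that declares schema.org as the default vocabulary (`@vocab`) plus a `dwc` prefix for Darwin Core. It is not the full JSON-LD expansion algorithm; a real processor (such as the JSON-LD Playground) would map a bare term through `@vocab` if one is declared, or drop it if no mapping exists.

```python
# Hypothetical context: schema.org as default vocabulary, plus a Darwin Core prefix.
CONTEXT = {
    "@vocab": "http://schema.org/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
}


def expand_term(term: str, context: dict) -> str:
    """Expand a compact property name to a full IRI (simplified sketch)."""
    if term.startswith("@"):
        return term  # JSON-LD keywords such as @id or @type are left untouched
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix  # compact IRI such as dwc:scientificName
    if sep:
        return term  # already an absolute IRI, or an unknown prefix
    return context["@vocab"] + term  # bare term falls back to the default vocabulary


for name in ("scientificName", "dwc:scientificName"):
    print(name, "->", expand_term(name, CONTEXT))
```

With schema.org as the default vocabulary, the bare name expands to a schema.org IRI for a term that, as noted above, schema.org does not define, whereas the `dwc:` form lands in the Darwin Core namespace.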

> The assumption is that there is some sort of specific (one-to-one) 
> agreement between EoL and Google, and that Google harvests this data 
> despite the invalid JSON-LD. But I have no confirmation of that.

It'd be interesting to clarify this. It seems a little counterintuitive 
that EoL would mark up their pages with JSON-LD for Google to read, only 
for Google to be unable to read it without a special adapter. We're 
probably missing a piece of the story.
>
>> - the measurement type points to 
>> http://purl.obolibrary.org/obo/VT_0001256, which is body length. The 
>> schema.org/predicate value is also "body length (VT)". How is this 
>> understood and displayed as Length on the Google result?
>> - Similar question for the actual value and units, which are "4249.83" 
>> and "mm" respectively. Is Google doing some sort of unit 
>> conversion/rounding for display?
>
> Good question, especially regarding the unit "mm":
> - "units": "mm" => there is no such thing as http://schema.org/units
> - "dwc:measurementUnit": "http://purl.obolibrary.org/obo/UO_0000016" 
> => this seems to be the only reliable property, but then Google would 
> have to know the Darwin Core vocabulary and interpret it.
> My assumption is that Google performs some treatment on the values. 
> Possibly, they developed a specific connector to cope with EoL JSON-LD 
> and translate this body size to "4.2 m".
> Besides, the snippet mentions "4.2 m (Adult)", so they presumably 
> also consider this property:
> "eol:traitUri": "http://eol.org/resources/704/measurements/adultheadbodylen27"
> to know that this is the size of an adult.
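The hypothesised treatment can be sketched as follows, purely as an illustration of the guess above. The thread only establishes that VT_0001256 is "body length" and that UO_0000016 is the unit "mm"; the display label "Length", the JSON keys `dwc:measurementType` and `dwc:measurementValue`, and the one-decimal rounding are assumptions, not Google's or EoL's actual logic.

```python
# Hypothetical lookup table: the label "Length" is an assumption about Google's display.
MEASUREMENT_LABELS = {"http://purl.obolibrary.org/obo/VT_0001256": "Length"}
MM_IRI = "http://purl.obolibrary.org/obo/UO_0000016"  # millimetre, per the EoL markup


def display_length(trait: dict) -> str:
    """Turn a (hypothetically keyed) trait record into a snippet-style string."""
    label = MEASUREMENT_LABELS[trait["dwc:measurementType"]]
    if trait["dwc:measurementUnit"] != MM_IRI:
        raise ValueError("this sketch only handles millimetres")
    metres = float(trait["dwc:measurementValue"]) / 1000  # mm -> m
    text = f"{label}: {metres:.1f} m"  # round to one decimal place
    # Guess the life stage from the trait URI, as suggested above.
    if "adult" in trait.get("eol:traitUri", "").lower():
        text += " (Adult)"
    return text


beluga = {
    "dwc:measurementType": "http://purl.obolibrary.org/obo/VT_0001256",
    "dwc:measurementValue": "4249.83",
    "dwc:measurementUnit": MM_IRI,
    "eol:traitUri": "http://eol.org/resources/704/measurements/adultheadbodylen27",
}
print(display_length(beluga))
```

For the beluga values quoted in the thread, 4249.83 mm becomes 4.24983 m, which displays as "4.2 m" whether the rounding is to the nearest tenth or a truncation.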
>
> With proper Bioschemas.org profiles, I think we could annotate pages 
> from many other institutions, such as the Beluga page 
> <https://inpn.mnhn.fr/espece/cd_nom/60932?lg=en> on the website of the 
> French National Museum of Natural History, and in turn enable search 
> engines to harvest data from complementary pages, produce mashups of 
> related pages, etc.
That sounds like a great idea and entirely within the scope of Bioschemas.
>
> At this point, I think we should involve people from EoL, and from the 
> TDWG community (Rod Page would certainly be of great added value in 
> this respect). What do you think? Is there a procedure for inviting 
> people "officially"?
I think we could indeed benefit from their experience; it seems they 
were able to deploy markup, add properties and then get Google to 
interpret it, which matches our use case pretty well!
I +1'd the issue at https://github.com/BioSchemas/specifications/issues/115

Cheers,
Melanie




>
> Franck.
>
>
> On 15/11/2017 at 17:57, Melanie Courtot wrote:
>> Hi Franck,
>>
>> This looks really interesting, thanks for bringing it up. I was 
>> trying to find out how the interaction between EoL and schema.org 
>> works, and I wonder if you (or someone else!) could shed some 
>> light on this?
>>
>> As you suggested below, I checked the Google beluga 
>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc> 
>> search result, and I do see the line "Length: 4.2 m (Adult) Encyclopedia 
>> of Life".
>>
>> If I try to find where that info comes from, and head to EoL, I can 
>> reach the page http://eol.org/pages/328541/overview, and follow the 
>> "see all traits" link to http://eol.org/pages/328541/data which 
>> contains the JSON-LD.
>>
>> I trimmed it down to extract the relevant bit, updated the id to be a 
>> string as per https://github.com/EOL/tramea/issues/352, and pasted it 
>> into the JSON-LD Playground, mostly to make sure it was working as 
>> expected: http://tinyurl.com/yadam6nj
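The "id must be a string" tweak can be applied mechanically. A minimal sketch, assuming the only problem is non-string `@id` values (JSON-LD requires `@id` to be a string; a bare number such as "12345" then becomes a relative IRI resolved against the document base). The function name and example values are hypothetical; the exact content of the EoL issue is not reproduced here.

```python
from typing import Any


def stringify_ids(node: Any) -> Any:
    """Return a copy of a JSON-LD tree in which every @id value is a string."""
    if isinstance(node, dict):
        fixed = {key: stringify_ids(value) for key, value in node.items()}
        if "@id" in fixed and not isinstance(fixed["@id"], str):
            fixed["@id"] = str(fixed["@id"])  # e.g. 12345 -> "12345"
        return fixed
    if isinstance(node, list):
        return [stringify_ids(item) for item in node]
    return node  # scalars other than @id values are left unchanged


# Hypothetical input with numeric identifiers at two levels.
print(stringify_ids({"@id": 12345, "traits": [{"@id": 7, "name": "body length"}]}))
```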
>>
>> What I am missing is how the following happens:
>> - the measurement type points to 
>> http://purl.obolibrary.org/obo/VT_0001256, which is body length. The 
>> schema.org/predicate value is also "body length (VT)". How is this 
>> understood and displayed as Length on the Google result?
>> - Similar question for the actual value and units, which are 
>> "4249.83" and "mm" respectively. Is Google doing some sort of unit 
>> conversion/rounding for display?
>> - Trophic level on EoL is "carnivore", but Google displays "Carnivorous".
>> etc.
>>
>> Or am I looking at the wrong source for the markup?
>>
>> Cheers,
>> Melanie
>>
>>
>>
>>
>>
>>
>> On 10/11/2017 15:17, Franck Michel wrote:
>>> Dear all,
>>>
>>> I've just joined the Bioschemas.org community following some 
>>> discussions I had with Alasdair Gray whom I met at ISWC in Vienna, 
>>> and I'd like to start a new discussion thread.
>>>
>>> So, to start, a few words about me. I'm a CNRS research engineer 
>>> working at the I3S laboratory in France, in particular with the 
>>> Wimmics research team led by Fabien Gandon. I'm currently involved 
>>> in activities related to the publication of taxonomic information 
>>> as Linked Data [1]. In this context, I've met the Biodiversity 
>>> Information Standards community (TDWG), which is increasingly 
>>> considering Semantic Web standards, Linked Data publication and web 
>>> page markup. I think this is a domain where it would be relevant for 
>>> Bioschemas.org to get involved.
>>>
>>> There are many web portals reporting observations, traits and 
>>> other data about all sorts of living organisms. Encyclopedia of Life 
>>> <http://eol.org/> (EoL) and the Global Biodiversity Information 
>>> Facility <https://www.gbif.org/> (GBIF) are among the best 
>>> known. Markup questions are actively considered in this field; for 
>>> instance, EoL web pages embed schema.org-based JSON-LD descriptions 
>>> that Google leverages to enrich its snippets: e.g. if you google 
>>> beluga 
>>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc> 
>>> you will see 'Encyclopedia of Life' mentions in the snippet 
>>> providing average weight and size data. For now, this seems to be an 
>>> "individual" initiative between EoL and Google/schema.org, but it 
>>> would make sense for it to be part of a broader reflection led by 
>>> Bioschemas.org.
>>>
>>> In my opinion, fostering the use of common markup by these 
>>> portals could be very effective in helping the biodiversity 
>>> community discover information and devise new data integration 
>>> scenarios. Within Bioschemas.org, we could define profiles to 
>>> account for biodiversity-related information. Taxonomic registers 
>>> are used as the backbone of many web portals, apps and databases 
>>> related to biodiversity, agronomy and agriculture. For instance, 
>>> EoL and GBIF both rely on the Catalogue of Life 
>>> <http://www.catalogueoflife.org/> taxonomy. Therefore, we could 
>>> start by defining a profile to describe a taxon and its related 
>>> scientific and vernacular names. This could then be extended with 
>>> the representation of traits (characteristics of biological 
>>> organisms), observations, occurrence data, conservation status 
>>> (e.g. endangered), etc. Vocabularies already exist for such data, 
>>> such as the widely adopted Darwin Core terms.
>>>
>>> As a quick example, consider the web page describing the common 
>>> dolphin on the website of the French Museum of Natural History: 
>>> https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en. This page could come 
>>> with a JSON-LD description looking like this: 
>>> https://github.com/frmichel/taxref-ld/blob/master/bioschemas-org-example.json
>>> This example is naive and very succinct, and there are lots of 
>>> things to discuss and decide. Besides, I only registered on the 
>>> mailing list yesterday, so it may not fit with good practices that 
>>> you guys have already agreed upon. Sorry if this is the case. 
>>> Nevertheless, my point is basically to bootstrap the discussion and 
>>> see if the community is willing to endorse this initiative. If so, 
>>> we should probably involve people from the biodiversity community: 
>>> Darwin Core experts, EoL/GBIF representatives, etc. But that will 
>>> come in time.
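For readers without access to the linked file, a minimal sketch of what such a taxon description might look like, built as a Python dict and serialized to JSON-LD. The property choices are illustrative assumptions, not the content of bioschemas-org-example.json and not an agreed Bioschemas profile; the scientific name assumes the page's "common dolphin" is Delphinus delphis. `WebPage` and `about` are schema.org terms; `dwc:Taxon`, `dwc:scientificName` and `dwc:vernacularName` are Darwin Core terms.

```python
import json

# Hypothetical description of the INPN common dolphin page (illustrative only).
description = {
    "@context": {
        "@vocab": "http://schema.org/",
        "dwc": "http://rs.tdwg.org/dwc/terms/",
    },
    "@id": "https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en",
    "@type": "WebPage",
    "about": {
        "@type": "dwc:Taxon",
        "dwc:scientificName": "Delphinus delphis",  # assumed binomial for "common dolphin"
        "dwc:vernacularName": {"@value": "common dolphin", "@language": "en"},
    },
}

print(json.dumps(description, indent=2))
```

Keeping the taxon as the `about` of the page separates the document from the organism it describes, which leaves room to attach traits or conservation status to the taxon later.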
>>>
>>> I look forward to further discussions.
>>> Regards,
>>>    Franck.
>>>
>>> [1] Michel F., Gargominy O., Tercerie S. & Faron-Zucker C. (2017). A 
>>> Model to Represent Nomenclatural and Taxonomic Information as Linked 
>>> Data. Application to the French Taxonomic Register, TAXREF. In 
>>> Proceedings of the 2nd International Workshop on Semantics for 
>>> Biodiversity (S4BioDiv), co-located with ISWC 2017, Vienna, Austria. 
>>> CEUR Workshop Proceedings, vol. 1933.
>>>
>>> -- 
>>> Franck MICHEL
>>> CNRS research engineer
>>> 	+33 (0)492 96 5004
>>> franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr>
>>>
>>>
>>> Université Côte d’Azur, CNRS, *Inria* - I3S - UMR 7271
>>> 930 route des Colles - Bât. Les Templiers
>>> BP 145 - 06903 Sophia Antipolis CEDEX - France
>>> Tel. +33 (0)4 9294 2680, Fax : +33 (0)4 9294 2898
>>>
>>
>
Received on Friday, 17 November 2017 09:19:38 UTC