
Re: Bioschemas.org to define biodiversity-related markup

From: Carl Boettiger <cboettig@gmail.com>
Date: Wed, 20 Jun 2018 15:06:13 -0700
Message-ID: <CAN_1p9xQv3E8OABckO+VJ=qx4tbZ_Rcz-peKJQjEe34BaM1xjA@mail.gmail.com>
To: Franck Michel <fmichel@i3s.unice.fr>
Cc: LJ Garcia Castro <ljgarcia@ebi.ac.uk>, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, Ricardo Arcila <arcila@ebi.ac.uk>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>

Hi Franck,

I'm also very interested in support for biodiversity-related markup and
happy to help out. The doc in the Google Drive folder looks like a nice start;
I've added a few comments on the side. If it's helpful, I'm happy to try to
solicit input on this from others in the biodiversity / taxonomic
informatics space. For example, I believe EOL is approaching a new release of
their taxonomic data in JSON-LD markup, so it might be natural to check in with
them as well.

Cheers,

Carl

On Wed, Jun 20, 2018 at 2:48 PM Franck Michel <franck.michel@cnrs.fr> wrote:

> Hi all,
>
> It seems I've had email issues lately. I just discovered Ricardo's
> and Alasdair's answers in the thread below.
>
> Also, I thought I had submitted a pull request for the creation of a
> _groups/Biodiversity.md file that I had carefully written, but it never
> reached Ricardo (and I can't find any trace of it on GitHub ;)).
> Anyway, my idea was to create a Biodiversity group (instead of a Taxon
> group), whose first task would be to define the Taxon profile. Other
> profiles may be defined by this group later on. Are you OK with that?
>
> @Leyla: as a starting point, maybe we can interact through the discussion
> document I associated with the mapping (in the Taxon folder
> <https://drive.google.com/drive/u/0/folders/1Fp2AKbb07So7rVvUhnQIjpl8HLPSwpbP>)?
>
> Franck.
>
> On 20/06/2018 19:49, LJ Garcia Castro wrote:
>
> Hi Franck,
>
> We associate proteins with taxa, so I am happy to help. Please keep me in the
> loop and let us know what the best way to contribute would be, e.g.,
> email, comments via Google Drive, issues via GitHub, etc.
>
> Regards,
>
> On 15/06/2018 13:10, Gray, Alasdair J G wrote:
>
> Hi All
>
> I’m happy for the taxon group to be created with Franck as the initial
> group lead. Is there someone willing to support Franck in this role?
>
> Alasdair
>
> On 15 Jun 2018, at 12:58, Ricardo Arcila <arcila@ebi.ac.uk> wrote:
>
> Hello Franck,
>
> I have taken the liberty of creating a branch
> <https://github.com/BioSchemas/bioschemas.github.io/tree/ric/feat/taxons-group> with
> the draft of the Taxons group; please feel free to adjust it as you see fit.
>
> Kind regards,
> Ricardo
>
> On 12 Jun 2018, at 10:02, Franck Michel <franck.michel@cnrs.fr> wrote:
>
> Dear Ricardo and Leyla,
>
> I just made a pull request, and I created a Biodiversity specification
> folder on Google Drive. Let me know if anything is not right. I've set
> myself as the group leader, but I would feel more comfortable if someone from
> the community joined me in this role. And obviously, you are most
> welcome to join the group!
>
> > Will Taxon be a BioChemEntity? I am asking because in UniProt we have
> > proteins linked to what is defined as an "unknown" taxon in NCBI
> > taxonomy/UniProt taxonomy. I guess, even if we have this "unknown" case, we
> > could still use BioChemEntity and assume any "unknown" will eventually
> > be resolved to an actual entity. Happy to chat about it.
>
> I agree, the broad definition of BioChemEntity makes it appropriate as the
> root of Taxon. So far, I think of Taxon as a profile more than a type of
> its own. I'll read the wiki and start drafting something. I'll let you know if
> (most probably when) I have any questions. ;)
>
> Regards,
>     Franck.
>
> On 11/06/2018 15:46, LJ Garcia Castro wrote:
>
> Hello Franck,
>
> The taxon profile has been mentioned before as one we need, but there was
> no group for it. Wonderful that you are starting one now! Please ask whenever
> you have questions about the process or the different approaches (third-party
> vocabularies or additionalProperty) to deal with properties not covered by
> BioChemEntity.
>
> By the way, will Taxon be a BioChemEntity? I am asking because in UniProt
> we have proteins linked to what is defined as an "unknown" taxon in NCBI
> taxonomy/UniProt taxonomy. I guess, even if we have this "unknown" case, we
> could still use BioChemEntity and assume any "unknown" will eventually
> be resolved to an actual entity. Happy to chat about it.
>
> Regards,
>
>
>
> On 11/06/2018 14:39, Ricardo Arcila wrote:
>
> Hello Franck,
>
> It is a good idea to start by creating the group. You can do so by
> submitting a pull request that adds your group to the Bioschemas groups folder
> <https://github.com/BioSchemas/bioschemas.github.io/tree/master/_groups>.
> Then you can add yourself to the people folder
> <https://github.com/BioSchemas/bioschemas.github.io/tree/master/_people>.
> I will be happy to help you in this process and, if you'd like, I could be
> part of the group as well.
>
> In order to start a draft specification for Taxon, you should create a
> folder with the profile name in the specifications drive folder
> <https://drive.google.com/drive/folders/0Bw_p-HKWUjHoNThZOWNKbGhOODg?usp=sharing>.
> This process is detailed on the Bioschemas GitHub wiki
> <https://github.com/BioSchemas/specifications/wiki/Bioschemas-Specification-Process>.
>
> Please let me know if you have any questions or doubts about the process; I
> will be most happy to help.
>
>
> Best regards,
> Ricardo Arcila
>
>
> On Thu, Jun 7, 2018 at 9:54 AM Franck Michel <franck.michel@cnrs.fr>
> wrote:
>
>> Hi all,
>>
>> I'm catching up with the discussions on the list, and I'm happy to see
>> that things are moving forward with the submission of new types to schema.org.
>>
>> At the same time, I realize that we have not really moved forward on the
>> biodiversity topic. As I will present a poster about Bioschemas.org at the
>> Biodiversity Information Standards conference in August, it would be good
>> to initiate the work on this by then. How should we proceed? I suggested
>> the creation of a Taxon profile, but perhaps we have to start with the
>> creation of a group?
>> Could you please guide me/us in this process?
>>
>> Thx,
>>     Franck.
>>
>> On 23/01/2018 11:09, Leyla Garcia wrote:
>>
>> Hello Bioschemas governance team,
>>
>> What do you think about going ahead with the Biodiversity schemas? Do we
>> have a green light?
>>
>> @Franck, I am not very familiar with those organizations, but I am happy to
>> guide you through the work we have done for Bioschemas so far. I worked a
>> bit on a biodiversity project, but that was some years ago. Still, I like
>> the subject!
>>
>> Let's wait to see what Carole, Rafael and Alasdair suggest.
>>
>> Regards,
>>
>> On 23/01/2018 08:47, Franck Michel wrote:
>>
>> Dear Leyla and all,
>>
>> I take it that your response means GO. Right?
>>
>> I have not yet been involved in the specification of the Bioschemas.org
>> profiles. So indeed, I will need help and guidance as to how things work:
>> the tools, the process, the expected outcomes, etc.
>>
>> As I proposed, we could start by contacting people who would
>> potentially be interested in taking part in this. I'm thinking about
>> Encyclopedia of Life, Catalogue of Life, GBIF. If you already know contacts
>> in these organizations, that would certainly be helpful.
>>
>> Franck.
>>
>> On 22/01/2018 11:37, Leyla Garcia wrote:
>>
>> Hi Franck,
>>
>> Great news!
>>
>> Do you need any help/guides for the start-up?
>>
>> Cheers,
>>
>>
>> On 17/01/2018 15:24, Franck Michel wrote:
>>
>> Dear all,
>>
>> I'm following up on this suggestion about creating a biodiversity-related
>> group in Bioschemas.org.
>>
>> The proposal received four +1s. I'm not sure whether there is a "minimum
>> score" to establish sufficient consensus.
>>
>> As we discussed, if we go for the creation of this group, it would be
>> beneficial to involve at least EoL folks, and possibly other people from the
>> biodiversity community. I can try to initiate this, but first I would like
>> to have an official GO from our community.
>>
>> Let me know how this usually works, and what you think about this.
>>
>> Regards,
>>     Franck.
>>
>> On 17/11/2017 16:40, Franck Michel wrote:
>>
>> Hi Mélanie, hi all,
>>
>> To go a bit further, I've tried to extend the example I started.
>> Here it is:
>> https://github.com/frmichel/taxref-ld/tree/master/bioschemas-org
>> The README gives details on how the example file is organized and,
>> more importantly, it lists some of the issues and questions that we will
>> have to tackle if we officially start the group.
>>
>> @Alasdair, Carole, Rafael: as discussed in the thread, at some point it
>> would be beneficial to invite people from EoL and TDWG. Is there some
>> sort of "official" channel for the community to do that?
>>
>> Have a nice week-end,
>>     Franck.
>>
>> On 17/11/2017 10:19, Melanie Courtot wrote:
>>
>> Hi Franck, all,
>>
>> On 16/11/2017 09:37, Franck Michel wrote:
>>
>> Hi Melanie, hi all,
>>
>> EoL provides an API that returns species descriptions as JSON-LD based on
>> schema.org. Beluga example: http://eol.org/api/traits/328541
>> It is unclear who consumes this data, but at least, as you already saw,
>> they embed it at the end of their own web pages, such as
>> http://eol.org/pages/328541/data.
>>
>> BioSamples does the same: an API to retrieve JSON, which we also embed in
>> our web pages for crawlers.
>>
>>
>> As you also noticed, the JSON-LD they provide is not valid. I didn't know
>> about that EOL GitHub issue, but I recently discussed it with Rod Page from
>> Biodiversity Information Standards (aka TDWG), who replied on the
>> GitHub issue. The Google Structured Data Testing Tool gives more details on
>> that: https://frama.link/xJm0AAto
>> Besides, other errors are not reported (well, I think these are errors):
>> the property scientificName without any namespace is invalid; it should be
>> dwc:scientificName, since scientificName does not exist in schema.org. The
>> same issue applies to vernacularName, traits, units...
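The namespace problem described above can be illustrated with a minimal Python sketch. It assumes a context whose `@vocab` is schema.org plus a `dwc` prefix for Darwin Core; under such a context, a bare `scientificName` would expand to a schema.org IRI that schema.org does not define. The `unresolved_keys` helper, the small `KNOWN_SCHEMA_TERMS` allow-list, and the two example nodes are hypothetical and are not EoL's actual output.

```python
import json

# Assumed context for this sketch: schema.org as the default vocabulary,
# plus the Darwin Core prefix for terms that schema.org does not define.
CONTEXT = {
    "@vocab": "http://schema.org/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
}

# Hypothetical allow-list of schema.org terms used here; a real check
# would load the full schema.org vocabulary instead.
KNOWN_SCHEMA_TERMS = {"name", "url", "sameAs"}


def unresolved_keys(node, context):
    """List keys that are not JSON-LD keywords, not compact IRIs with a
    prefix declared in the context, and not known schema.org terms."""
    unresolved = []
    for key in node:
        if key.startswith("@"):
            continue  # JSON-LD keyword such as @id or @context
        prefix, sep, _ = key.partition(":")
        if sep and prefix in context:
            continue  # compact IRI, e.g. dwc:scientificName
        if key not in KNOWN_SCHEMA_TERMS:
            unresolved.append(key)
    return unresolved


# Illustrative node using bare Darwin Core term names.
bare = {
    "name": "Beluga",
    "scientificName": "Delphinapterus leucas",
    "vernacularName": "Beluga",
}

# The same node with the Darwin Core terms prefixed.
fixed = {
    "@context": CONTEXT,
    "name": "Beluga",
    "dwc:scientificName": "Delphinapterus leucas",
    "dwc:vernacularName": "Beluga",
}

print(unresolved_keys(bare, CONTEXT))   # ['scientificName', 'vernacularName']
print(unresolved_keys(fixed, CONTEXT))  # []
print(json.dumps(fixed, indent=2))
```

The check is deliberately shallow: it only looks at top-level keys and a hand-picked term list, which is enough to show why the unprefixed terms are flagged while the `dwc:` versions resolve.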
>>
>> Anyway, this JSON-LD has lots of issues, but it's a start.
>>
>>
>> Yes. I only mentioned the tweaks in case someone else wanted to give it a
>> try.
>>
>> The assumption is that there is some sort of specific (one-to-one)
>> agreement between EoL and Google, and that Google harvests this data
>> despite the invalid JSON-LD. But I have no confirmation of that.
>>
>>
>> It'd be interesting to clarify this. It seems a little counterintuitive
>> that EoL would mark up their pages with JSON-LD for Google to read, but
>> that Google could not read it without a special adapter. We're probably
>> missing a piece of the story.
>>
>>
>> > - the measurement type points to
>> > http://purl.obolibrary.org/obo/VT_0001256, which is body length. The
>> > schema.org/predicate value is also "body length (VT)". How is this
>> > understood and displayed as Length on the Google result?
>> > - Similar question for the actual value and units, which are "4249.83"
>> > and "mm" respectively. Is Google doing some sort of unit conversion/roundup
>> > for display?
>>
>> Good question. Take the unit "mm", for instance:
>> - "units": "mm" => there is no such thing as http://schema.org/units
>> - "dwc:measurementUnit": "http://purl.obolibrary.org/obo/UO_0000016"
>> => this seems to be the only reliable property, but that would mean
>> Google knows the Darwin Core vocabulary and interprets it.
>> My assumption is that Google performs some processing on the values.
>> Possibly, they developed a specific connector to cope with EoL JSON-LD and
>> translate this body size to "4.2 m".
>> Besides, the snippet mentions "4.2 m (Adult)", so they presumably also
>> consider this property:
>>     eol:traitUri "http://eol.org/resources/704/measurements/adultheadbodylen27"
>> to know that this is the size of an adult.
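The conversion hypothesised above is simple arithmetic: 4249.83 mm / 1000 = 4.24983 m, which rounds to 4.2 m at one decimal place. A minimal Python sketch of that guess follows, assuming a hypothetical lookup table keyed by Units Ontology IRIs and assuming UO_0000016 denotes millimetre, as its pairing with "mm" in the EoL data suggests. It illustrates the hypothesis in the thread, not what Google actually does.

```python
# Hypothetical lookup table: factor converting each Units Ontology (UO)
# term to metres. Only the millimetre term seen in the EoL data is listed.
UO_MILLIMETRE = "http://purl.obolibrary.org/obo/UO_0000016"
TO_METRES = {UO_MILLIMETRE: 0.001}


def display_length(value, unit_iri, digits=1):
    """Convert a raw measurement string to metres and round it for display."""
    metres = float(value) * TO_METRES[unit_iri]
    return f"{metres:.{digits}f} m"


# Raw value and unit from the beluga example discussed in the thread.
print(display_length("4249.83", UO_MILLIMETRE))  # 4.2 m
```

Keying the table on the unit IRI rather than the bare "mm" string reflects the point above that dwc:measurementUnit is the only reliable unit property in that markup.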
>>
>> With proper Bioschemas.org profiles, I think we could annotate pages from
>> many other institutions, such as the Beluga
>> page <https://inpn.mnhn.fr/espece/cd_nom/60932?lg%3Den> on the website of
>> the French National Museum of Natural History, and in turn enable search
>> engines to harvest data from complementary pages, produce mashups of
>> related pages, etc.
>>
>> That sounds like a great idea and entirely within the scope of Bioschemas.
>>
>>
>> At this point, I think we should involve people from EoL and from the
>> TDWG community (Rod Page would certainly add great value in this
>> respect). What do you think? Is there a procedure for inviting people
>> "officially"?
>>
>> I think we could indeed benefit from their experience; it seems they were
>> able to deploy markup, add extra properties and then get them interpreted
>> by Google, which matches our use case pretty well!
>> I +1'd the issue at
>> https://github.com/BioSchemas/specifications/issues/115
>>
>> Cheers,
>> Melanie
>>
>>
>>
>>
>>
>> Franck.
>>
>>
>> On 15/11/2017 17:57, Melanie Courtot wrote:
>>
>> Hi Franck,
>>
>> This looks really interesting, thanks for bringing it up. I was trying to
>> find out how the interaction between EoL and schema.org works, and I
>> wonder whether you (or someone else!) could shed some light on this.
>>
>> As you suggested below, I checked the Google beluga
>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc>
>> search results and do see the line "Length: 4.2 m (Adult) Encyclopedia of
>> Life"
>>
>> If I try to find where that info comes from, and head to EoL, I can reach
>> the page http://eol.org/pages/328541/overview, and follow the "see all
>> traits" link to http://eol.org/pages/328541/data which contains the
>> JSON-LD.
>>
>> I trimmed it down to extract the relevant bit, updated the id to be a
>> string as per https://github.com/EOL/tramea/issues/352, and pasted it in
>> the JSON playground mostly to make sure it was working as expected:
>> http://tinyurl.com/yadam6nj
>>
>> What I am missing is how the following happens:
>> - the measurement type points to
>> http://purl.obolibrary.org/obo/VT_0001256, which is body length. The
>> schema.org/predicate value is also "body length (VT)". How is this
>> understood and displayed as Length on the Google result?
>> - Similar question for the actual value and units, which are "4249.83"
>> and "mm" respectively. Is Google doing some sort of unit conversion/roundup
>> for display?
>> - Trophic level on EoL is "carnivore", but Google displays "Carnivorous",
>> etc.
>>
>> Or am I looking at the wrong source for the markup?
>>
>> Cheers,
>> Melanie
>>
>>
>>
>>
>>
>>
>> On 10/11/2017 15:17, Franck Michel wrote:
>>
>> Dear all,
>>
>> I've just joined the Bioschemas.org community following some discussions
>> I had with Alasdair Gray, whom I met at ISWC in Vienna, and I'd like to
>> start a new discussion thread.
>>
>> So, to start, a few words about me. I'm a CNRS research engineer at the
>> I3S laboratory in France, working in particular with the Wimmics
>> research team led by Fabien Gandon. I'm currently involved in some
>> activities related to the publication of taxonomic information as Linked
>> Data [1]. In this context, I've met the Biodiversity Information Standards
>> community (TDWG), which is increasingly considering Semantic Web standards,
>> Linked Data publication and web page markup. This is a domain where, I
>> think, it would be relevant for Bioschemas.org to get involved.
>>
>> There are lots of web portals reporting observations, traits and other
>> data about all sorts of living organisms. Encyclopedia of Life
>> <http://eol.org/> (EoL) and the Global Biodiversity Information Facility
>> <https://www.gbif.org/> (GBIF) are among the best known. Markup
>> questions are actively considered in this field; for instance, EoL web pages
>> embed schema.org-based JSON-LD descriptions that Google leverages to
>> enrich their snippets: e.g. if you google beluga
>> <https://www.google.fr/search?dcr=0&ei=ml74WajPMMzWUabjqvAF&q=beluga&oq=beluga&gs_l=psy-ab.3...19519.20929.0.20945.6.3.0.0.0.0.93.93.1.1.0....0...1.1.64.psy-ab..5.1.92...0j0i131k1.0.AGNziTItYzc>
>> you will see 'Encyclopedia of Life' mentions in the snippet providing
>> average weight and size data. For now, this seems to be an "individual"
>> initiative between EoL and Google/schema.org, but it would make sense
>> for it to be part of a broader effort led by Bioschemas.org.
>>
>> In my opinion, fostering the use of common markup by these portals could
>> be very effective in helping the biodiversity community discover
>> information and devise new data integration scenarios. Within
>> Bioschemas.org, we could define profiles to account for
>> biodiversity-related information. Taxonomic registers are
>> used as the backbone of many web portals, apps and databases related to
>> biodiversity, agronomy and agriculture. For instance, EoL and GBIF both
>> rely on the Catalogue of Life <http://www.catalogueoflife.org/> taxonomy.
>> Therefore, we could start by defining a profile to describe a
>> taxon and its scientific and vernacular names. Then, this
>> could be extended to represent traits (characteristics of
>> biological organisms), observations, occurrence data, conservation status
>> (e.g. endangered), etc. Vocabularies already exist for such data, such
>> as the widely adopted Darwin Core terms.
>>
>> As a quick example, consider the web page describing the common dolphin
>> on the website of the French Museum of Natural History:
>> https://inpn.mnhn.fr/espece/cd_nom/60878?lg=en. This page could come
>> with a JSON-LD description looking like this:
>> https://github.com/frmichel/taxref-ld/blob/master/bioschemas-org-example.json
>> This example is naive and very succinct, and there are lots of things to
>> discuss and decide. Besides, I only registered on the mailing list
>> yesterday, so it may not follow good practices that you have
>> already agreed upon. Sorry if this is the case. Nevertheless, my point
>> is basically to bootstrap the discussion and see if the community is
>> willing to endorse this initiative. If so, we should probably
>> involve people from the biodiversity community: Darwin Core experts,
>> EoL/GBIF representatives, etc. But that will come in time.
>>
>> I look forward to further discussions.
>> Regards,
>>    Franck.
>>
>> [1] Michel F., Gargominy O., Tercerie S. &
>>
>> --

http://carlboettiger.info