Re: FW: [schemaorg/schemaorg] Update to: Core Types to Support the Discovery of Life Sciences Resources (#2711) from Franck Michel on 2021-04-02 (public-bioschemas@w3.org from April 2021)

From: Franck Michel <franck.michel@cnrs.fr>
Date: Fri, 2 Apr 2021 09:36:49 +0200
To: "Gray, Alasdair" <A.J.G.Gray@hw.ac.uk>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
Message-ID: <af870430-6309-b202-48a7-04cfe5f5f73c@cnrs.fr>
Hi Alasdair and all,

Thanks for the information. This is great news indeed.

I very much appreciate Dan's focus on considering the building of 
knowledge graphs as an important part of what they consider 
markup-consuming applications, since consumption evidence is an obvious 
requirement for Schema.org to eventually endorse terms.

Good to see that things are moving forward!

Franck.

On 01/04/2021 at 16:19, Gray, Alasdair wrote:
>
> Hi All,
>
> Please see below for details of the response from Dan on our 
> request to merge our first collection of types into Schema.org.
>
> I think this is a hugely positive step forward. Hopefully the 
> inclusion in Pending will entice more people both to deploy the 
> markup and to build applications that rely on our proposed types.
>
> Best regards
>
> Alasdair
>
> -- 
>
> Alasdair J G Gray
>
> Associate Professor in Computer Science,
> School of Mathematical and Computer Sciences
> Heriot-Watt University, Edinburgh, UK.
>
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
> Heriot-Watt is a global University, as a result my working hours may 
> not be your working hours. Do not feel pressure to reply to this email 
> outside your working hours.
>
> To arrange a 
> meeting: https://outlook.office365.com/owa/calendar/AlasdairGray@heriotwatt.onmicrosoft.com/bookings/
>
> *From: *"notifications@github.com" <notifications@github.com>
> *Reply to: *schemaorg/schemaorg 
> <reply+AAIWUENVPLBNZFYILBZRFFF6OGZWDEVBNHHCT3QWYQ@reply.github.com>
> *Date: *Thursday, 1 April 2021 at 15:02
> *To: *schemaorg/schemaorg <schemaorg@noreply.github.com>
> *Cc: *Alasdair Gray <A.J.G.Gray@hw.ac.uk>, 
> "mention@noreply.github.com" <mention@noreply.github.com>
> *Subject: *Re: [schemaorg/schemaorg] Update to: Core Types to Support 
> the Discovery of Life Sciences Resources (#2711)
>
> *Short version:* My sense is that we should get this into Pending, 
> with a view towards them becoming part of core schema.org as evidence 
> of data-consuming applications is collected. Based on the experience 
> of the last few years, we should also expand our notion of 
> "data-consuming applications" to cover developer- and data-scientist-facing 
> applications, such as public open data knowledge graphs. I 
> believe the bioschemas schemas have great potential, but we have work 
> to do yet to determine quite what level of detail is going to prove 
> appropriate for this kind of vocabulary.
>
> Next steps: I've asked @RichardWallis 
> <https://github.com/RichardWallis> to take a look at some minor fixes 
> to the PR, to mark these terms as part of the Pending area of 
> schema.org, and remove any conflicts (e.g. 
> SchemaExamples/schemaexamples.py needs removing).
>
>
>   Status and Context and expectation setting
>
> When the Bioschemas activity was first suggested we (Schema.org leads) 
> were initially wary of bringing Schema.org into an area where there 
> were a great number of existing scientific and research data 
> ontologies, unless there was a serious prospect of the schemas being 
> used in substantive user-benefitting applications that could guide our 
> decision making. For general consumer topics (reviews, ratings, 
> photos, etc.) Schema.org as a unifying vocabulary made clear sense and 
> was guided by user-facing applications. As we touched on deeper 
> scientific topics where many levels of detail are potentially 
> applicable, the territory felt different.
>
> I spoke about this at the Elixir 
> <https://elixir-europe.org/events/elixir-all-hands-2016> 2016 All 
> Hands, and in particular emphasized that it could be counterproductive 
> to add this kind of vocabulary with the expectation of it primarily 
> being used in general web search engine product features. We didn't 
> want life-science site publishers to be disappointed if they added the 
> markup to their sites and did not subsequently feel they were 
> benefitting from having done so (e.g. in the Google case, by the 
> markup being used by one of the features in Google Search's list of 
> structured data features 
> <https://developers.google.com/search/docs/guides/search-gallery>). 
> And I didn't want to run into people at conferences a few years later 
> and be told "we added all this markup to our site and it hasn't done 
> us any good at all!".
>
> Although these considerations apply to all schema.org additions, 
> Bioschemas was an effort to move Schema.org towards covering 
> scientific concepts and data structures in more detail than we had 
> approached before. Schema.org has always focussed on schemas that are 
> /used/, in the sense of consumed/interpreted by products, in 
> user-facing features and applications. Without this, it is difficult 
> to judge appropriate levels of detail, and it can be difficult for 
> publishers to justify the effort of adding the markup.
>
> The expectation originally was that the bioschemas project would work 
> equally on the data-publishing and the data-consumption sides of 
> making these schemas part of a healthy ecosystem. I think what we've 
> seen is a lot more success on the former side than on the latter (and 
> that is no fault of any individual or group who has been part of the 
> bioschemas effort).
>
>
>   Pending
>
> By bringing these terms into schema.org's Pending area, schema.org 
> (per our standard documentation) sets the following expectations:
>
>     The Pending Section is a staging area for work-in-progress terms
>     which have yet to be accepted into the core vocabulary. Pending
>     terms are subject to change and should be used with caution.
>     Implementors and publishers are cautioned that terms in the
>     pending extension may lack consensus and that terminology and
>     definitions could still change significantly after community and
>     steering group review. Consumers of schema.org data who encourage
>     use of such terms are strongly encouraged to update
>     implementations and documentation to track any evolving changes,
>     and to share early implementation feedback with the wider community.
>
> This is loosely analogous to language W3C uses for Working Drafts 
> <https://www.w3.org/2020/Process-20200915/#RecsWD>, and I highlight it 
> here because it is important to acknowledge that the bioschemas 
> vocabulary has been the product of a significant and expert-informed 
> process over the last few years, and in particular it has been 
> created, amended and developed in collaboration with many 
> authoritative publishers of bioinformatics / lifesciences data.
>
> It may be that the vocabulary in its schema.org incarnation will 
> evolve further, but readers arriving here without knowledge of its 
> origins should know that there have been substantial and long-running, 
> expert-led collaborations <https://bioschemas.org/meetings/> leading 
> to these designs.
>
> Our challenge now will be to address any technical and usability 
> integration issues between these schemas and the rest of Schema.org, 
> and to move the focus towards data-consuming applications, so that we 
> can understand whether the level of detail, definitions and properties 
> proposed here are sufficient to meet the needs of user-facing 
> applications.
>
> The Bioschemas project provides some supporting tooling 
> <https://bioschemas.org/software/>, and there are other opensource 
> tools (e.g. Gleaner.io <https://gleaner.io/>, Schemarama 
> <https://github.com/google/schemarama>) that may be helpful to those 
> developing applications.
>
>
>   Schema.org for Knowledge Graph Exchange
>
> As we look to support the use of schema.org data in new and 
> interesting areas, we should also take care to be open-minded about 
> what counts as "/using/" Schema.org in a data-consuming application.
>
> For example, at Google we made some investigations 
> <https://github.com/google/schemarama/tree/main/kgx/wikidata/bioschemas> 
> into whether Schema.org extended with Bioschemas is sufficiently 
> expressive to capture a useful "knowledge graph for lifesciences 
> <https://elifesciences.org/articles/52614>" subset extracted from 
> Wikidata.org. Would such a database be a user-facing use of the data, 
> or a workflow / infrastructural step towards an environment where 
> user-facing applications could eventually be created? It is a little 
> of both. While we can declare developers to be a kind of user we care 
> about, these kinds of generic application do not always provide 
> guidance that can help scope and shape schema design.
>
> Such "/knowledge graph exchange/" scenarios for using Schema.org-based 
> data are part of a larger trend. For example:
>
> · Yago <https://yago-knowledge.org/>, which converts Wikidata to use 
> Schema.org vocabulary.
>
> · Ozymandias 
> <https://iphylo.blogspot.com/2018/08/ozymandias-biodiversity-knowledge-graph.html>, 
> "a biodiversity knowledge graph of Australian taxa and taxonomic 
> publications".
>
> · Springer Nature's SciGraph 
> <https://researchdata.springernature.com/posts/45943-sn-scigraph-latest-release-patents-clinical-trials-and-many-new-features>, 
> /"collates information from across the research landscape, i.e. the 
> things, documents, people, places and relations of importance to the 
> science and scholarly domain."/
>
> · DataCommons.org <https://datacommons.org/>, /"Datacommons.org is an 
> open knowledge repository hosted by Google that provides a unified 
> view across multiple public datasets, combining economic, scientific 
> and other open datasets into an integrated data graph."/ (wikipedia 
> <https://en.wikipedia.org/wiki/Datacommons.org>, github 
> <https://github.com/datacommonsorg/>).
>
> I believe we should as a project explicitly declare these kinds of 
> open data sharing, "knowledge graph exchange" initiatives as being 
> amongst the kinds of data-consuming application that justify additions 
> and changes to Schema.org. They are very much in the spirit of the 
> project, but some thought is needed on how to operationalize this.
>
> This doesn't mean that just spinning up an RDF database with some test 
> data in would be sufficient; rather that we would be acknowledging 
> data scientists, developers and others who work with data as being 
> important user constituencies. Just as schema.org serves non-technical 
> search engine end-users who are looking for jobs, recipes, reviews, 
> events, datasets or fact checks on the various search engines, it can 
> also support developers and data scientists who work with aggregations 
> of schema.org data. As the DataCommons.org site says,
>
>     We cleaned and processed the data so you don't have to. Data about
>     particular entities are aggregated from different sources for a
>     unified view.
>
> This kind of service (provided also by Wikidata et al.) can add huge 
> value and help others meet the needs of /their/ users.
>
> The clarification to be made here is that our exit criteria for moving 
> terms out of "Pending" status into the Schema.org core vocabulary 
> should consider public, opendata knowledge graph use (SPARQL/RDF, 
> Property Graphs, etc.) as important evidence towards demonstrating the 
> usefulness of schema.org schema designs.
>
> To @stain <https://github.com/stain>'s point, it is true that we have 
> been a little blocked at Schema.org in terms of knowing how to handle 
> the Bioschemas proposals, since they do make significant amounts of 
> great data accessible via schema.org markup, even if the 
> data-consuming applications we collectively anticipated back in 2016 
> have yet to emerge.
>
> Schema.org in the past has suffered from "build it and they'll come" 
> optimism, and contains a number of schema designs which lack 
> substantive data-consuming implementations. This is why we introduced 
> the notion of "pending 
> <https://schema.org/docs/howwework.html#pending>", so that there is an 
> opportunity to surface potentially valuable schema designs, while also 
> flagging up that we believe there may be possible tweaks ahead as 
> data-consuming implementations surface.
>
> If we clarify "user-facing, data-consuming application" to include 
> open data-sharing "knowledge graph" systems like Wikidata, Yago, SN 
> SciGraph, Ozymandias, Data Commons, I believe this opens up a roadmap 
> for bringing Bioschemas (and similar proposals) into Schema.org, 
> without setting unrealistic expectations about the schema details 
> being used. In particular it gives us a new focal point for 
> articulating questions about the user needs being met by schema 
> designs; we can ask about the kinds of queries supported by the 
> combination of these schemas with opendata that uses the schemas.
>
> Framed in this way I'm a lot more comfortable bringing these schemas 
> into Pending, as it gives a plausible path for progressing things 
> further. @AlasdairGray <https://github.com/AlasdairGray> et al., does 
> that work for you?
>
Received on Friday, 2 April 2021 07:37:05 UTC