Re: FW: [schemaorg/schemaorg] Update to: Core Types to Support the Discovery of Life Sciences Resources (#2711) from Dan Brickley on 2021-04-02 (public-bioschemas@w3.org from April 2021)

From: Dan Brickley <danbri@danbri.org>
Date: Fri, 2 Apr 2021 09:12:18 +0100
To: Franck Michel <franck.michel@cnrs.fr>
Cc: "Gray, Alasdair" <A.J.G.Gray@hw.ac.uk>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
Message-ID: <CAFfrAFq5Xi9iAO8wq-=07qos5BatQFRWSkwurZjsPAGx8ERpqQ@mail.gmail.com>
The more interesting, varied and impactful the applications, the better, but
we should strive to be able to point to at least one thing the markup is
being useful for, and aggregated open data as a KG seems a very strong
candidate.

Looking at applications can help guide vocab improvements too, which might
help in planning any follow up activities?

On Fri, 2 Apr 2021 at 08:37, Franck Michel <franck.michel@cnrs.fr> wrote:

> Hi Alasdair and all,
>
> Thanks for the information. This is great news indeed.
>
> I very much appreciate Dan's focus on considering the building of
> knowledge graphs as an important part of what they consider
> markup-consuming applications, since consumption evidence is an obvious
> requirement for Schema.org to eventually endorse terms.
>
> Good to see that things go forward!
>
>
> Franck.
>
> On 01/04/2021 at 16:19, Gray, Alasdair wrote:
>
> Hi All,
>
>
>
> Please see the below for details of the response from Dan on our request
> to merge our first collection of types into Schema.org.
>
>
>
> I think this is a hugely positive step forward. Hopefully the inclusion
> into pending will entice more people both to deploy the markup and to build
> applications that rely on our proposed types.
>
>
>
> Best regards
>
>
>
> Alasdair
>
>
>
> --
>
> Alasdair J G Gray
>
> Associate Professor in Computer Science,
> School of Mathematical and Computer Sciences
> Heriot-Watt University, Edinburgh, UK.
>
> Email: A.J.G.Gray@hw.ac.uk <A.J.G.Gray@hw.ac.uk>
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
>
>
>
>
> Heriot-Watt is a global University, as a result my working hours may not
> be your working hours. Do not feel pressure to reply to this email outside
> your working hours.
>
>
>
>
>
> To arrange a meeting:
> https://outlook.office365.com/owa/calendar/AlasdairGray@heriotwatt.onmicrosoft.com/bookings/
>
>
>
> *From: *"notifications@github.com" <notifications@github.com>
> *Reply to: *schemaorg/schemaorg
> <reply+AAIWUENVPLBNZFYILBZRFFF6OGZWDEVBNHHCT3QWYQ@reply.github.com>
> *Date: *Thursday, 1 April 2021 at 15:02
> *To: *schemaorg/schemaorg <schemaorg@noreply.github.com>
> *Cc: *Alasdair Gray <A.J.G.Gray@hw.ac.uk>,
> "mention@noreply.github.com" <mention@noreply.github.com>
> *Subject: *Re: [schemaorg/schemaorg] Update to: Core Types to Support the
> Discovery of Life Sciences Resources (#2711)
>
> *Short version:* My sense is that we should get these terms into Pending,
> with a view towards them becoming part of core schema.org as evidence of
> data-consuming applications is collected. Based on the experience of the
> last few years, we should also expand our notion of "data-consuming
> applications" to cover developer- and data-scientist-facing applications,
> such as public open data knowledge graphs. I believe the bioschemas schemas
> have great potential, but we have work to do yet to determine quite what
> level of detail is going to prove appropriate for this kind of vocabulary.
>
> Next steps: I've asked @RichardWallis <https://github.com/RichardWallis>
> to take a look at some minor fixes to the PR, to mark these terms as part
> of the Pending area of schema.org, and remove any conflicts (e.g.
> SchemaExamples/schemaexamples.py needs removing).
>
> Status and Context and expectation setting
>
> When the Bioschemas activity was first suggested we (Schema.org leads)
> were initially wary of bringing Schema.org into an area where there were a
> great number of existing scientific and research data ontologies, unless
> there was a serious prospect of the schemas being used in substantive
> user-benefitting applications that could guide our decision making. For
> general consumer topics (reviews, ratings, photos, etc.) Schema.org as a
> unifying vocabulary made clear sense and was guided by user-facing
> applications. As we touched on deeper scientific topics where many levels
> of detail are potentially applicable, the territory felt different.
>
> I spoke about this at the Elixir
> <https://elixir-europe.org/events/elixir-all-hands-2016> 2016 All Hands,
> and in particular emphasized that it could be counterproductive to add this
> kind of vocabulary with the expectation of it primarily being used in
> general web search engine product features. We didn't want life-science
> site publishers to be disappointed if they added the markup to their sites
> and did not subsequently feel they were benefitting from having done so
> (e.g. in the Google case, by the markup being used by one of the features
> in Google Search's list of structured data features
> <https://developers.google.com/search/docs/guides/search-gallery>). And I
> didn't want to run into people at conferences a few years later and be told
> "we added all this markup to our site and it hasn't done us any good at
> all!".
>
> Although these considerations apply to all schema.org additions,
> Bioschemas was an effort to move Schema.org towards covering scientific
> concepts and data structures in more detail than we had approached before.
> Schema.org has always focussed on schemas that are *used*, in the sense
> of consumed/interpreted by products, in user-facing features and
> applications. Without this, it is difficult to judge appropriate levels of
> detail, and it can be difficult for publishers to justify the effort of
> adding the markup.
>
> The expectation originally was that the bioschemas project would work
> equally on the data-publishing and the data-consumption sides of making
> these schemas part of a healthy ecosystem. I think what we've seen is a lot
> more success on the former side than on the latter (and that is no fault of
> any individual or group who has been part of the bioschemas effort).
>
> Pending
>
> By bringing these terms into schema.org's Pending area, schema.org (per
> our standard documentation) sets the following expectations:
>
> The Pending Section is a staging area for work-in-progress terms which
> have yet to be accepted into the core vocabulary. Pending terms are subject
> to change and should be used with caution.
> Implementors and publishers are cautioned that terms in the pending
> extension may lack consensus and that terminology and definitions could
> still change significantly after community and steering group review.
> Consumers of schema.org data who encourage use of such terms are strongly
> encouraged to update implementations and documentation to track any
> evolving changes, and to share early implementation feedback with the wider
> community.
>
> This is loosely analogous to language W3C uses for Working Drafts
> <https://www.w3.org/2020/Process-20200915/#RecsWD>, and I highlight it
> here because it is important to acknowledge that the bioschemas vocabulary
> has been the product of a significant and expert-informed process over the
> last few years, and in particular it has been created, amended and
> developed in collaboration with many authoritative publishers of
> bioinformatics / lifesciences data.
>
> It may be that the vocabulary in its schema.org incarnation will evolve
> further, but readers arriving here without knowledge of its origins should
> know that there have been substantial and long-running, expert-led
> collaborations <https://bioschemas.org/meetings/> leading to these
> designs.
>
> Our challenge now will be to address any technical and usability
> integration issues between these schemas and the rest of Schema.org, and to
> move the focus towards data-consuming applications, so that we can
> understand whether the level of detail, definitions and properties proposed
> here are sufficient to meet the needs of user-facing applications.
>
> The Bioschemas project provides some supporting tooling
> <https://bioschemas.org/software/>, and there are other open-source tools
> (e.g. Gleaner.io <https://gleaner.io/>, Schemarama
> <https://github.com/google/schemarama>) that may be helpful to those
> developing applications.
>
> Schema.org for Knowledge Graph Exchange
>
> As we look to support the use of schema.org data in new and interesting
> areas, we should also take care to be open-minded about what counts as "
> *using*" Schema.org in a data-consuming application.
>
> For example, at Google we made some investigations
> <https://github.com/google/schemarama/tree/main/kgx/wikidata/bioschemas>
> into whether Schema.org extended with Bioschemas is sufficiently expressive
> to capture a useful "knowledge graph for lifesciences
> <https://elifesciences.org/articles/52614>" subset extracted from
> Wikidata.org. Would such a database be a user-facing use of the data, or a
> workflow / infrastructural step towards an environment where user-facing
> applications could eventually be created? It is a little of both. While we
> can declare developers to be a kind of user we care about, these kinds of
> generic application do not always provide guidance that can help scope and
> shape schema design.
>
> Such "*knowledge graph exchange*" scenarios for using Schema.org-based
> data are part of a larger trend. For example:
>
> · Yago <https://yago-knowledge.org/>, which converts Wikidata to
> use Schema.org vocabulary.
>
> · Ozymandias
> <https://iphylo.blogspot.com/2018/08/ozymandias-biodiversity-knowledge-graph.html>,
> "a biodiversity knowledge graph of Australian taxa and taxonomic
> publications".
>
> · Springer Nature's SciGraph
> <https://researchdata.springernature.com/posts/45943-sn-scigraph-latest-release-patents-clinical-trials-and-many-new-features>,
> *"collates information from across the research landscape, i.e. the
> things, documents, people, places and relations of importance to the
> science and scholarly domain."*
>
> · DataCommons.org <https://datacommons.org/>, *"Datacommons.org
> is an open knowledge repository hosted by Google that provides a unified
> view across multiple public datasets, combining economic, scientific and
> other open datasets into an integrated data graph."* (Wikipedia
> <https://en.wikipedia.org/wiki/Datacommons.org>, GitHub
> <https://github.com/datacommonsorg/>).
>
> I believe we should as a project explicitly declare these kinds of open
> data sharing, "knowledge graph exchange" initiatives as being amongst the
> kinds of data-consuming application that justify additions and changes to
> Schema.org. They are very much in the spirit of the project, but some
> thought is needed on how to operationalize this.
>
> This doesn't mean that just spinning up an RDF database with some test
> data in it would be sufficient; rather, that we would be acknowledging data
> scientists, developers and others who work with data as being important
> user constituencies. Just as schema.org serves non-technical search
> engine end-users who are looking for jobs, recipes, reviews, events,
> datasets or fact checks on the various search engines, it can also support
> developers and data scientists who work with aggregations of schema.org
> data. As the DataCommons.org site says,
>
> We cleaned and processed the data so you don't have to. Data about
> particular entities are aggregated from different sources for a unified
> view.
>
> This kind of service (provided also by Wikidata et al.) can add huge value
> and help others meet the needs of *their* users.
>
> The clarification to be made here is that our exit criteria for moving
> terms out of "Pending" status into the Schema.org core vocabulary should
> consider public, opendata knowledge graph use (SPARQL/RDF, Property Graphs,
> etc.) as important evidence towards demonstrating the usefulness of
> schema.org schema designs.
>
> To @stain <https://github.com/stain>'s point, it is true that we have
> been a little blocked at Schema.org in terms of knowing how to handle the
> Bioschemas proposals, since they do make significant amounts of great data
> accessible via schema.org markup, even if the data-consuming applications
> we collectively anticipated back in 2016 have yet to emerge.
>
> Schema.org in the past has suffered from "build it and they'll come"
> optimism, and contains a number of schema designs which lack substantive
> data-consuming implementations. This is why we introduced the notion of "
> pending <https://schema.org/docs/howwework.html#pending>", so that there
> is an opportunity to surface potentially valuable schema designs, while
> also flagging up that we believe there may be possible tweaks ahead as
> data-consuming implementations surface.
>
> If we clarify "user-facing, data-consuming application" to include open
> data-sharing "knowledge graph" systems like Wikidata, Yago, SN SciGraph,
> Ozymandias and Data Commons, I believe this opens up a roadmap for bringing
> Bioschemas (and similar proposals) into Schema.org, without setting
> unrealistic expectations about the schema details being used. In particular
> it gives us a new focal point for articulating questions about the user
> needs being met by schema designs; we can ask about the kinds of queries
> supported by the combination of these schemas with opendata that uses the
> schemas.
>
> Framed in this way I'm a lot more comfortable bringing these schemas into
> Pending, as it gives a plausible path for progressing things further.
> @AlasdairGray <https://github.com/AlasdairGray> et al., does that work
> for you?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/schemaorg/schemaorg/pull/2711#issuecomment-811929818>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAIWUEOCEGGYCIR467YTVXTTGR4GDANCNFSM4RQHGDSQ>.
Received on Friday, 2 April 2021 08:12:45 UTC