- From: Franck Michel <franck.michel@cnrs.fr>
- Date: Fri, 2 Apr 2021 09:36:49 +0200
- To: "Gray, Alasdair" <A.J.G.Gray@hw.ac.uk>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
- Message-ID: <af870430-6309-b202-48a7-04cfe5f5f73c@cnrs.fr>
Hi Alasdair and all,

Thanks for the information. This is great news indeed.

I very much appreciate Dan's focus on considering the building of knowledge graphs as an important part of what they consider markup-consuming applications, since consumption evidence is an obvious requirement for Schema.org to eventually endorse terms.

Good to see that things are moving forward!

Franck.

On 01/04/2021 at 16:19, Gray, Alasdair wrote:
> Hi All,
>
> Please see the below for details of the response from Dan on our request to merge our first collection of types into Schema.org.
>
> I think this is a hugely positive step forward. Hopefully the inclusion into pending will entice more people both to deploy and to build applications that rely on our proposed types.
>
> Best regards
>
> Alasdair
>
> --
> Alasdair J G Gray
> Associate Professor in Computer Science,
> School of Mathematical and Computer Sciences
> Heriot-Watt University, Edinburgh, UK.
>
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
> *From:* "notifications@github.com" <notifications@github.com>
> *Reply to:* schemaorg/schemaorg <reply+AAIWUENVPLBNZFYILBZRFFF6OGZWDEVBNHHCT3QWYQ@reply.github.com>
> *Date:* Thursday, 1 April 2021 at 15:02
> *To:* schemaorg/schemaorg <schemaorg@noreply.github.com>
> *Cc:* Alasdair Gray <A.J.G.Gray@hw.ac.uk>, "mention@noreply.github.com" <mention@noreply.github.com>
> *Subject:* Re: [schemaorg/schemaorg] Update to: Core Types to Support the Discovery of Life Sciences Resources (#2711)
>
> *Short version:* My sense is that we should get this into Pending, with a view towards them becoming part of core schema.org as evidence of data-consuming applications is collected. Based on the experience of the last few years, we should also expand our notion of "data-consuming applications" to cover developer- and data-scientist-facing applications, such as public open data knowledge graphs. I believe the bioschemas schemas have great potential, but we have work to do yet to determine quite what level of detail is going to prove appropriate for this kind of vocabulary.
>
> Next steps: I've asked @RichardWallis <https://github.com/RichardWallis> to take a look at some minor fixes to the PR, to mark these terms as part of the Pending area of schema.org, and remove any conflicts (e.g. SchemaExamples/schemaexamples.py needs removing).
>
> Status and Context and expectation setting
>
> When the Bioschemas activity was first suggested we (Schema.org leads) were initially wary of bringing Schema.org into an area where there were a great number of existing scientific and research data ontologies, unless there was a serious prospect of the schemas being used in substantive user-benefitting applications that could guide our decision making. For general consumer topics (reviews, ratings, photos, etc.) Schema.org as a unifying vocabulary made clear sense and was guided by user-facing applications. As we touched on deeper scientific topics where many levels of detail are potentially applicable, the territory felt different.
>
> I spoke about this at the Elixir <https://elixir-europe.org/events/elixir-all-hands-2016> 2016 All Hands, and in particular emphasized that it could be counterproductive to add this kind of vocabulary with the expectation of it primarily being used in general web search engine product features. We didn't want life-science site publishers to be disappointed if they added the markup to their sites and did not subsequently feel they were benefitting from having done so (e.g. in the Google case, by the markup being used by one of the features in Google Search's list of structured data features <https://developers.google.com/search/docs/guides/search-gallery>). And I didn't want to run into people at conferences a few years later and be told "we added all this markup to our site and it hasn't done us any good at all!".
>
> Although these considerations apply to all schema.org additions, Bioschemas was an effort to move Schema.org towards covering scientific concepts and data structures in more detail than we had approached before. Schema.org has always focussed on schemas that are /used/, in the sense of consumed/interpreted by products, in user-facing features and applications.
> Without this, it is difficult to judge appropriate levels of detail, and it can be difficult for publishers to justify the effort of adding the markup.
>
> The expectation originally was that the bioschemas project would work equally on the data publishing, and the data-consumption side of making these schemas part of a healthy ecosystem. I think what we've seen is a lot more success on the former side than on the latter (and that is no fault of any individual or group who has been part of the bioschemas effort).
>
> Pending
>
> By bringing these terms into schema.org's Pending area, schema.org (per our standard documentation) sets the following expectations:
>
>     The Pending Section is a staging area for work-in-progress terms which have yet to be accepted into the core vocabulary. Pending terms are subject to change and should be used with caution. Implementors and publishers are cautioned that terms in the pending extension may lack consensus and that terminology and definitions could still change significantly after community and steering group review. Consumers of schema.org data who encourage use of such terms are strongly encouraged to update implementations and documentation to track any evolving changes, and to share early implementation feedback with the wider community.
>
> This is loosely analogous to language W3C uses for Working Drafts <https://www.w3.org/2020/Process-20200915/#RecsWD>, and I highlight it here because it is important to acknowledge that the bioschemas vocabulary has been the product of a significant and expert-informed process over the last few years, and in particular it has been created, amended and developed in collaboration with many authoritative publishers of bioinformatics / lifesciences data.
>
> It may be that the vocabulary in its schema.org incarnation will evolve further, but readers arriving here without knowledge of its origins should know that there have been substantial and long-running, expert-led collaborations <https://bioschemas.org/meetings/> leading to these designs.
>
> Our challenge now will be to address any technical and usability integration issues between these schemas and the rest of Schema.org, and to move the focus towards data-consuming applications, so that we can understand whether the level of detail, definitions, properties proposed here are sufficient to meet the needs of user-facing applications.
>
> The Bioschemas project provides some supporting tooling <https://bioschemas.org/software/>, and there are other open-source tools (e.g. Gleaner.io <https://gleaner.io/>, Schemarama <https://github.com/google/schemarama>) that may be helpful to those developing applications.
>
> Schema.org for Knowledge Graph Exchange
>
> As we look to support the use of schema.org data in new and interesting areas, we should also take care to be open-minded about what counts as "/using/" Schema.org in a data-consuming application.
>
> For example, at Google we made some investigations <https://github.com/google/schemarama/tree/main/kgx/wikidata/bioschemas> into whether Schema.org extended with Bioschemas is sufficiently expressive to capture a useful "knowledge graph for lifesciences <https://elifesciences.org/articles/52614>" subset extracted from Wikidata.org. Would such a database be a user-facing use of the data, or a workflow / infrastructural step towards an environment where user-facing applications could eventually be created? It is a little of both. While we can declare developers to be a kind of user we care about, these kinds of generic application do not always provide guidance that can help scope and shape schema design.
>
> Such "/knowledge graph exchange/" scenarios for using Schema.org-based data are part of a larger trend. For example:
>
> - Yago <https://yago-knowledge.org/>, which converts Wikidata to use Schema.org vocabulary.
>
> - Ozymandias <https://iphylo.blogspot.com/2018/08/ozymandias-biodiversity-knowledge-graph.html>, "a biodiversity knowledge graph of Australian taxa and taxonomic publications".
>
> - Springer Nature's SciGraph <https://researchdata.springernature.com/posts/45943-sn-scigraph-latest-release-patents-clinical-trials-and-many-new-features>, /"collates information from across the research landscape, i.e. the things, documents, people, places and relations of importance to the science and scholarly domain."/
>
> - DataCommons.org <https://datacommons.org/>, /"Datacommons.org is an open knowledge repository hosted by Google that provides a unified view across multiple public datasets, combining economic, scientific and other open datasets into an integrated data graph."/ (wikipedia <https://en.wikipedia.org/wiki/Datacommons.org>, github <https://github.com/datacommonsorg/>).
>
> I believe we should as a project explicitly declare these kinds of open data sharing, "knowledge graph exchange" initiatives as being amongst the kinds of data-consuming application that justify additions and changes to Schema.org. They are very much in the spirit of the project, but some thought is needed on how to operationalize this.
>
> This doesn't mean that just spinning up an RDF database with some test data in would be sufficient; rather that we would be acknowledging data scientists, developers and others who work with data as being important user constituencies. Just as schema.org serves non-technical search engine end-users who are looking for jobs, recipes, reviews, events, datasets or fact checks on the various search engines, it can also support developers and data scientists who work with aggregations of schema.org data.
> As the DataCommons.org site says,
>
>     We cleaned and processed the data so you don't have to. Data about particular entities are aggregated from different sources for a unified view.
>
> This kind of service (provided also by Wikidata et al.) can add huge value and help others meet the needs of /their/ users.
>
> The clarification to be made here is that our exit criteria for moving terms out of "Pending" status into the Schema.org core vocabulary should consider public, opendata knowledge graph use (SPARQL/RDF, Property Graphs, etc.) as important evidence towards demonstrating the usefulness of schema.org schema designs.
>
> To @stain <https://github.com/stain>'s point, it is true that we have been a little blocked at Schema.org in terms of knowing how to handle the Bioschemas proposals, since they do make significant amounts of great data accessible via schema.org markup, even if the data-consuming applications we collectively anticipated back in 2016 have yet to emerge.
>
> Schema.org in the past has suffered from "build it and they'll come" optimism, and contains a number of schema designs which lack substantive data-consuming implementations. This is why we introduced the notion of "pending <https://schema.org/docs/howwework.html#pending>", so that there is an opportunity to surface potentially valuable schema designs, while also flagging up that we believe there may be possible tweaks ahead as data-consuming implementations surface.
>
> If we clarify "user-facing, data-consuming application" to include open data-sharing "knowledge graph" systems like Wikidata, Yago, SN SciGraph, Ozymandias, Data Commons, I believe this opens up a roadmap for bringing Bioschemas (and similar proposals) into Schema.org, without setting unrealistic expectations about the schema details being used.
> In particular it gives us a new focal point for articulating questions about the user needs being met by schema designs; we can ask about the kinds of queries supported by the combination of these schemas with opendata that uses the schemas.
>
> Framed in this way I'm a lot more comfortable bringing these schemas into Pending, as it gives a plausible path for progressing things further. @AlasdairGray <https://github.com/AlasdairGray> et al., does that work for you?
>
> --
> View it on GitHub <https://github.com/schemaorg/schemaorg/pull/2711#issuecomment-811929818>.
Received on Friday, 2 April 2021 07:37:05 UTC