Re: Dataset Search Markup for COVID-19 Portal

Dear Amonida,

Great to see the DataCatalog and Dataset markup has been added. This should allow the portal to be indexed by Google and added to the Dataset Search. I’ve created a pull request to add these to our list of live deployments on the Bioschemas website:
https://github.com/BioSchemas/bioschemas.github.io/pull/337


One comment is that this markup should probably only be on the landing page. On pages related to specific types, e.g. the protein page, you could use the CollectionPage type and list the resources on the page with the appropriate Bioschemas type. This would make the information about the individual genes, proteins, etc. reusable in a machine-processable way. A suggestion for this markup can be found at the following link:
https://github.com/BioSchemas/specifications/blob/master/Protein/examples/0.11-RELEASE/covid-19DataPortal.html
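
As a rough illustration of the idea (not a copy of the linked example, which may be structured differently), the JSON-LD for such a page could be generated along the following lines. This is a minimal sketch: the function name, the example.org URLs, and the placeholder record are all hypothetical, and the use of mainEntity with an ItemList to hold the listed resources is an assumption, as is the Protein type resolving in the context used.

```python
import json


def collection_page(page_url, page_name, proteins):
    """Build JSON-LD for a CollectionPage that lists proteins (illustrative only)."""
    return {
        "@context": "https://schema.org",
        "@type": "CollectionPage",
        "@id": page_url,
        "name": page_name,
        # Assumption: the listed resources hang off mainEntity as an ItemList.
        "mainEntity": {
            "@type": "ItemList",
            "itemListElement": [
                {
                    "@type": "Protein",
                    "@id": p["url"],
                    "identifier": p["accession"],
                    "name": p["name"],
                }
                for p in proteins
            ],
        },
    }


# Hypothetical placeholder record; not real portal data.
example_proteins = [
    {
        "url": "https://example.org/proteins/EXAMPLE1",
        "accession": "EXAMPLE1",
        "name": "Example protein",
    },
]

page = collection_page("https://example.org/proteins", "Proteins", example_proteins)
print(json.dumps(page, indent=2))
```

The printed JSON would be embedded in the type-specific page inside a script element of type application/ld+json, so that each listed resource carries its own typed description.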


Best regards

Alasdair


--
Alasdair J G Gray
Associate Professor in Computer Science,
School of Mathematical and Computer Sciences
Heriot-Watt University, Edinburgh, UK.

Email: A.J.G.Gray@hw.ac.uk<mailto:A.J.G.Gray@hw.ac.uk>
Web: http://www.macs.hw.ac.uk/~ajg33

ORCID: http://orcid.org/0000-0002-5711-4872

Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair


Heriot-Watt is a global University, as a result my working hours may not be your working hours. Do not feel pressure to reply to this email outside your working hours.


To arrange a meeting: https://doodle.com/mm/alasdairgray/book-a-time



From: Amonida Zadissa <amonida@ebi.ac.uk>
Date: Wednesday, 28 October 2020 at 06:26
To: Alasdair Gray <A.J.G.Gray@hw.ac.uk>
Cc: "amonida@ebi.ac.uk" <amonida@ebi.ac.uk>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>, Guy Cochrane <cochrane@ebi.ac.uk>, "rls@ebi.ac.uk" <rls@ebi.ac.uk>
Subject: Re: Dataset Search Markup for COVID-19 Portal

Dear Alasdair and all

I would like to announce that the COVID-19 Data Portal is now using Bioschemas.org markup. This integration was part of the Data Portal's larger six-month anniversary release, which brought many updates to the Portal.

If you have any comments, please let us know.

Best regards,
Amonida
On 19/10/2020 00:18, Amonida Zadissa wrote:

Dear Alasdair

Thanks for your patience and the detailed information you sent.

I just wanted to let you know that the technical team working on the Portal have started integrating (Bio)Schema.org markup into the portal. We hope to see this implemented in one of the upcoming updates of the portal. I'm CC'ing Rodrigo on this thread, who will be able to provide you with more details if required.

Hopefully, with this modification we'll make the Portal even more accessible. I will let you know when the transition has happened as it would be helpful to have your input then.

Thank you again for your recommendations.

Best regards,
Amonida
On 29/09/2020 12:37, Gray, Alasdair J G wrote:
Hi Amonida,

Thanks for joining the community call yesterday. I’m following up on the discussion about the inclusion of Dataset markup in the COVID-19 Data Portal and its ability to be found on the web. I think there were some crossed wires during the discussion yesterday, so hopefully this email will clarify some of the issues.

At the moment, at least for me, the COVID-19 Data Portal can be found by a search on Google with the terms ‘covid-19 data portal’. Such a search term assumes that someone knows about the portal. If you do not include the term ‘portal’ then it does not appear in the first page of results; I didn’t check beyond that. However, there is a dedicated Google search tool for datasets [1], and I cannot find the COVID-19 portal there at all.

There is an argument that the portal should not be discoverable through the dataset search since it is a portal and not a dataset. As you said on the call, you surface data from relevant underlying data sources, and therefore it is the responsibility of these data sources to make schema.org Dataset markup available. (As you will see below, this is the case with only one of your sources.) However, other data portals/registries do appear in the Google dataset search, such as FAIRsharing, OpenAIRE, and figshare, as shown by this search for ‘Nucleotide Archive’ [2].

The advantage of having the COVID-19 Data Portal also appear is that it makes the data more discoverable, which I believe is part of the aim of the portal. To achieve this, Dataset and DataCatalog markup should be added to the homepage of the COVID-19 Data Portal to describe what the portal is and what data it facilitates the discovery of.

I have drafted a first version of this markup on the Bioschemas repository [3]. Note that I only give very minimal information about the datasets, assuming instead that each of these is providing its own markup and that we are linking to that. Such markup would need to be added to ENA, PDBe, EMDB, Expression Atlas, and Europe PMC. This would make all of these resources more discoverable through Google, as this is the markup that Google relies on for its dataset search tool.

I have run my first draft of the markup through the Google Structured Data Testing Tool [4]. All the errors and warnings are due to the minimal (linked) nature of the markup that I have used.
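
To make the minimal, linked style described above concrete, here is a small sketch of how such homepage markup could be assembled. It is not the draft in [3]: the function names and all example.org URLs are hypothetical placeholders, and only the source names come from the list above. Each Dataset entry carries just an identifier and a name, on the assumption that the source publishes fuller markup at its own address.

```python
import json

# Source names as listed above; the URLs are hypothetical placeholders.
SOURCE_URLS = {
    "ENA": "https://example.org/ena",
    "PDBe": "https://example.org/pdbe",
    "EMDB": "https://example.org/emdb",
    "Expression Atlas": "https://example.org/expression-atlas",
    "Europe PMC": "https://example.org/europe-pmc",
}


def portal_markup(portal_url, source_urls):
    """Build minimal DataCatalog JSON-LD that links to each source's own Dataset."""
    return {
        "@context": "https://schema.org",
        "@type": "DataCatalog",
        "@id": portal_url,
        "name": "COVID-19 Data Portal",
        "url": portal_url,
        # Minimal, linked entries: the full description lives with each source.
        "dataset": [
            {"@type": "Dataset", "@id": url, "name": name}
            for name, url in source_urls.items()
        ],
    }


def as_script_tag(doc):
    """Wrap a JSON-LD document in the script element used to embed it in HTML."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


markup = portal_markup("https://example.org/portal", SOURCE_URLS)
print(as_script_tag(markup))
```

Because the Dataset entries are this sparse, a validator would be expected to flag missing recommended properties such as a description, which is the kind of warning mentioned above.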

If you have further questions, please do not hesitate to ask.

Best regards

Alasdair

1. https://datasetsearch.research.google.com/

2. https://datasetsearch.research.google.com/search?query=nucleotide%20archive&docid=U0qm7IWj%2BWZKy8EFAAAAAA%3D%3D

3. https://github.com/BioSchemas/specifications/blob/master/DataCatalog/examples/0.3/COVID-19DataPortal.json

4. https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fraw.githubusercontent.com%2FBioSchemas%2Fspecifications%2Fmaster%2FDataCatalog%2Fexamples%2F0.3%2FCOVID-19DataPortal.json


--

Dr Amonida Zadissa (She/Her)
Senior Strategy Officer

EMBL-EBI
Wellcome Genome Campus
Hinxton
Cambridgeshire
CB10 1SD
UK

Received on Wednesday, 28 October 2020 17:07:07 UTC