Dataset Search Markup for COVID-19 Portal

Hi Amonida,

Thanks for joining the community call yesterday. I’m following up on the discussion about including Dataset markup in the COVID-19 Data Portal and how easily the portal can be found on the web. I think some wires were crossed during the discussion, so hopefully this email will clarify some of the issues.

At the moment, at least for me, the COVID-19 Data Portal can be found through a Google search for ‘covid-19 data portal’. Such a search assumes that someone already knows about the portal. If you do not include the term ‘portal’, it does not appear on the first page of results; I didn’t check beyond that. However, there is a dedicated Google search tool for datasets [1], and I cannot find the COVID-19 Data Portal there at all.

There is an argument that the portal should not be discoverable through the dataset search since it is a portal and not a dataset. As you said on the call, you surface data from relevant underlying data sources, and therefore it is the responsibility of these data sources to make schema.org Dataset markup available. (As you will see below, this is the case with only one of your sources.) However, other data portals and registries, such as FAIRsharing, OpenAIRE, and figshare, do appear in the Google dataset search, as shown by this search for ‘Nucleotide Archive’ [2].

The advantage of having the COVID-19 Data Portal appear there as well is that it makes the data more discoverable, which I believe is part of the aim of the portal. To achieve this, Dataset and DataCatalog markup should be added to the homepage of the COVID-19 Data Portal to describe what the portal is and which data it helps people discover.

I have drafted a first version of this markup on the Bioschemas repository [3]. Note that I give only very minimal information about the datasets, assuming instead that each source provides its own markup and that we link to it. Such markup would need to be added to ENA, PDBe, EMDB, Expression Atlas, and Europe PMC. This would make all of these resources more discoverable through Google, since this is the markup that Google relies on for its dataset search tool.
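
To illustrate the linked approach, here is a rough Python sketch that builds a minimal DataCatalog description as JSON-LD. It is not the draft in [3]: the portal name, the dataset names, and the example.org URLs are placeholders I have made up for illustration. Only the schema.org types and properties used (DataCatalog, Dataset, name, description, url, dataset) are real.

```python
import json


def build_catalog(name, description, url, datasets):
    """Return a schema.org DataCatalog as a JSON-LD dictionary.

    Each dataset is described minimally (type, identifier, name, URL),
    on the assumption that the source publishes fuller Dataset markup
    itself at that URL.
    """
    return {
        "@context": "https://schema.org",
        "@type": "DataCatalog",
        "name": name,
        "description": description,
        "url": url,
        "dataset": [
            {"@type": "Dataset", "@id": ds_url, "name": ds_name, "url": ds_url}
            for ds_name, ds_url in datasets
        ],
    }


# Hypothetical values, for illustration only.
catalog = build_catalog(
    name="Example Data Portal",
    description="A portal that links to datasets held by underlying sources.",
    url="https://example.org/portal",
    datasets=[
        ("Example sequence archive", "https://example.org/sequences"),
        ("Example structure archive", "https://example.org/structures"),
    ],
)

# The resulting JSON would be embedded in the homepage inside a
# <script type="application/ld+json"> element.
print(json.dumps(catalog, indent=2))
```

Keeping each dataset entry this small means the catalogue does not duplicate, or drift away from, the description maintained by the source itself.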

I have run my first draft of the markup through the Google Structured Data Testing Tool [4]. All the errors and warnings are due to the minimal (linked) nature of the markup that I have used.
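
To give a feel for why minimal markup triggers such messages, here is a small Python sketch of the kind of check involved. It is not how the Google tool works internally. It pulls JSON-LD out of a page and reports which properties are missing from each Dataset. The default list of required properties (name and description) is my assumption, based on my reading of Google's Dataset guidelines, and the sample page is made up.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def check_datasets(html, required=("name", "description")):
    """Map each top-level Dataset's name to the required properties it lacks.

    Assumption: `required` reflects the properties a search tool insists on.
    Simplification: only nodes whose @type is exactly "Dataset" are checked;
    nested nodes and list-valued @type are ignored.
    """
    parser = JsonLdExtractor()
    parser.feed(html)
    report = {}
    for block in parser.blocks:
        nodes = block if isinstance(block, list) else [block]
        for node in nodes:
            if node.get("@type") == "Dataset":
                missing = [prop for prop in required if prop not in node]
                report[node.get("name", "<unnamed>")] = missing
    return report


# Made-up page with deliberately minimal markup.
page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset", "name": "Example dataset"}
</script>
</head><body></body></html>
"""

print(check_datasets(page))  # {'Example dataset': ['description']}
```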

If you have further questions, please do not hesitate to ask.

Best regards

Alasdair

1. https://datasetsearch.research.google.com/

2. https://datasetsearch.research.google.com/search?query=nucleotide%20archive&docid=U0qm7IWj%2BWZKy8EFAAAAAA%3D%3D

3. https://github.com/BioSchemas/specifications/blob/master/DataCatalog/examples/0.3/COVID-19DataPortal.json

4. https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fraw.githubusercontent.com%2FBioSchemas%2Fspecifications%2Fmaster%2FDataCatalog%2Fexamples%2F0.3%2FCOVID-19DataPortal.json


--
Alasdair J G Gray
Associate Professor in Computer Science,
School of Mathematical and Computer Sciences
Heriot-Watt University, Edinburgh, UK.

Email: A.J.G.Gray@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33

ORCID: http://orcid.org/0000-0002-5711-4872

Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair





Received on Tuesday, 29 September 2020 10:37:55 UTC