
Re: Dataset Search Markup for COVID-19 Portal

From: Amonida Zadissa <amonida@ebi.ac.uk>
Date: Mon, 19 Oct 2020 00:18:59 +0200
Cc: amonida@ebi.ac.uk, "public-bioschemas@w3.org" <public-bioschemas@w3.org>, Guy Cochrane <cochrane@ebi.ac.uk>, Rodrigo Lopez <rls@ebi.ac.uk>
To: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
Message-Id: <20201018221902.2821A62CF0E_F8CBF56B@hh-mx3.ebi.ac.uk>
Dear Alasdair

Thanks for your patience and the detailed information you sent.

I just wanted to let you know that the technical team working on the 
Portal have started integrating (Bio)Schema.org markup into the Portal. 
We hope to see this implementation in one of the upcoming Portal 
updates. I'm CC'ing Rodrigo on this thread, who will be able to provide 
you with more details if required.

Hopefully, with this modification we'll make the Portal even more 
accessible. I will let you know when the transition has happened as it 
would be helpful to have your input then.

Thank you again for your recommendations.

Best regards,
Amonida

On 29/09/2020 12:37, Gray, Alasdair J G wrote:
>
> Hi Amonida,
>
> Thanks for joining the community call yesterday. I’m following up on 
> the discussion about the inclusion of Dataset markup in the COVID-19 
> Data Portal and its ability to be found on the web. I think some wires 
> were crossed during the discussion yesterday, so hopefully this 
> email will clarify some of the issues.
>
> At the moment, at least for me, the COVID-19 Data Portal can be found 
> by a search on Google with the terms ‘covid-19 data portal’. Such a 
> search term assumes that someone knows about the portal. If you do not 
> include the term ‘portal’, then it does not appear on the first page of 
> results; I didn’t check beyond that. However, there is a dedicated 
> Google search tool for datasets [1], and I cannot find the COVID-19 
> portal there at all.
>
> There is an argument that the portal should not be discoverable 
> through the dataset search since it is a portal and not a dataset. As 
> you said on the call, you surface data from relevant underlying data 
> sources, and therefore it is the responsibility of these data sources 
> to make schema.org Dataset markup available. (As you will see below, 
> this is the case with only one of your sources.) However, other data 
> portals/registries do appear in the Google dataset search, such as 
> FAIRsharing, OpenAIRE, and figshare, as shown by this search for 
> ‘Nucleotide Archive’ [2].
>
> The advantage of having the COVID-19 Data Portal also appear is to 
> make the data more discoverable, which I believe is part of the aim of 
> the portal. To achieve this, Dataset and DataCatalog markup should be 
> added to the homepage of the COVID-19 data portal to describe what the 
> portal is and what data it facilitates the discovery of.
>
> I have drafted a first version of this markup on the Bioschemas 
> repository [3]. Note that I only give very minimal information about 
> the datasets, assuming instead that each of these is providing their 
> own markup and that we are linking to that. Such markup would need to 
> be added to ENA, PDBe, EMDB, Expression Atlas, and Europe PMC. This 
> would make all of these resources more discoverable through Google, as 
> this is the markup that Google relies on for its dataset search tool.
>
> I have run my first draft of the markup through the Google Structured 
> Data Testing Tool [4]. All the errors and warnings are due to the 
> minimal (linked) nature of the markup that I have used.
>
> If you have further questions, please do not hesitate to ask.
>
> Best regards
>
> Alasdair
>
> 1. https://datasetsearch.research.google.com/
>
> 2. https://datasetsearch.research.google.com/search?query=nucleotide%20archive&docid=U0qm7IWj%2BWZKy8EFAAAAAA%3D%3D
>
> 3. https://github.com/BioSchemas/specifications/blob/master/DataCatalog/examples/0.3/COVID-19DataPortal.json
>
> 4. https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fraw.githubusercontent.com%2FBioSchemas%2Fspecifications%2Fmaster%2FDataCatalog%2Fexamples%2F0.3%2FCOVID-19DataPortal.json
>
> -- 
>
> Alasdair J G Gray
>
> Associate Professor in Computer Science,
> School of Mathematical and Computer Sciences
> Heriot-Watt University, Edinburgh, UK.
>
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
> Heriot-Watt is a global University, as a result my working hours may 
> not be your working hours. Do not feel pressure to reply to this email 
> outside your working hours.
>
> To arrange a meeting: https://doodle.com/mm/alasdairgray/book-a-time
>
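
The approach described in the quoted message (DataCatalog markup on the
portal homepage that points to minimally described Dataset entries, each
expected to carry its own fuller markup) can be sketched roughly as
follows. This is an illustration only and is not the draft referenced in
[3]: the URLs are hypothetical placeholders, the helper names
build_catalog_markup and to_script_tag are invented for this example, and
the choice of properties is an assumption based on the general schema.org
DataCatalog and Dataset types.

```python
import json

# Hypothetical placeholder URLs (assumptions): real values would be the
# canonical pages of the portal and of each underlying resource.
PORTAL_URL = "https://example.org/covid19-data-portal"
SOURCES = {
    "ENA": "https://example.org/ena",
    "PDBe": "https://example.org/pdbe",
    "EMDB": "https://example.org/emdb",
    "Expression Atlas": "https://example.org/expression-atlas",
    "Europe PMC": "https://example.org/europepmc",
}


def build_catalog_markup(portal_url, sources):
    """Return a schema.org DataCatalog as a dict, with one minimal
    Dataset entry per source (name and URL only)."""
    return {
        "@context": "https://schema.org",
        "@type": "DataCatalog",
        "@id": portal_url,
        "name": "COVID-19 Data Portal",
        "url": portal_url,
        "dataset": [
            {"@type": "Dataset", "@id": url, "name": name, "url": url}
            for name, url in sources.items()
        ],
    }


def to_script_tag(markup):
    """Serialise the markup as a JSON-LD script element that could be
    embedded in the HTML of a homepage."""
    body = json.dumps(markup, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


markup = build_catalog_markup(PORTAL_URL, SOURCES)
snippet = to_script_tag(markup)
print(snippet)
```

Keeping each Dataset entry to a name and a URL mirrors the "minimal
(linked)" style mentioned in the message, which is also why a validation
tool would be expected to report missing recommended properties.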
Received on Monday, 19 October 2020 10:07:45 UTC
