Re: Announcing dataset "CORD-19 Named Entities KG": an RDF dataset of named entities identified in the CORD-19 corpus from Sören Auer on 2020-04-02 (public-lod@w3.org from April 2020)

From: Sören Auer <auer@l3s.de>
Date: Thu, 2 Apr 2020 14:31:45 +0200
To: Franck Michel <franck.michel@cnrs.fr>, public-lod <public-lod@w3.org>, semantic-web@w3.org, rda-covid19@rda-groups.org
Message-ID: <70a70d85-21e7-edc1-e111-0dfb862ed413@l3s.de>

Dear Franck, all,

Thanks for this initiative!

Since many of us are working on COVID-related things now, I wanted to
share my general impression with the community:

In my experience, purely automated named entity recognition and
relation extraction approaches do *NOT* reach sufficient precision and
recall for most real-world use cases in scholarly knowledge discovery.

As a result, we should invest our efforts in approaches that really
have the potential to provide concrete value to epidemiology and
virology research.

In order to do so, we should start with very concrete use cases and
research questions that we aim to help answer. We can then benchmark
the extent to which our approaches provide concrete value in reality.
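
To make the benchmarking idea concrete, here is a minimal sketch of how
precision and recall could be computed for extracted entity annotations
against a manually curated gold standard. The annotation tuples, the
"ex:" identifiers and the function name are made up for illustration;
they are not taken from any of the datasets or tools mentioned in this
thread.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of predicted annotations against a gold set.

    An annotation counts as correct only if it matches a gold annotation
    exactly (same document, same surface form, same linked entity).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations: (document id, surface form, linked entity)
gold = {
    ("paper-1", "R0", "ex:BasicReproductionNumber"),
    ("paper-1", "Wuhan", "ex:Wuhan"),
    ("paper-2", "coronavirus", "ex:Coronavirus"),
    ("paper-2", "incubation period", "ex:IncubationPeriod"),
}
predicted = {
    ("paper-1", "R0", "ex:BasicReproductionNumber"),
    ("paper-1", "Wuhan", "ex:Wuhan"),
    ("paper-2", "coronavirus", "ex:Coronavirus"),
    ("paper-2", "period", "ex:Period"),   # wrong span and wrong entity
    ("paper-1", "China", "ex:China"),     # not in the gold standard
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.60 recall=0.75
```

Exact matching is the strictest choice; a real benchmark for a given
research question might instead score partial span overlaps or only the
linked entity, which would change the numbers.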

As an example, one important research question for COVID is the
estimate of the basic reproduction number R0 for SARS-CoV-2. We created
a systematic comparison of different studies aiming to answer this
question in our Open Research Knowledge Graph here:

https://www.orkg.org/orkg/comparison/R12251

We are now working on integrating some domain-specific visualizations of
the R0 estimates and their confidence intervals:

https://vitalis-wiens.github.io/ChartVisTest/

It would be great if we could work together on covering more COVID
research in the Open Research Knowledge Graph. More info can also be
found here:
https://projects.tib.eu/orkg/get-involved/

In particular, we will be happy to ingest semantic extraction results
for answering concrete research questions into the ORKG.

Best regards and stay healthy everyone,

Sören

On 02.04.2020 12:18, Franck Michel wrote:
> Dear colleagues,
> 
> In order to foster innovative work based on the cross-linking of
> COVID-19 literature with the Data Web, we (Wimmics team, Inria
> <https://team.inria.fr/wimmics/>) are in the process of generating an
> RDF dataset describing the named entities identified in the research
> papers of the CORD-19
> <https://pages.semanticscholar.org/coronavirus-research> corpus.
> 
> To identify and disambiguate the named entities, we are using NCBO
> BioPortal annotator <http://bioportal.bioontology.org/annotatorplus>,
> Entity-fishing <https://github.com/kermitt2/entity-fishing> (links to
> Wikidata) and DBpedia Spotlight <https://www.dbpedia-spotlight.org/>
> (links to DBpedia). We are also taking care of linking to other related
> works such as CORD-19-on-FHIR
> <https://github.com/fhircat/CORD-19-on-FHIR> and COVID-19 Literature KG
> <https://www.kaggle.com/group16/covid19-literature-knowledge-graph>.
> 
> We shall release this dataset soon, as an RDF dump as well as through a
> dedicated SPARQL endpoint. Stay tuned!
> 
> Regards,
>     Franck.
> -- 
>  Franck MICHEL - CNRS research engineer
> Université Côte d’Azur, CNRS, Inria
> I3S laboratory (UMR 7271)
> franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr> - +33 (0)4 8915 4277  
>  
>

Received on Thursday, 2 April 2020 12:32:08 UTC