- From: Sören Auer <auer@l3s.de>
- Date: Thu, 2 Apr 2020 14:31:45 +0200
- To: Franck Michel <franck.michel@cnrs.fr>, public-lod <public-lod@w3.org>, semantic-web@w3.org, rda-covid19@rda-groups.org
Dear Franck, all,

Thanks for this initiative!

Since many of us are working on COVID-related things now, I wanted to share my general impression with the community: From my experience, purely automated named entity recognition and relation extraction approaches do *NOT* reach sufficient precision and recall for most real-world use cases in scholarly knowledge discovery.

As a result, we should invest our efforts in approaches that really have the potential to provide concrete value to epidemiology and virology research. To do so, we should start with very concrete use cases and research questions that we aim to help answer. We can then benchmark the extent to which our approaches provide concrete value in practice.

As an example, one important research question for COVID is the estimation of the basic reproduction number R0 for SARS-CoV-2. We created a systematic comparison of different studies aiming to answer this question in our Open Research Knowledge Graph here:
https://www.orkg.org/orkg/comparison/R12251

We are now working on integrating some domain-specific visualizations of the R0 estimates and their confidence intervals:
https://vitalis-wiens.github.io/ChartVisTest/

It would be great if we could work together on covering more COVID research in the Open Research Knowledge Graph. More info can also be found here:
https://projects.tib.eu/orkg/get-involved/

In particular, we will be happy to ingest semantic extraction results for answering concrete research questions into the ORKG.
Best regards and stay healthy everyone,

Sören

On 02.04.2020 12:18, Franck Michel wrote:
> Dear colleagues,
>
> In order to foster innovative work based on the cross-linking of
> COVID-19 literature with the Data Web, we (Wimmics team, Inria
> <https://team.inria.fr/wimmics/>) are in the process of generating an
> RDF dataset describing the named entities identified in the research
> papers of the CORD-19
> <https://pages.semanticscholar.org/coronavirus-research> corpus.
>
> To identify and disambiguate the named entities, we are using NCBO
> BioPortal annotator <http://bioportal.bioontology.org/annotatorplus>,
> Entity-fishing <https://github.com/kermitt2/entity-fishing> (links to
> Wikidata) and DBpedia Spotlight <https://www.dbpedia-spotlight.org/>
> (links to DBpedia). We are also taking care of linking to other related
> works such as CORD-19-on-FHIR
> <https://github.com/fhircat/CORD-19-on-FHIR> and COVID-19 Literature KG
> <https://www.kaggle.com/group16/covid19-literature-knowledge-graph>.
>
> We shall release this dataset soon, as an RDF dump as well as through a
> dedicated SPARQL endpoint. Stay tuned!
>
> Regards,
> Franck.
> --
> Franck MICHEL - CNRS research engineer
> Université Côte d’Azur, CNRS, Inria
> I3S laboratory (UMR 7271)
> franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr> - +33 (0)4 8915 4277
Received on Thursday, 2 April 2020 12:32:08 UTC