Re: CORD-19 semantic annotations - 11am Tuesday (Boston time) - Franck Michel on Named Entities Knowledge Graph from Franck Michel on 2020-05-12 (public-semweb-lifesci@w3.org from May 2020)

From: Franck Michel <franck.michel@cnrs.fr>
Date: Tue, 12 May 2020 12:32:15 +0200
To: David Booth <david@dbooth.org>, w3c semweb HCLS <public-semweb-lifesci@w3.org>
Message-ID: <d05d441e-e93c-b33a-46d6-b7498b6b72e5@cnrs.fr>
Dear David, dear all,

Just a clarification: I'll present our work and perspectives in the 
"Covid-on-the-Web" project, which revolves around named entities and an 
argumentative graph based on the CORD-19 corpus.

Regards,
     Franck.

On 11/05/2020 at 18:22, David Booth wrote:
> Tomorrow (Tuesday) Franck Michel will present his work on CORD-19 
> Named Entities Knowledge Graph (CORD19-NEKG).
>
> Zoom Link:
> https://us02web.zoom.us/j/83815969391?pwd=Q0k4Nm9xc3V2K0djL0FYT2JMVTJmUT09 
>
>
> Thanks,
> David Booth
>
> On 4/28/20 12:09 PM, David Booth wrote:
>> Notes from today's call:
>>
>> MEETING NOTES 28-Apr-2020
>> Present: David Booth, Victor Mireles, Louis Rumanes, Tom Conlin, 
>> Franck Michel, Gollam Rabby, Jim McCusker, Lucy Wong, Sebastian 
>> Kohlmeier, Tomáš Kliegr
>>
>> Introductions
>> David Booth: 10 years applying semantic web tech to healthcare and 
>> life sciences, working on Mayo Clinic / Johns-Hopkins University 
>> collaboration.
>>
>> Louis Rumane: United Health Group, Doing COVID research, looking at 
>> making a KG
>>
>> Tom Conlin: Working with Melissa Haendel (Monarch Initiative).
>>
>> Franck: INRIA
>>
>> Gollam: Prague, Univ
>>
>> Jim: Research sci RPI, working on KG w bio
>>
>> Lucy: Allen institute, research scientist.
>>
>> Tomas: Assoc Prof, Prague, KG.
>>
>> Sebastian: Sr Mgr on CORD-19.
>>
>> Victor: Semantic Web company researcher
>>
>> Victor's Presentation
>> Slides here: 
>> https://docs.google.com/presentation/d/1xaS_88sJ47iSrvv0ezOfjscIvG2VINUe7vqrUEMiaCA/edit?usp=sharing 
>>
>>
>> victor: Semantic Web Company, 40+ FTEs.  Makes PoolParty. Works w 
>> companies in many countries.  Taxonomy helps extract entities from 
>> text. image search, data mgmt.
>>
>> victor: Developing text and data mining tools for biomed, and 
>> CORD-19. We don't only annotate text.  What's useful about annotating 
>> text w entities is to use the knowledge, simplest is encoded in SKOS, 
>> such as broader/narrower.  But to do this we need to annotate the 
>> text into URIs, then import relationships into the graph.  Trying to 
>> link existing annotations w other knowledge sources.  Ont is 
>> simplified version of NIF: documents have sections, sections have 
>> annotations that are SKOS concepts.
>>
>> victor: So far, we've set up a pipeline to take a document and it 
>> finds annotations with offsets.  So far imported ChEBI, GO, MeSH, 
>> HPO, but using them as controlled vocab.  Many are very specific, 
>> such as "COVID-19" -- not really NLP, because there are no 
>> inflections, plurals, etc.  Output is a bunch of triples in the 
>> simple SKOS ont previously mentioned. Put them into GraphDB, along 
>> with the vocabs.
>>
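A minimal sketch of the dictionary-based annotation step described in the notes above, assuming a tiny hand-made vocabulary. The example.org URIs, the `ex:` property names and the `annotate` helper are hypothetical and are not taken from PoolParty or from the actual pipeline.

```python
import re

# Hypothetical controlled vocabulary: surface label -> concept URI.
VOCAB = {
    "COVID-19": "http://example.org/concept/covid19",
    "coronavirus": "http://example.org/concept/coronavirus",
}

def annotate(section_uri, text):
    """Return (subject, predicate, object) triples for every exact label match."""
    triples = []
    for label, concept in VOCAB.items():
        # Exact, case-insensitive whole-word match; inflections and
        # plurals are deliberately not handled.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Identify the annotation by its character offsets in the section.
            ann = f"{section_uri}#char={m.start()},{m.end()}"
            triples.append((section_uri, "ex:hasAnnotation", ann))
            triples.append((ann, "ex:concept", concept))
            triples.append((ann, "ex:begin", m.start()))
            triples.append((ann, "ex:end", m.end()))
    return triples
```

Calling `annotate("http://example.org/doc1/sec1", "COVID-19 is caused by a coronavirus.")` yields four triples per match, which could then be serialized and loaded into a triple store alongside the vocabularies.
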
>> victor: Also looked at SciBite annotations.  They've done an 
>> excellent job annotating.  They also have their own controlled vocab 
>> that is very good.  JSON files have annotations. Put them into 
>> triples.  Combining them w bio DBs gives a graph DB.
>>
>> (victor shows relationships in GraphDB viewer)
>>
>> victor: you can navigate the hierarchy of concepts and link them to 
>> the paragraphs in CORD-19 DB.
>>
>> (victor shows SPARQL queries)
>>
>> victor: This allows us to pull up the titles and paragraphs of 
>> articles that both mention a kind of neoplasm and a kind of coronavirus.
>>
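The broader/narrower lookup described above can be illustrated without a triple store. This is only a sketch under stated assumptions: the concept names, paragraph identifiers and both helper functions are hypothetical, and each concept is given a single broader concept for simplicity. In SPARQL the same traversal would typically be written with a property path such as `skos:broader*`.

```python
# Hypothetical toy data: skos:broader edges (child -> parent) and
# the concepts annotated in each paragraph.
BROADER = {
    "ex:SARS-CoV-2": "ex:Coronavirus",
    "ex:MERS-CoV": "ex:Coronavirus",
    "ex:Lymphoma": "ex:Neoplasm",
}
PARAGRAPH_CONCEPTS = {
    "doc1#p1": {"ex:SARS-CoV-2", "ex:Lymphoma"},
    "doc2#p4": {"ex:MERS-CoV"},
    "doc3#p2": {"ex:Neoplasm", "ex:Coronavirus"},
}

def is_kind_of(concept, ancestor):
    """True if concept equals ancestor or reaches it via broader links."""
    seen = set()
    while concept is not None and concept not in seen:
        if concept == ancestor:
            return True
        seen.add(concept)  # guard against cycles in the hierarchy
        concept = BROADER.get(concept)
    return False

def paragraphs_mentioning(*ancestors):
    """Paragraphs that contain at least one kind of every given ancestor."""
    return sorted(
        p for p, concepts in PARAGRAPH_CONCEPTS.items()
        if all(any(is_kind_of(c, a) for c in concepts) for a in ancestors)
    )
```

With this toy data, `paragraphs_mentioning("ex:Neoplasm", "ex:Coronavirus")` selects the paragraphs that mention both a kind of neoplasm and a kind of coronavirus.
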
>> victor: Want to take other DBs and put them into GraphDB also. 
>> Monarch Initiative is putting together KG, and also puts in SciBite.
>>
>> victor: Missing from both our effort and Monarch: searchability.  I 
>> showed SPARQL queries using broader/narrower. Also need to be more 
>> efficient for humans, working also on faceted search.  Monarch 
>> Initiative is very good for machine readable stuff.  Another thing 
>> missing: relation extraction from the text.  How does a human 
>> determine that some text is saying that a protein interacts with 
>> another?  JPL (Lewis McGibbney) is using Stanford NLP for 
>> relation extraction.
>> https://github.com/nasa-jpl-cord-19/covid19-knowledge-graph
>> It isn't perfect, but it indicates a relationship.  Both entities are 
>> in GO.  This adds new edges between entities.  Lots of interest in 
>> this topic now.
>>
>> Franck: We're doing pretty close to this in INRIA, looking at named 
>> entities, wikidata entities, queries that gather all articles on 
>> cancer and any coronavirus.  Another thing we're doing: in addition 
>> to detecting named entities, we're running other tools to identify 
>> arguments, claims, evidence in articles and draw a network of claims 
>> and evidence to see what supports the claims.  Hope to publish this 
>> network soon as RDF graph.
>>
>> victor: PubAnnotation shown last week, showed epistemic analysis.
>>
>> Franck: Argument, clinical trial analysis.  Query pubmed and platform 
>> analyzes those articles.  Want to apply them to CORD-19.
>>
>> Vincent: Is RDF available? victor: Will take a couple more weeks. 
>> Vincent: Size? victor: 20GB RDF.
>>
>> David: Overlap between efforts, helpful to learn about each other's 
>> work.
>>
>> victor: After looking at Monarch Initiative, it isn't new, names I 
>> recognized from Human Phenotype initiative.  Most of that summarizes 
>> work that others have done.  FHIR DB also have overlaps with SciBite.
>>
>> david: SPARQL query was valuable, but biologists need simple UI.
>>
>> jim: Working on faceted browser for various things, that can be 
>> reused. Based on SPARQL fragments, property path gives certain 
>> values, here's how to render it.  Potentially useful here.  Also 
>> integrated WHYIS Vega (JS framework for charts and visualization), 
>> can plug a SPARQL query in and get a chart. People can share how 
>> they're exploring the graph.
>> https://github.com/tetherless-world/whyis
>> Faceted search is a view in WHYIS, but a lot of the capabilities are 
>> designed to use nanopub.
>>
>> Email list for these calls: 
>> https://lists.w3.org/Archives/Public/public-semweb-lifesci/
>>
>> Franck to present next week.
>>
>> ADJOURNED
Received on Tuesday, 12 May 2020 10:32:33 UTC