Re: CORD-19 semantic annotations - 11am Tuesday (Boston time) - Franck Michel on Named Entities Knowledge Graph

Tomorrow (Tuesday) Franck Michel will present his work on the CORD-19 Named 
Entities Knowledge Graph (CORD19-NEKG).

Zoom Link:
https://us02web.zoom.us/j/83815969391?pwd=Q0k4Nm9xc3V2K0djL0FYT2JMVTJmUT09

Thanks,
David Booth

On 4/28/20 12:09 PM, David Booth wrote:
> Notes from today's call:
> 
> MEETING NOTES 28-Apr-2020
> Present: David Booth, Victor Mireles, Louis Rumanes, Tom Conlin, Franck 
> Michel, Gollam Rabby, Jim McCusker, Lucy Wong, Sebastian Kohlmeier, 
> Tomáš Kliegr
> 
> Introductions
> David Booth: 10 years applying semantic web tech to healthcare and life 
> sciences; working on the Mayo Clinic / Johns Hopkins University collaboration.
> 
> Louis Rumane: UnitedHealth Group; doing COVID research, looking at 
> making a KG.
> 
> Tom Conlin: Working with Melissa Haendel (Monarch Initiative).
> 
> Franck: INRIA
> 
> Gollam: University in Prague.
> 
> Jim: Research scientist at RPI, working on biomedical KGs.
> 
> Lucy: Allen Institute, research scientist.
> 
> Tomáš: Associate Professor in Prague, working on KGs.
> 
> Sebastian: Senior manager on CORD-19.
> 
> Victor: Researcher at Semantic Web Company.
> 
> Victor's Presentation
> Slides here: 
> https://docs.google.com/presentation/d/1xaS_88sJ47iSrvv0ezOfjscIvG2VINUe7vqrUEMiaCA/edit?usp=sharing 
> 
> 
> victor: Semantic Web Company has 40+ FTEs and makes PoolParty. Works 
> with companies in many countries. Taxonomies help extract entities from 
> text; also used for image search and data management.
> 
> victor: Developing text and data mining tools for biomed, and for CORD-19. 
> We don't only annotate text. What's useful about annotating text with 
> entities is being able to use the knowledge attached to them; the 
> simplest kind is encoded in SKOS, such as broader/narrower. But to do 
> this we need to annotate the text with URIs, then import the 
> relationships into the graph. We're trying to link existing annotations 
> with other knowledge sources. The ontology is a simplified version of 
> NIF (NLP Interchange Format): documents have sections, and sections 
> have annotations that are SKOS concepts.
> 
> victor: So far, we've set up a pipeline that takes a document and finds 
> annotations with offsets. We've imported ChEBI, GO, MeSH, and HPO, but 
> are using them only as controlled vocabularies. Many terms are very 
> specific, such as "COVID-19"; this is not really NLP, because there are 
> no inflections, plurals, etc. The output is a set of triples in the 
> simplified SKOS-based ontology mentioned above. We put them into 
> GraphDB, along with the vocabularies.
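A minimal sketch of the kind of pipeline described above, assuming plain exact string matching against a controlled vocabulary (no handling of inflections or plurals, which is why it is "not really NLP"). The vocabulary entries, URIs, and property names (`hasSection`, `hasAnnotation`, `concept`, `begin`, `end` in an `example.org` namespace) are hypothetical placeholders, not the actual ontology used by Victor's team:

```python
import re

# Hypothetical controlled vocabulary: surface label -> SKOS concept URI.
# Real vocabularies (ChEBI, GO, MeSH, HPO) would be loaded from their dumps.
VOCAB = {
    "COVID-19": "http://example.org/concept/COVID-19",
    "SARS-CoV-2": "http://example.org/concept/SARS-CoV-2",
    "lung neoplasm": "http://example.org/concept/LungNeoplasm",
}

EX = "http://example.org/ns#"  # hypothetical namespace for the simplified ontology


def find_annotations(text, vocab):
    """Return sorted (begin, end, concept_uri) tuples for every exact label match."""
    hits = []
    for label, uri in vocab.items():
        # re.escape so labels such as "SARS-CoV-2" are matched literally
        for m in re.finditer(re.escape(label), text):
            hits.append((m.start(), m.end(), uri))
    return sorted(hits)


def section_triples(doc_uri, section_no, text, vocab):
    """Emit triples: document -> section -> annotation -> SKOS concept, plus offsets."""
    section = f"{doc_uri}#section{section_no}"
    triples = [(doc_uri, EX + "hasSection", section)]
    for i, (begin, end, uri) in enumerate(find_annotations(text, vocab)):
        ann = f"{section}_ann{i}"
        triples += [
            (section, EX + "hasAnnotation", ann),
            (ann, EX + "concept", uri),
            (ann, EX + "begin", begin),
            (ann, EX + "end", end),
        ]
    return triples
```

For the sentence "COVID-19 is caused by SARS-CoV-2.", `find_annotations` yields one hit at offsets 0-8 and one at 22-32; the resulting triples could then be serialized and loaded into a triple store such as GraphDB alongside the vocabularies.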
> 
> victor: We also looked at SciBite annotations. They've done an excellent 
> job annotating, and their own controlled vocabulary is very good. Their 
> annotations come as JSON files, which we converted into triples. 
> Combining them with biomedical databases gives a graph database.
> 
> (victor shows relationships in GraphDB viewer)
> 
> victor: You can navigate the hierarchy of concepts and link them to the 
> paragraphs in the CORD-19 dataset.
> 
> (victor shows SPARQL queries)
> 
> victor: This allows us to pull up the titles and paragraphs of articles 
> that mention both a kind of neoplasm and a kind of coronavirus.
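The actual queries shown were not captured in these notes. As a hedged illustration of the logic such a query relies on, here is an in-memory Python sketch of following `skos:broader` transitively (in SPARQL 1.1 this is typically written as the property path `skos:broader*`). The toy hierarchy and paragraph annotations are hypothetical:

```python
from collections import defaultdict

# Hypothetical toy SKOS hierarchy: concept -> set of its skos:broader concepts.
BROADER = {
    "LungNeoplasm": {"Neoplasm"},
    "Melanoma": {"Neoplasm"},
    "SARS-CoV-2": {"Betacoronavirus"},
    "MERS-CoV": {"Betacoronavirus"},
    "Betacoronavirus": {"Coronavirus"},
}

# Hypothetical paragraph annotations: paragraph id -> annotated concepts.
PARAGRAPHS = {
    "p1": {"LungNeoplasm", "SARS-CoV-2"},
    "p2": {"Melanoma"},
    "p3": {"MERS-CoV", "Fever"},
    "p4": {"Neoplasm", "Coronavirus"},
}


def narrower_closure(root, broader):
    """All concepts X with X skos:broader* root (root itself included)."""
    narrower = defaultdict(set)  # invert the broader edges
    for child, parents in broader.items():
        for parent in parents:
            narrower[parent].add(child)
    seen, stack = {root}, [root]
    while stack:  # depth-first walk down the hierarchy
        for child in narrower[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def paragraphs_mentioning_both(a, b, broader, paragraphs):
    """Paragraph ids that mention some kind of `a` and some kind of `b`."""
    kinds_a = narrower_closure(a, broader)
    kinds_b = narrower_closure(b, broader)
    return sorted(
        pid for pid, concepts in paragraphs.items()
        if concepts & kinds_a and concepts & kinds_b
    )


print(paragraphs_mentioning_both("Neoplasm", "Coronavirus", BROADER, PARAGRAPHS))
# prints ['p1', 'p4']
```

Paragraph p1 matches even though it never mentions "Neoplasm" or "Coronavirus" literally; this is the benefit of annotating with concept URIs and importing the broader/narrower relationships.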
> 
> victor: We want to load other databases into GraphDB too. The Monarch 
> Initiative is putting together a KG, which also includes SciBite.
> 
> victor: Missing from both our effort and Monarch: searchability. I 
> showed SPARQL queries using broader/narrower, but search also needs to 
> be efficient for humans, so we're also working on faceted search. The 
> Monarch Initiative is very good for machine-readable data. Another thing 
> missing: relation extraction from the text, i.e., how to determine 
> that some text is saying that one protein interacts with another. JPL 
> (Lewis Magidney? sp?) is using Stanford NLP for relation extraction.
> https://github.com/nasa-jpl-cord-19/covid19-knowledge-graph
> It isn't perfect, but it indicates a relationship. Both entities are in 
> GO. This adds new edges between entities. Lots of interest in this 
> topic now.
> 
> Franck: We're doing something pretty close to this at INRIA: named 
> entities, Wikidata entities, and queries that gather all articles on 
> cancer and any coronavirus. Another thing we're doing: in addition to 
> detecting named entities, we're running other tools to identify 
> arguments, claims, and evidence in articles, and to draw a network of 
> claims and evidence to see what supports the claims. We hope to publish 
> this network soon as an RDF graph.
> 
> victor: PubAnnotation, presented last week, showed epistemic analysis.
> 
> Franck: Argument and clinical trial analysis: you query PubMed and the 
> platform analyzes the returned articles. We want to apply these tools to CORD-19.
> 
> Vincent: Is the RDF available?
> victor: It will take a couple more weeks.
> Vincent: Size?
> victor: 20GB of RDF.
> 
> David: There is overlap between these efforts, so it's helpful to learn 
> about each other's work.
> 
> victor: After looking at the Monarch Initiative: it isn't new; I 
> recognized names from the Human Phenotype initiative. Most of it 
> summarizes work that others have done. The FHIR DB also overlaps with 
> SciBite.
> 
> david: The SPARQL query was valuable, but biologists need a simple UI.
> 
> jim: Working on a reusable faceted browser for various things. It is 
> based on SPARQL fragments: a property path gives certain values, plus 
> a description of how to render them. Potentially useful here. We also 
> integrated Vega (a JS framework for charts and visualization) into 
> WHYIS: you can plug in a SPARQL query and get a chart, and people can 
> share how they're exploring the graph.
> https://github.com/tetherless-world/whyis
> Faceted search is a view in WHYIS, but a lot of its capabilities are 
> designed around nanopublications.
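To make the "property path gives certain values" idea concrete, here is a minimal Python sketch of path-based facets: each facet is a sequence of predicates followed from an item, and the browser counts items per reachable value and filters on a selected value. This is not WHYIS's implementation or API; the triples, predicates (`mentions`, `publishedIn`, `publisher`), and function names are hypothetical, and the rendering half of a facet definition is omitted:

```python
from collections import Counter

# Hypothetical triples (subject, predicate, object) for a few papers.
TRIPLES = [
    ("paper1", "mentions", "SARS-CoV-2"),
    ("paper1", "publishedIn", "journalA"),
    ("paper2", "mentions", "MERS-CoV"),
    ("paper2", "publishedIn", "journalA"),
    ("paper3", "mentions", "SARS-CoV-2"),
    ("paper3", "publishedIn", "journalB"),
    ("journalA", "publisher", "pubX"),
    ("journalB", "publisher", "pubY"),
]


def follow_path(start, path, triples):
    """Values reachable from `start` via a sequence path p1/p2/..."""
    frontier = {start}
    for pred in path:
        frontier = {o for (s, p, o) in triples if p == pred and s in frontier}
    return frontier


def facet_counts(items, path, triples):
    """For one facet (a property path), count how many items have each value."""
    counts = Counter()
    for item in items:
        counts.update(follow_path(item, path, triples))
    return counts


def apply_facet(items, path, value, triples):
    """Keep only the items whose path values include the selected value."""
    return [i for i in items if value in follow_path(i, path, triples)]


papers = ["paper1", "paper2", "paper3"]
print(facet_counts(papers, ["publishedIn", "publisher"], TRIPLES))
print(apply_facet(papers, ["mentions"], "SARS-CoV-2", TRIPLES))
```

Defining a facet as a path rather than a single predicate is what lets the same browser code be reused across graphs: a two-step path such as `publishedIn/publisher` needs no special handling.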
> 
> Email list for these calls: 
> https://lists.w3.org/Archives/Public/public-semweb-lifesci/
> 
> Franck to present next week.
> 
> ADJOURNED

Received on Monday, 11 May 2020 16:23:06 UTC