- From: David Booth <david@dbooth.org>
- Date: Mon, 11 May 2020 12:22:53 -0400
- To: w3c semweb HCLS <public-semweb-lifesci@w3.org>
- Cc: Franck Michel <fmichel@i3s.unice.fr>
Tomorrow (Tuesday) Franck Michel will present his work on the CORD-19
Named Entities Knowledge Graph (CORD19-NEKG).

Zoom Link:
https://us02web.zoom.us/j/83815969391?pwd=Q0k4Nm9xc3V2K0djL0FYT2JMVTJmUT09

Thanks,
David Booth

On 4/28/20 12:09 PM, David Booth wrote:
> Notes from today's call:
>
> MEETING NOTES 28-Apr-2020
> Present: David Booth, Victor Mireles, Louis Rumanes, Tom Conlin, Franck
> Michel, Gollam Rabby, Jim McCusker, Lucy Wong, Sebastian Kohlmeier,
> Tomáš Kliegr
>
> Introductions
> David Booth: 10 years applying semantic web tech to healthcare and life
> sciences, working on Mayo Clinic / Johns Hopkins University collaboration.
>
> Louis Rumane: United Health Group, doing COVID research, looking at
> making a KG
>
> Tom Conlin: Working with Melissa Haendel (Monarch Initiative)
>
> Franck: INRIA
>
> Gollam: Prague, Univ
>
> Jim: Research sci RPI, working on KG w bio
>
> Lucy: Allen Institute, research scientist.
>
> Tomas: Assoc Prof, Prague, KG.
>
> Sebastian: Sr Mgr on CORD-19.
>
> Victor: Semantic Web Company researcher
>
> Victor's Presentation
> Slides here:
> https://docs.google.com/presentation/d/1xaS_88sJ47iSrvv0ezOfjscIvG2VINUe7vqrUEMiaCA/edit?usp=sharing
>
> victor: Semantic Web Company, 40+ FTEs. Makes PoolParty. Works w
> companies in many countries. Taxonomy helps extract entities from text,
> image search, data mgmt.
>
> victor: Developing text and data mining tools for biomed, and CORD-19.
> We don't only annotate text. What's useful about annotating text w
> entities is to use the knowledge; the simplest is encoded in SKOS, such
> as broader/narrower. But to do this we need to annotate the text into
> URIs, then import relationships into the graph. Trying to link existing
> annotations w other knowledge sources. Ont is simplified version of
> NIFT: documents have sections, sections have annotations that are SKOS
> concepts.
>
> victor: So far, we've set up a pipeline to take a document and it finds
> annotations with offsets. So far imported ChEBI, GO, MeSH, HPO, but
> using them as controlled vocab. Many are very specific, such as
> "COVID-19" -- not really NLP, because there are no inflections,
> plurals, etc. Output is a bunch of triples in the simple SKOS ont
> previously mentioned. Put them into GraphDB, along with the vocabs.
>
> victor: Also looked at SciBite annotations. They've done an excellent
> job annotating. They also have their own controlled vocab that is very
> good. JSON files have annotations. Put them into triples. Combining
> them w bio DBs gives a graph DB.
>
> (victor shows relationships in GraphDB viewer)
>
> victor: you can navigate the hierarchy of concepts and link them to the
> paragraphs in CORD-19 DB.
>
> (victor shows SPARQL queries)
>
> victor: This allows us to pull up the titles and paragraphs of articles
> that mention both a kind of neoplasm and a kind of coronavirus.
>
> victor: Want to take other DBs and put them into GraphDB also. Monarch
> Initiative is putting together KG, and also puts in SciBite.
>
> victor: Missing from both our effort and Monarch: searchability. I
> showed SPARQL queries using broader/narrower. Also need to be more
> efficient for humans, working also on faceted search. Monarch
> Initiative is very good for machine readable stuff. Another thing
> missing: relation extraction from the text. How does a human determine
> that some text is saying that a protein interacts with another? JPL
> (Lewis Magidney?sp?) is using a Stanford NLP for relation extraction.
> https://github.com/nasa-jpl-cord-19/covid19-knowledge-graph
> It isn't perfect, but it indicates a relationship. Both entities are in
> GO. This adds new edges between entities. Lots of interest in this
> topic now.
>
> Franck: We're doing pretty close to this in INRIA, looking at named
> entities, Wikidata entities, queries that gather all articles on cancer
> and any coronavirus. Another thing we're doing: in addition to
> detecting named entities, we're running other tools to identify
> arguments, claims, evidence in articles and draw a network of claims and
> evidence to see what supports the claims. Hope to publish this network
> soon as RDF graph.
>
> victor: PubAnnotation shown last week, showed epistemic analysis.
>
> Franck: Argument, clinical trial analysis. Query PubMed and platform
> analyzes those articles. Want to apply them to CORD-19.
>
> Vincent: Is RDF available?
> victor: Will take a couple more weeks.
> Vincent: Size?
> victor: 20GB RDF.
>
> David: Overlap between efforts, helpful to learn about each other's work.
>
> victor: After looking at Monarch Initiative, it isn't new, names I
> recognized from Human Phenotype initiative. Most of that summarizes work
> that others have done. FHIR DB also have overlaps with SciBite.
>
> david: SPARQL query was valuable, but biologists need simple UI.
>
> jim: Working on faceted browser for various things, that can be reused.
> Based on SPARQL fragments, property path gives certain values, here's
> how to render it. Potentially useful here. Also integrated WHYIS Vega
> (JS framework for charts and visualization), can plug a SPARQL query in
> and get a chart. People can share how they're exploring the graph.
> https://github.com/tetherless-world/whyis
> Faceted search is a view in WHYIS, but a lot of the capabilities are
> designed to use nanopub.
>
> Email list for these calls:
> https://lists.w3.org/Archives/Public/public-semweb-lifesci/
>
> Franck to present next week.
>
> ADJOURNED
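
For anyone who missed the demo: the notes mention queries that use SKOS broader/narrower to find paragraphs mentioning both a kind of neoplasm and a kind of coronavirus. The plain-Python sketch below shows roughly what such a query computes. It is not the SPARQL shown on the call. The `ex:` identifiers, the `ex:mentions` predicate, and the sample triples are all made up for illustration and are not taken from the actual graph.

```python
# Toy illustration only: every identifier and triple below is hypothetical.
from collections import defaultdict

BROADER = "skos:broader"   # child concept -> parent concept
MENTIONS = "ex:mentions"   # assumed predicate: paragraph -> annotated concept

TRIPLES = [
    ("ex:SARS-CoV-2", BROADER, "ex:Coronavirus"),
    ("ex:MERS-CoV", BROADER, "ex:Coronavirus"),
    ("ex:Lymphoma", BROADER, "ex:Neoplasm"),
    ("ex:Carcinoma", BROADER, "ex:Neoplasm"),
    ("ex:HepatocellularCarcinoma", BROADER, "ex:Carcinoma"),
    ("ex:para1", MENTIONS, "ex:SARS-CoV-2"),
    ("ex:para1", MENTIONS, "ex:HepatocellularCarcinoma"),
    ("ex:para2", MENTIONS, "ex:MERS-CoV"),
    ("ex:para3", MENTIONS, "ex:Lymphoma"),
    ("ex:para3", MENTIONS, "ex:Coronavirus"),
]


def narrower_closure(triples, root):
    """Return root plus every concept below it in the skos:broader hierarchy."""
    children = defaultdict(set)
    for s, p, o in triples:
        if p == BROADER:
            children[o].add(s)
    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def paragraphs_mentioning_both(triples, root_a, root_b):
    """Paragraphs annotated with some kind of root_a and some kind of root_b."""
    kinds_a = narrower_closure(triples, root_a)
    kinds_b = narrower_closure(triples, root_b)
    mentioned = defaultdict(set)
    for s, p, o in triples:
        if p == MENTIONS:
            mentioned[s].add(o)
    return sorted(
        para for para, concepts in mentioned.items()
        if concepts & kinds_a and concepts & kinds_b
    )


if __name__ == "__main__":
    print(paragraphs_mentioning_both(TRIPLES, "ex:Neoplasm", "ex:Coronavirus"))
```

In a triple store the same traversal would normally be left to the query engine (for example a SPARQL property path over skos:broader) rather than done by hand; the sketch only makes the hierarchy walk explicit.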
Received on Monday, 11 May 2020 16:23:06 UTC