- From: Franck Michel <franck.michel@cnrs.fr>
- Date: Tue, 12 May 2020 12:32:15 +0200
- To: David Booth <david@dbooth.org>, w3c semweb HCLS <public-semweb-lifesci@w3.org>
Dear David, dear all,

Just a clarification: I'll present our work and perspectives in the project "Covid-on-the-Web", revolving around named entities and argumentative graphs based on the CORD-19 corpus.

Regards,
Franck.

On 11/05/2020 at 18:22, David Booth wrote:
> Tomorrow (Tuesday) Franck Michel will present his work on CORD-19
> Named Entities Knowledge Graph (CORD19-NEKG).
>
> Zoom Link:
> https://us02web.zoom.us/j/83815969391?pwd=Q0k4Nm9xc3V2K0djL0FYT2JMVTJmUT09
>
> Thanks,
> David Booth
>
> On 4/28/20 12:09 PM, David Booth wrote:
>> Notes from today's call:
>>
>> MEETING NOTES 28-Apr-2020
>> Present: David Booth, Victor Mireles, Louis Rumanes, Tom Conlin,
>> Franck Michel, Gollam Rabby, Jim McCusker, Lucy Wong, Sebastian
>> Kohlmeier, Tomáš Kliegr
>>
>> Introductions
>> David Booth: 10 years applying semantic web tech to healthcare and
>> life sciences, working on Mayo Clinic / Johns-Hopkins University
>> collaboration.
>>
>> Louis Rumane: United Health Group, doing COVID research, looking at
>> making a KG.
>>
>> Tom Conlin: Working with Melissa Haendel (Monarch Initiative).
>>
>> Franck: INRIA
>>
>> Gollam: Prague, Univ
>>
>> Jim: Research sci RPI, working on KG w bio
>>
>> Lucy: Allen Institute, research scientist.
>>
>> Tomas: Assoc Prof, Prague, KG.
>>
>> Sebastian: Sr Mgr on CORD-19.
>>
>> Victor: Semantic Web Company researcher
>>
>> Victor's Presentation
>> Slides here:
>> https://docs.google.com/presentation/d/1xaS_88sJ47iSrvv0ezOfjscIvG2VINUe7vqrUEMiaCA/edit?usp=sharing
>>
>> victor: Semantic Web Company, 40+ FTEs. Makes PoolParty. Works w
>> companies in many countries. Taxonomy helps with extracting entities
>> from text, image search, data mgmt.
>>
>> victor: Developing text and data mining tools for biomed, and
>> CORD-19. We don't only annotate text. What's useful about annotating
>> text w entities is being able to use knowledge about them; the
>> simplest is encoded in SKOS, such as broader/narrower.
>> But to do this we need to annotate the text with URIs, then import
>> relationships into the graph. Trying to link existing annotations w
>> other knowledge sources. The ont is a simplified version of NIF:
>> documents have sections, sections have annotations that are SKOS
>> concepts.
>>
>> victor: So far, we've set up a pipeline that takes a document and
>> finds annotations with offsets. So far imported ChEBI, GO, MeSH,
>> HPO, but using them as controlled vocabs. Many terms are very
>> specific, such as "COVID-19"; this is not really NLP, because there
>> are no inflections, plurals, etc. Output is a bunch of triples in the
>> simple SKOS ont previously mentioned. Put them into GraphDB, along
>> with the vocabs.
>>
>> victor: Also looked at SciBite annotations. They've done an
>> excellent job annotating. They also have their own controlled vocab
>> that is very good. Their JSON files have annotations; we put them
>> into triples. Combining them w bio DBs gives a graph DB.
>>
>> (victor shows relationships in GraphDB viewer)
>>
>> victor: You can navigate the hierarchy of concepts and link them to
>> the paragraphs in the CORD-19 DB.
>>
>> (victor shows SPARQL queries)
>>
>> victor: This allows us to pull up the titles and paragraphs of
>> articles that mention both a kind of neoplasm and a kind of coronavirus.
>>
>> victor: Want to take other DBs and put them into GraphDB also.
>> Monarch Initiative is putting together a KG, and also puts in SciBite.
>>
>> victor: Missing from both our effort and Monarch: searchability. I
>> showed SPARQL queries using broader/narrower. Also need to be more
>> efficient for humans, working also on faceted search. Monarch
>> Initiative is very good for machine-readable stuff. Another thing
>> missing: relation extraction from the text. How does a human
>> determine that some text is saying that one protein interacts with
>> another? JPL (Lewis Magidney?sp?) is using a Stanford NLP tool for
>> relation extraction.
>> https://github.com/nasa-jpl-cord-19/covid19-knowledge-graph
>> It isn't perfect, but it indicates a relationship. Both entities are
>> in GO. This adds new edges between entities. Lots of interest in
>> this topic now.
>>
>> Franck: We're doing something pretty close to this at INRIA, looking
>> at named entities, Wikidata entities, queries that gather all articles
>> on cancer and any coronavirus. Another thing we're doing: in addition
>> to detecting named entities, we're running other tools to identify
>> arguments, claims, and evidence in articles, and to draw a network of
>> claims and evidence to see what supports the claims. Hope to publish
>> this network soon as an RDF graph.
>>
>> victor: PubAnnotation, shown last week, showed epistemic analysis.
>>
>> Franck: Argument and clinical trial analysis: query PubMed, and the
>> platform analyzes those articles. Want to apply them to CORD-19.
>>
>> Vincent: Is RDF available? victor: Will take a couple more weeks.
>> Vincent: Size? victor: 20GB RDF.
>>
>> David: There's overlap between efforts; helpful to learn about each
>> other's work.
>>
>> victor: After looking at the Monarch Initiative: it isn't new; I
>> recognized names from the Human Phenotype initiative. Most of it
>> summarizes work that others have done. FHIR DB also has overlaps with
>> SciBite.
>>
>> david: SPARQL query was valuable, but biologists need a simple UI.
>>
>> jim: Working on a faceted browser for various things that can be
>> reused. Based on SPARQL fragments: a property path gives certain
>> values, and here's how to render them. Potentially useful here. Also
>> integrated Vega (JS framework for charts and visualization) into
>> WHYIS: you can plug a SPARQL query in and get a chart. People can
>> share how they're exploring the graph.
>> https://github.com/tetherless-world/whyis
>> Faceted search is a view in WHYIS, but a lot of the capabilities are
>> designed to use nanopubs.
>>
>> Email list for these calls:
>> https://lists.w3.org/Archives/Public/public-semweb-lifesci/
>>
>> Franck to present next week.
>>
>> ADJOURNED
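The annotate-then-query approach in Victor's presentation has three steps. First, match text exactly against a controlled vocabulary. Second, record character offsets. Third, use SKOS broader/narrower links to find paragraphs that mention both a kind of neoplasm and a kind of coronavirus. This is a minimal plain-Python sketch of that idea, not the Semantic Web Company pipeline. The concept ids, labels, hierarchy, and paragraphs are hypothetical toy data. The `ancestors_or_self` walk stands in for the `skos:broader*` property path that such a query would typically use in a SPARQL store like GraphDB.

```python
"""Toy sketch: dictionary annotation with offsets + a skos:broader* style query.

All concept ids, labels, hierarchy links and paragraphs are HYPOTHETICAL
illustration data, not the CORD-19 vocabularies or pipeline from the call.
"""
import re

# Hypothetical controlled vocabulary: concept id -> label.
LABELS = {
    "ex:Coronavirus": "coronavirus",
    "ex:SARS-CoV-2": "SARS-CoV-2",
    "ex:MERS-CoV": "MERS-CoV",
    "ex:Neoplasm": "neoplasm",
    "ex:Lymphoma": "lymphoma",
    "ex:Carcinoma": "Carcinoma",
}

# Hypothetical skos:broader links: narrower concept -> set of broader concepts.
BROADER = {
    "ex:SARS-CoV-2": {"ex:Coronavirus"},
    "ex:MERS-CoV": {"ex:Coronavirus"},
    "ex:Lymphoma": {"ex:Neoplasm"},
    "ex:Carcinoma": {"ex:Neoplasm"},
}


def annotate(text):
    """Return sorted (start, end, concept) hits for exact label matches.

    Matching is case-sensitive with no stemming, so plurals and other
    inflections are NOT matched (controlled-vocabulary lookup, not NLP).
    """
    hits = []
    for concept, label in LABELS.items():
        # Require a non-word character (or the text edge) on both sides.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            hits.append((m.start(), m.end(), concept))
    return sorted(hits)


def ancestors_or_self(concept):
    """Emulate the SPARQL property path skos:broader* (zero or more hops)."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in BROADER.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def paragraphs_mentioning_both(paragraphs, root_a, root_b):
    """Ids of paragraphs annotated with a kind of root_a AND a kind of root_b."""
    result = []
    for pid, text in paragraphs.items():
        kinds = set()
        for _, _, concept in annotate(text):
            kinds |= ancestors_or_self(concept)
        if root_a in kinds and root_b in kinds:
            result.append(pid)
    return result


# Hypothetical paragraphs standing in for CORD-19 text.
PARAGRAPHS = {
    "p1": "Patients with lymphoma infected with SARS-CoV-2 were followed.",
    "p2": "MERS-CoV spreads mainly through close contact.",
    "p3": "Carcinoma outcomes during the pandemic; no coronaviruses tested.",
}

if __name__ == "__main__":
    print(annotate(PARAGRAPHS["p1"]))
    # [(14, 22, 'ex:Lymphoma'), (37, 47, 'ex:SARS-CoV-2')]
    print(paragraphs_mentioning_both(PARAGRAPHS, "ex:Neoplasm", "ex:Coronavirus"))
    # ['p1']
```

Paragraph p3 is not returned because "coronaviruses" does not match the label "coronavirus". This is the trade-off of exact vocabulary matching (no inflections or plurals) noted above. In a triple store, the same query would be written with property paths over the annotation triples rather than as an in-memory walk.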
Received on Tuesday, 12 May 2020 10:32:33 UTC