- From: Franck Michel <franck.michel@cnrs.fr>
- Date: Tue, 12 May 2020 12:32:15 +0200
- To: David Booth <david@dbooth.org>, w3c semweb HCLS <public-semweb-lifesci@w3.org>
Dear David, dear all,
Just a clarification: I'll present our work and perspectives in the project
"Covid-on-the-Web", revolving around Named Entities and Argumentative
Graph Based on the CORD-19 Corpus.
Regards,
Franck.
On 11/05/2020 at 18:22, David Booth wrote:
> Tomorrow (Tuesday) Franck Michel will present his work on CORD-19
> Named Entities Knowledge Graph (CORD19-NEKG).
>
> Zoom Link:
> https://us02web.zoom.us/j/83815969391?pwd=Q0k4Nm9xc3V2K0djL0FYT2JMVTJmUT09
>
>
> Thanks,
> David Booth
>
> On 4/28/20 12:09 PM, David Booth wrote:
>> Notes from today's call:
>>
>> MEETING NOTES 28-Apr-2020
>> Present: David Booth, Victor Mireles, Louis Rumanes, Tom Conlin,
>> Franck Michel, Gollam Rabby, Jim McCusker, Lucy Wong, Sebastian
>> Kohlmeier, Tomáš Kliegr
>>
>> Introductions
>> David Booth: 10 years applying semantic web tech to healthcare and
>> life sciences, working on Mayo Clinic / Johns-Hopkins University
>> collaboration.
>>
>> Louis Rumane: United Health Group, Doing COVID research, looking at
>> making a KG
>>
>> Tom Conlin: Working with Melissa Haendel (Monarch Initiative),
>>
>> Franck: INRIA
>>
>> Gollam: Prague, Univ
>>
>> Jim: Research sci RPI, working on KG w bio
>>
>> Lucy: Allen institute, research scientist.
>>
>> Tomas: Assoc Prof, Prague, KG.
>>
>> Sebastian: Sr Mgr on CORD-19.
>>
>> Victor: Semantic Web company researcher
>>
>> Victor's Presentation
>> Slides here:
>> https://docs.google.com/presentation/d/1xaS_88sJ47iSrvv0ezOfjscIvG2VINUe7vqrUEMiaCA/edit?usp=sharing
>>
>>
>> victor: Semantic Web Company, 40+ FTEs. Makes PoolParty. Works w
>> companies in many countries. Taxonomy helps extract entities from
>> text. image search, data mgmt.
>>
>> victor: Developing text and data mining tools for biomed, and
>> CORD-19. We don't only annotate text. What's useful about annotating
>> text w entities is to use the knowledge, simplest is encoded in SKOS,
>> such as broader/narrower. But to do this we need to annotate the
>> text into URIs, then import relationships into the graph. Trying to
>> link existing annotations w other knowledge sources. Ont is
>> simplified version of NIFT: documents have sections, sections have
>> annotations that are SKOS concepts.
>>
>> victor: So far, we've set up a pipeline to take a document and it
>> finds annotations with offsets. So far imported ChEBI, GO, MeSH,
>> HPO, but using them as controlled vocab. Many are very specific,
>> such as "COVID-19" -- not really NLP, because there are no
>> inflections, plurals, etc. Output is a bunch of triples in the
>> simple SKOS ont previously mentioned. Put them into GraphDB, along
>> with the vocabs.
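
The pipeline described above (controlled-vocabulary matching that yields annotations with offsets, then triples) can be sketched roughly as follows. This is a minimal illustration, not the Semantic Web Company's implementation: the vocabulary entries, the `http://example.org/` URIs, the `ex:` predicate names and the function names are all hypothetical.

```python
import re

# Hypothetical controlled vocabulary: label -> concept URI (illustrative URIs only).
VOCAB = {
    "COVID-19": "http://example.org/concept/covid19",
    "coronavirus": "http://example.org/concept/coronavirus",
}


def annotate(text, vocab=VOCAB):
    """Return sorted (start, end, concept_uri) tuples for exact label matches.

    Matching is case-insensitive and on whole tokens only, so inflected
    or plural forms are deliberately NOT recognized.
    """
    annotations = []
    for label, concept in vocab.items():
        pattern = re.compile(
            r"(?<!\w)" + re.escape(label) + r"(?!\w)", re.IGNORECASE
        )
        for match in pattern.finditer(text):
            annotations.append((match.start(), match.end(), concept))
    return sorted(annotations)


def to_triples(doc_uri, annotations):
    """Turn annotations into (subject, predicate, object) tuples.

    The ex: predicates are placeholders for whatever annotation
    ontology is actually used.
    """
    triples = []
    for i, (start, end, concept) in enumerate(annotations):
        ann = f"{doc_uri}#ann{i}"
        triples.append((ann, "ex:inDocument", doc_uri))
        triples.append((ann, "ex:beginOffset", start))
        triples.append((ann, "ex:endOffset", end))
        triples.append((ann, "ex:concept", concept))
    return triples


if __name__ == "__main__":
    found = annotate("COVID-19 is caused by a coronavirus.")
    for triple in to_triples("http://example.org/doc/1", found):
        print(triple)
```

Because only exact labels are matched, a plural such as "coronaviruses" produces no annotation, which is the sense in which this step is "not really NLP".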
>>
>> victor: Also looked at SciBite annotations. They've done an
>> excellent job annotating. They also have their own controlled vocab
>> that is very good. JSON files have annotations. Put them into
>> triples. Combining them w bio DBs gives a graph DB.
>>
>> (victor shows relationships in GraphDB viewer)
>>
>> victor: you can navigate the hierarchy of concepts and link them to
>> the paragraphs in CORD-19 DB.
>>
>> (victor shows SPARQL queries)
>>
>> victor: This allows us to pull up the titles and paragraphs of
>> articles that both mention a kind of neoplasm and a kind of coronavirus.
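
The kind of query described above (paragraphs that mention both some kind of neoplasm and some kind of coronavirus) relies on following broader links transitively, much like a `skos:broader*` property path in SPARQL. Below is a minimal pure-Python sketch of that logic, assuming a toy hierarchy and toy annotations; the concept names, paragraph identifiers and function names are invented for illustration and are not taken from the actual graph.

```python
# Hypothetical skos:broader edges: concept -> set of direct broader concepts.
BROADER = {
    "SARS-CoV-2": {"coronavirus"},
    "MERS-CoV": {"coronavirus"},
    "lymphoma": {"neoplasm"},
    "glioma": {"neoplasm"},
}

# Hypothetical annotations: paragraph id -> concepts annotated in it.
PARAGRAPHS = {
    "paper1#p1": {"SARS-CoV-2", "lymphoma"},
    "paper1#p2": {"SARS-CoV-2"},
    "paper2#p1": {"glioma", "MERS-CoV"},
    "paper3#p1": {"neoplasm"},
}


def ancestors(concept, broader=BROADER):
    """Return the concept itself plus everything reachable via broader links."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in broader.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def mentions_kind_of(concepts, target, broader=BROADER):
    """True if any annotated concept is the target or a narrower concept of it."""
    return any(target in ancestors(c, broader) for c in concepts)


def co_mentions(paragraphs, first, second, broader=BROADER):
    """Return sorted ids of paragraphs mentioning a kind of both targets."""
    return sorted(
        pid
        for pid, concepts in paragraphs.items()
        if mentions_kind_of(concepts, first, broader)
        and mentions_kind_of(concepts, second, broader)
    )


if __name__ == "__main__":
    print(co_mentions(PARAGRAPHS, "neoplasm", "coronavirus"))
```

In a triple store the same traversal would normally be left to the query engine; the explicit closure here only makes the broader/narrower reasoning visible.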
>>
>> victor: Want to take other DBs and put them into GraphDB also.
>> Monarch Initiative is putting together KG, and also puts in SciBite.
>>
>> victor: Missing from both our effort and Monarch: searchability. I
>> showed SPARQL queries using broader/narrower. Also need to be more
>> efficient for humans, working also on faceted search. Monarch
>> Initiative is very good for machine readable stuff. Another thing
>> missing: relation extraction, from the text. How does human
>> determine that some text is saying that a protein interacts with
>> another. JPL (Lewis Magidney?sp?) is using a Stanford NLP for
>> relation extraction.
>> https://github.com/nasa-jpl-cord-19/covid19-knowledge-graph
>> It isn't perfect, but it indicates a relationship. Both entities are
>> in GO. This adds new edges between entities. Lots of interest in
>> this topic now.
>>
>> Franck: We're doing pretty close to this in INRIA, looking at named
>> entities, wikidata entities, queries that gather all articles on
>> cancer and any coronavirus. Another thing we're doing: in addition
>> to detecting named entities, we're running other tools to identify
>> arguments, claims, evidence in articles and draw network of claims
>> and evidence to see what supports the claims. Hope to publish this
>> network soon as RDF graph.
>>
>> victor: PubAnnotation shown last week, showed epistemic analysis.
>>
>> Franck: Argument, clinical trial analysis. Query pubmed and platform
>> analyzes those articles. Want to apply them to CORD-19.
>>
>> Vincent: Is RDF available? victor: Will take a couple more weeks.
>> Vincent: Size? victor: 20GB RDF.
>>
>> David: Overlap between efforts, helpful to learn about each other's
>> work.
>>
>> victor: After looking at Monarch initiative, it isn't new, names I
>> recognized from Human Phenotype initiative. Most of that summarizes
>> work that others have done. FHIR DB also have overlaps with SciBite.
>>
>> david: SPARQL query was valuable, but biologists need simple UI.
>>
>> jim: Working on faceted browser for various things, that can be
>> reused. Based on SPARQL fragments, property path gives certain
>> values, here's how to render it. Potentially useful here. Also
>> integrated WHYIS Vega (JS framework for charts and visualization),
>> can plug a SPARQL query in and get a chart. People can share how
>> they're exploring the graph.
>> https://github.com/tetherless-world/whyis
>> Faceted search is a view in WHYIS, but a lot of the capabilities are
>> designed to use nanopub.
>>
>> Email list for these calls:
>> https://lists.w3.org/Archives/Public/public-semweb-lifesci/
>>
>> Franck to present next week.
>>
>> ADJOURNED
Received on Tuesday, 12 May 2020 10:32:33 UTC