- From: David Booth <david@dbooth.org>
- Date: Wed, 13 May 2020 10:46:04 -0400
- To: w3c semweb HCLS <public-semweb-lifesci@w3.org>
- Cc: Franck Michel <fmichel@i3s.unice.fr>
Notes from yesterday's webinar by Franck Michel are below. Thanks to Victor Mireles-Chavez, a recording of the call is available at the following URL. Franck's presentation starts at 17:10.
https://tinyurl.com/y8kmfxhe
Recording password: 7t?N&*9+

--------------------------------------------------------------
MEETING NOTES 12-May-2020
Present: David Booth, Victor Mireles, Franck Michel, Albert Burger, Daniel Stone, Deborah McGuinness, Filip, Gaurav Vaidya, Gollam Rabby, Louis Rumanes, Marcin Joachimiak, Michael Liebman, Subhashis Das, Nico, Tom Conlin, Chuming Chen

Introductions
David Booth: 10 years applying semantic web tech to healthcare and life sciences, working on Mayo Clinic / Johns Hopkins University collaboration.

Subhashis Das: Postdoctoral researcher at CeIC, DCU, Dublin. Specialization in domain ontology and healthcare data integration.

Franck's presentation
Slides:
https://www.dropbox.com/s/nnyg1o45f9dvimk/20200512%20Covid-on-the-Web%20-%20CORD-19%20semantic%20annotations.pdf?dl=0

Franck: Goal is to make it easier to find and make sense of COVID-19 literature, using both named entities and argumentative graphs. Using DBpedia Spotlight, Entity-fishing, BioPortal Annotator.

Franck: Releasing v1.1 shortly. 54M named entities (NEs), 564k URIs.
  30M NEs, 155,651 URIs from Wikidata
  21M NEs, 339,990 URIs from BioPortal
  1.8M NEs from DBpedia
https://github.com/wimmics/cord19-nekg
Full modelling details:
https://github.com/Wimmics/cord19-nekg/blob/master/doc/01-data-modeling.md
SPARQL endpoint: http://covid19.i3s.unice.fr/sparql
Virtuoso faceted browsing: http://covid19.i3s.unice.fr:8890/fct/

Franck: Web Annotation ontology and PROV-O used to annotate articles. Each annotation points to the article and the position within the article where the entity was found.

Franck: Able to query for the cancer entity and its subclasses or instances.

Franck: Also looking at co-mentions of named entities.

Franck: Colleagues also working on ACTA: A Tool for Argumentative ... claims/evidence. This would allow arguments/claims/evidence to be displayed in a graph.

David: What ontology are you using for determining the subclass relations of cancer, for example?

Franck: So far using the Wikidata hierarchy. One exception: viruses in Wikidata are not modeled as classes, so we regenerated them as classes.

Victor: Why can't DBpedia Spotlight process full text?

Franck: We have 54M NEs, 700M triples. Not enough machine power to do full text.

Victor: If I find offsets, how can I be sure that I am aligned in my own data?

Franck: The offsets refer specifically to the CORD-19 dataset.

Marcin: How are you extracting info about viral proteins? There are polyproteins?

Franck: We rely on the results of the tools we're using. If a protein is identified by those tools then we get it. If an article mentions a gene name, would it show up?

Marcin: There are a few of these different entity extraction efforts. Should we try to merge them?

David: That's exactly the point of these teleconferences -- to start learning about each other's work and figure out how best to coordinate.

Michael: We compared analysis of abstracts vs. full body and found a significant difference, because the abstract is more of an advertisement. Also, in dealing with the full body, we found it necessary to parse the article and separate the sections on methods, results, conclusions.

Franck: My colleagues are working on argumentative extraction; quality varies a lot from one category to another. They've noticed (anecdotally) that clinical trials have an abstract with a few clear statements about results, which are relatively easy to extract, but that is not the case for other articles.

Victor: Comment on avoiding duplication of effort: doing annotations takes quite some effort. Some are better prepared than others. Takes time. By the time someone presents work, others have already spent time doing similar work.

David: We began these calls with very brief presentations by each participant, but after that, switched to deeper presentations of each project.

Deborah: When presenting, please say which parts of your work are ready for others to use.

Tom: Also interested in timing, how long things took, what was good/bad.

AGREED: Next week we will do 5-minute presentations of what we're doing or planning.

Speakers next week: Daniel, Deborah, Gaurav, Gollam, Marcin, John Z, Michael, Tom, David. Subhashis: not next week, but later.

ADJOURNED

On 5/11/20 12:22 PM, David Booth wrote:
> Tomorrow (Tuesday) Franck Michel will present his work on CORD-19 Named
> Entities Knowledge Graph (CORD19-NEKG).
>
> Zoom Link:
> https://us02web.zoom.us/j/83815969391?pwd=Q0k4Nm9xc3V2K0djL0FYT2JMVTJmUT09
>
> Thanks,
> David Booth
>
> On 4/28/20 12:09 PM, David Booth wrote:
>> Notes from today's call:
>>
>> MEETING NOTES 28-Apr-2020
>> Present: David Booth, Victor Mireles, Louis Rumanes, Tom Conlin,
>> Franck Michel, Gollam Rabby, Jim McCusker, Lucy Wong, Sebastian
>> Kohlmeier, Tomáš Kliegr
>>
>> Introductions
>> David Booth: 10 years applying semantic web tech to healthcare and
>> life sciences, working on Mayo Clinic / Johns Hopkins University
>> collaboration.
>>
>> Louis Rumane: United Health Group, Doing COVID research, looking at
>> making a KG
>>
>> Tom Conlin: Working with Melissa Haendel (Monarch Initiative).
>>
>> Franck: INRIA
>>
>> Gollam: Prague, Univ
>>
>> Jim: Research sci RPI, working on KG w bio
>>
>> Lucy: Allen Institute, research scientist.
>>
>> Tomas: Assoc Prof, Prague, KG.
>>
>> Sebastian: Sr Mgr on CORD-19.
>>
>> Victor: Semantic Web Company researcher
>>
>> Victor's Presentation
>> Slides here:
>> https://docs.google.com/presentation/d/1xaS_88sJ47iSrvv0ezOfjscIvG2VINUe7vqrUEMiaCA/edit?usp=sharing
>>
>>
>> victor: Semantic Web Company, 40+ FTEs. Makes PoolParty. Works w
>> companies in many countries. Taxonomy helps extract entities from
>> text. image search, data mgmt.
>>
>> victor: Developing text and data mining tools for biomed, and CORD-19.
>> We don't only annotate text. What's useful about annotating text w
>> entities is to use the knowledge, simplest is encoded in SKOS, such as
>> broader/narrower. But to do this we need to annotate the text into
>> URIs, then import relationships into the graph. Trying to link
>> existing annotations w other knowledge sources. Ont is simplified
>> version of NIFT: documents have sections, sections have annotations
>> that are SKOS concepts.
>>
>> victor: So far, we've set up a pipeline to take a document and it
>> finds annotations with offsets. So far imported ChEBI, GO, MeSH, HPO,
>> but using them as controlled vocab. Many are very specific, such as
>> "COVID-19" -- not really NLP, because there are no inflections,
>> plurals, etc. Output is a bunch of triples in the simple SKOS ont
>> previously mentioned. Put them into GraphDB, along with the vocabs.
>>
>> victor: Also looked at SciBite annotations. They've done an excellent
>> job annotating. They also have their own controlled vocab that is
>> very good. JSON files have annotations. Put them into triples.
>> Combining them w bio DBs gives a graph DB.
>>
>> (victor shows relationships in GraphDB viewer)
>>
>> victor: you can navigate the hierarchy of concepts and link them to
>> the paragraphs in CORD-19 DB.
>>
>> (victor shows SPARQL queries)
>>
>> victor: This allows us to pull up the titles and paragraphs of
>> articles that both mention a kind of neoplasm and a kind of coronavirus.
>>
>> victor: Want to take other DBs and put them into GraphDB also.
>> Monarch Initiative is putting together KG, and also puts in SciBite.
>>
>> victor: Missing from both our effort and Monarch: searchability. I
>> showed SPARQL queries using broader/narrower. Also need to be more
>> efficient for humans, working also on faceted search. Monarch
>> Initiative is very good for machine readable stuff. Another thing
>> missing: relation extraction, from the text. How does a human determine
>> that some text is saying that a protein interacts with another. JPL
>> (Lewis Magidney?sp?) is using Stanford NLP for relation extraction.
>> https://github.com/nasa-jpl-cord-19/covid19-knowledge-graph
>> It isn't perfect, but it indicates a relationship. Both entities are
>> in GO. This adds new edges between entities. Lots of interest in
>> this topic now.
>>
>> Franck: We're doing pretty close to this in INRIA, looking at named
>> entities, Wikidata entities, queries that gather all articles on
>> cancer and any coronavirus. Another thing we're doing: in addition to
>> detecting named entities, we're running other tools to identify
>> arguments, claims, evidence in articles and draw network of claims and
>> evidence to see what supports the claims. Hope to publish this
>> network soon as RDF graph.
>>
>> victor: PubAnnotation shown last week, showed epistemic analysis.
>>
>> Franck: Argument, clinical trial analysis. Query PubMed and platform
>> analyzes those articles. Want to apply them to CORD-19.
>>
>> Vincent: Is RDF available? victor: Will take a couple more weeks.
>> Vincent: Size? victor: 20GB RDF.
>>
>> David: Overlap between efforts, helpful to learn about each other's work.
>>
>> victor: After looking at Monarch Initiative, it isn't new, names I
>> recognized from Human Phenotype initiative. Most of that summarizes
>> work that others have done. FHIR DB also has overlaps with SciBite.
>>
>> david: SPARQL query was valuable, but biologists need simple UI.
>>
>> jim: Working on faceted browser for various things, that can be
>> reused. Based on SPARQL fragments, property path gives certain values,
>> here's how to render it. Potentially useful here. Also integrated
>> WHYIS Vega (JS framework for charts and visualization), can plug a
>> SPARQL query in and get a chart. People can share how they're
>> exploring the graph.
>> https://github.com/tetherless-world/whyis
>> Faceted search is a view in WHYIS, but a lot of the capabilities are
>> designed to use nanopub.
>>
>> Email list for these calls:
>> https://lists.w3.org/Archives/Public/public-semweb-lifesci/
>>
>> Franck to present next week.
>>
>> ADJOURNED
Received on Wednesday, 13 May 2020 14:46:19 UTC