- From: David Booth <david@dbooth.org>
- Date: Mon, 18 May 2020 15:43:53 -0400
- To: w3c semweb HCLS <public-semweb-lifesci@w3.org>
Tomorrow (Tuesday) we will have a series of 5-minute overview
presentations by people doing semantic annotation of the CORD-19
dataset: Daniel Stone, Gaurav Vaidya, Gollam Rabby, Marcin Joachimiak,
Michael Liebman, Tom Conlin, David Booth.

Zoom Link:
https://us02web.zoom.us/j/83815969391?pwd=Q0k4Nm9xc3V2K0djL0FYT2JMVTJmUT09

If anyone else wishes to present their CORD-19 work, please let me know.
We will probably hold another, similar session next week or a following
week also, for people who were not able to present tomorrow.

The CORD-19 dataset is a dataset released by the Allen Institute
containing 63,000 journal articles related to COVID-19.

Thanks,
David Booth

On 5/13/20 10:46 AM, David Booth wrote:
> Notes from yesterday's webinar by Franck Michel are below. Thanks to
> Victor Mireles-Chavez a recording of the call is available at the
> following URL. Franck's presentation starts at 17:10.
>
> https://tinyurl.com/y8kmfxhe
> Recording password: 7t?N&*9+
>
> --------------------------------------------------------------
> MEETING NOTES 12-May-2020
> Present: David Booth, Victor Mireles, Franck Michel, Albert Burger,
> Daniel Stone, Deborah McGuinness, Filip, Gaurav Vaidya, Gollam Rabby,
> Louis Rumanes, Marcin Joachimiak, Michael Liebman,
> Subhashis Das, Nico, Tom Conlin, Chuming Chen
>
> Introductions
> David Booth: 10 years applying semantic web tech to healthcare and life
> sciences, working on Mayo Clinic / Johns Hopkins University collaboration.
>
> Subhashis Das: Postdoctoral researcher at CeIC, DCU, Dublin.
> Specialization in domain ontology and healthcare data integration.
>
> Franck's presentation
> Slides:
> https://www.dropbox.com/s/nnyg1o45f9dvimk/20200512%20Covid-on-the-Web%20-%20CORD-19%20semantic%20annotations.pdf?dl=0
>
>
> Franck: Goal is to make it easier to find and make sense of COVID-19
> literature: both named entities, and argumentative graphs. Using
> DBpedia Spotlight, Entity-fishing, BioPortal Annotator.
>
> Franck: Releasing v1.1 shortly. 54M named entities, 564k URIs.
>   30M NEs, 155,651 URIs from Wikidata
>   21M NEs, 339,990 URIs from BioPortal
>   1.8M NEs, from DBpedia
> https://github.com/wimmics/cord19-nekg
> Full modelling details:
> https://github.com/Wimmics/cord19-nekg/blob/master/doc/01-data-modeling.md
> SPARQL endpoint: http://covid19.i3s.unice.fr/sparql
> Virtuoso faceted browsing: http://covid19.i3s.unice.fr:8890/fct/
>
> Franck: Web annotation ont and PROV-O used to annotate articles.
> Annotation points to article and position within the article where the
> entity was found.
>
> Franck: Able to query for cancer entity and its subclasses or instances.
>
> Franck: Also looking at co-mentions of named entities.
>
> Franck: Colleagues also working on ACTA: A Tool for Argumentative ...
> claims/evidence. This would allow arguments/claims/evidence to be
> displayed in a graph.
>
> David: What ont are you using for determining the subclass relations of
> cancer, for example?
> Franck: So far using Wikidata hierarchy. One exception: viruses in
> Wikidata are not modeled as classes, so we regenerated them as classes.
>
> Victor: Why can't DBpedia Spotlight process full text?
> Franck: We have 54M NEs, 700M triples. Not enough machine power to do
> full text.
>
> Victor: If I find offsets, how can I be sure that I am aligned in my own
> data?
> Franck: It refers specifically to the CORD-19 dataset.
>
> Marcin: How are you extracting info about viral proteins? There are
> polyproteins?
> Franck: We rely on the results of the tools we're using. If a protein
> is identified by those tools then we get them. If an article mentions a
> gene name, would it show up?
>
> Marcin: There are a few of these different entity extraction efforts.
> Should we try to merge them?
>
> David: That's exactly the point of these teleconferences -- to start
> learning about each other's work and figure out how best to coordinate.
>
> Michael: We compared analysis of abstracts vs full body, and found a
> significant difference, because the abstract is more of an advertisement.
> Also, in dealing with the full body, we found it necessary to parse the
> article and separate the sections on methods, results, conclusions.
>
> Franck: My colleagues are working on argumentative extraction; quality
> varies a lot from one category to another. They've noticed
> (anecdotally) that clinical trials have an abstract with a few clear
> statements about results that are relatively easy to extract, but not
> so for other articles.
>
> Victor: Comment on avoiding duplication of effort: there is quite some
> effort in doing annotations. Some are better prepared than others.
> Takes time. By the time someone presents work, others have already
> spent time doing similar work.
>
> David: We began these calls with very brief presentations by each
> participant, but after that, switched to deeper presentations of each
> project.
>
> Deborah: When presenting, please say what of your work is ready for
> others to use.
>
> Tom: Also interested in timing, how long things took, what was good/bad.
>
> AGREED: Next week we will do 5-minute presentations of what we're doing
> or planning.
>
> Speakers next week: Daniel, Deborah, Gaurav, Gollam, Marcin, John Z,
> Michael, Tom, David.
>
> Subhashis: not next week, but later.
>
> ADJOURNED
>
>
> On 5/11/20 12:22 PM, David Booth wrote:
>> Tomorrow (Tuesday) Franck Michel will present his work on CORD-19
>> Named Entities Knowledge Graph (CORD19-NEKG).
>>
>> Zoom Link:
>> https://us02web.zoom.us/j/83815969391?pwd=Q0k4Nm9xc3V2K0djL0FYT2JMVTJmUT09
>>
>>
>> Thanks,
>> David Booth
>>
>> On 4/28/20 12:09 PM, David Booth wrote:
>>> Notes from today's call:
>>>
>>> MEETING NOTES 28-Apr-2020
>>> Present: David Booth, Victor Mireles, Louis Rumanes, Tom Conlin,
>>> Franck Michel, Gollam Rabby, Jim McCusker, Lucy Wong, Sebastian
>>> Kohlmeier, Tomáš Kliegr
>>>
>>> Introductions
>>> David Booth: 10 years applying semantic web tech to healthcare and
>>> life sciences, working on Mayo Clinic / Johns Hopkins University
>>> collaboration.
>>>
>>> Louis Rumanes: United Health Group, doing COVID research, looking at
>>> making a KG
>>>
>>> Tom Conlin: Working with Melissa Haendel (Monarch Initiative),
>>>
>>> Franck: INRIA
>>>
>>> Gollam: Prague, Univ
>>>
>>> Jim: Research sci RPI, working on KG w bio
>>>
>>> Lucy: Allen Institute, research scientist.
>>>
>>> Tomas: Assoc Prof, Prague, KG.
>>>
>>> Sebastian: Sr Mgr on CORD-19.
>>>
>>> Victor: Semantic Web Company researcher
>>>
>>> Victor's Presentation
>>> Slides here:
>>> https://docs.google.com/presentation/d/1xaS_88sJ47iSrvv0ezOfjscIvG2VINUe7vqrUEMiaCA/edit?usp=sharing
>>>
>>>
>>> Victor: Semantic Web Company, 40+ FTEs. Makes PoolParty. Works w
>>> companies in many countries. Taxonomy helps extract entities from
>>> text. Image search, data mgmt.
>>>
>>> Victor: Developing text and data mining tools for biomed, and
>>> CORD-19. We don't only annotate text. What's useful about annotating
>>> text w entities is to use the knowledge, simplest is encoded in SKOS,
>>> such as broader/narrower. But to do this we need to annotate the
>>> text into URIs, then import relationships into the graph. Trying to
>>> link existing annotations w other knowledge sources. Ont is
>>> simplified version of NIFT: documents have sections, sections have
>>> annotations that are SKOS concepts.
>>>
>>> Victor: So far, we've set up a pipeline to take a document and it
>>> finds annotations with offsets.
>>> So far imported ChEBI, GO, MeSH,
>>> HPO, but using them as controlled vocab. Many are very specific,
>>> such as "COVID-19" -- not really NLP, because there are not
>>> inflections, plurals, etc. Output is a bunch of triples in the
>>> simple SKOS ont previously mentioned. Put them into GraphDB, along
>>> with the vocabs.
>>>
>>> Victor: Also looked at SciBite annotations. They've done an
>>> excellent job annotating. They also have their own controlled vocab
>>> that is very good. JSON files have annotations. Put them into
>>> triples. Combining them w bio DBs gives a graph DB.
>>>
>>> (Victor shows relationships in GraphDB viewer)
>>>
>>> Victor: You can navigate the hierarchy of concepts and link them to
>>> the paragraphs in CORD-19 DB.
>>>
>>> (Victor shows SPARQL queries)
>>>
>>> Victor: This allows us to pull up the titles and paragraphs of
>>> articles that both mention a kind of neoplasm and a kind of coronavirus.
>>>
>>> Victor: Want to take other DBs and put them into GraphDB also.
>>> Monarch Initiative is putting together KG, and also puts in SciBite.
>>>
>>> Victor: Missing from both our effort and Monarch: searchability. I
>>> showed SPARQL queries using broader/narrower. Also need to be more
>>> efficient for humans, working also on faceted search. Monarch
>>> Initiative is very good for machine readable stuff. Another thing
>>> missing: relation extraction, from the text. How does human
>>> determine that some text is saying that a protein interacts with
>>> another. JPL (Lewis Magidney?sp?) is using a Stanford NLP for
>>> relation extraction.
>>> https://github.com/nasa-jpl-cord-19/covid19-knowledge-graph
>>> It isn't perfect, but it indicates a relationship. Both entities are
>>> in GO. This adds new edges between entities. Lots of interest in
>>> this topic now.
>>>
>>> Franck: We're doing pretty close to this in INRIA, looking at named
>>> entities, Wikidata entities, queries that gather all articles on
>>> cancer and any coronavirus.
>>> Another thing we're doing: in addition
>>> to detecting named entities, we're running other tools to identify
>>> arguments, claims, evidence in articles and draw network of claims
>>> and evidence to see what supports the claims. Hope to publish this
>>> network soon as RDF graph.
>>>
>>> Victor: PubAnnotation shown last week, showed epistemic analysis.
>>>
>>> Franck: Argument, clinical trial analysis. Query PubMed and platform
>>> analyzes those articles. Want to apply them to CORD-19.
>>>
>>> Vincent: Is RDF available? Victor: Will take a couple more weeks.
>>> Vincent: Size? Victor: 20GB RDF.
>>>
>>> David: Overlap between efforts, helpful to learn about each other's
>>> work.
>>>
>>> Victor: After looking at Monarch Initiative, it isn't new, names I
>>> recognized from Human Phenotype initiative. Most of that summarizes
>>> work that others have done. FHIR DB also has overlaps with SciBite.
>>>
>>> David: SPARQL query was valuable, but biologists need simple UI.
>>>
>>> Jim: Working on faceted browser for various things, that can be
>>> reused. Based on SPARQL fragments, property path gives certain
>>> values, here's how to render it. Potentially useful here. Also
>>> integrated WHYIS Vega (JS framework for charts and visualization),
>>> can plug a SPARQL query in and get a chart. People can share how
>>> they're exploring the graph.
>>> https://github.com/tetherless-world/whyis
>>> Faceted search is a view in WHYIS, but a lot of the capabilities are
>>> designed to use nanopub.
>>>
>>> Email list for these calls:
>>> https://lists.w3.org/Archives/Public/public-semweb-lifesci/
>>>
>>> Franck to present next week.
>>>
>>> ADJOURNED
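
Illustration (not part of the original notes): the 28-Apr notes above
describe queries that use SKOS broader/narrower links to find paragraphs
mentioning both a kind of neoplasm and a kind of coronavirus. A minimal
sketch of that idea follows, assuming the annotations and broader links
are already loaded in memory. The concept identifiers, paragraph ids, and
function names are hypothetical, and this plain Python stands in for the
SPARQL/GraphDB setup that was actually shown on the call.

```python
from collections import defaultdict

# Hypothetical skos:broader edges (child -> parents). The concept names are
# illustrative only and are not taken from MeSH or any other real vocabulary.
BROADER = {
    "lung_neoplasm": ["neoplasm"],
    "carcinoma": ["neoplasm"],
    "sars_cov_2": ["betacoronavirus"],
    "mers_cov": ["betacoronavirus"],
    "betacoronavirus": ["coronavirus"],
}

# Hypothetical annotations: paragraph id -> set of concepts found in it.
ANNOTATIONS = {
    "doc1#p3": {"lung_neoplasm", "sars_cov_2"},
    "doc2#p1": {"carcinoma"},
    "doc3#p7": {"mers_cov", "neoplasm"},
    "doc4#p2": {"sars_cov_2"},
}


def narrower_closure(root, broader):
    """Return root plus every concept below it, following narrower links transitively."""
    # Invert child -> parents into parent -> children.
    narrower = defaultdict(set)
    for child, parents in broader.items():
        for parent in parents:
            narrower[parent].add(child)
    # Depth-first walk down the hierarchy; `seen` guards against cycles.
    seen, stack = {root}, [root]
    while stack:
        for child in narrower[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def co_mentions(annotations, broader, root_a, root_b):
    """Return sorted paragraph ids that mention a kind of root_a and a kind of root_b."""
    kinds_a = narrower_closure(root_a, broader)
    kinds_b = narrower_closure(root_b, broader)
    return sorted(
        pid
        for pid, concepts in annotations.items()
        if concepts & kinds_a and concepts & kinds_b
    )


if __name__ == "__main__":
    print(co_mentions(ANNOTATIONS, BROADER, "neoplasm", "coronavirus"))
```

In a triple store the same closure would more likely be expressed in the
query itself (for example with a SPARQL property path over skos:broader)
rather than computed in application code.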
Received on Monday, 18 May 2020 19:44:07 UTC