- From: David Booth <david@dbooth.org>
- Date: Tue, 21 Apr 2020 10:47:50 -0400
- To: w3c semweb HCLS <public-semweb-lifesci@w3.org>
Last-minute schedule change for today's call: instead of Scott Malec,
Jin-Dong Kim will present his work on "An open collaboration for richly
annotating Covid-19 Literature". Slides are here:
https://docs.google.com/presentation/d/1ynoe1Xxc_-rTiebbvvuPBQMaktK-DX87McuDVaLbI1g/edit#slide=id.g726dbf02a0_0_0

David Booth

On 4/20/20 11:56 AM, David Booth wrote:
> Tomorrow (Tuesday) 11am Boston time Scott Malec will discuss his work on
> computable knowledge extraction using the CORD-19 dataset that was
> released by the Allen Institute.
>
> We will use this google hangout:
> http://tinyurl.com/fhirrdf
>
> More on Scott's work:
> https://github.com/fhircat/CORD-19-on-FHIR/wiki/CORD-19-Semantic-Annotation-Projects#project-name-cord-semantictriples
>
> We still have time for one other presentation tomorrow about CORD-19
> semantic annotation. If anyone else is ready (with slides) to present
> for 20 minutes, please let me know.
>
> Thanks,
> David Booth
>
> -----------------------------------------------
>
> MEETING NOTES 7-Apr-2020
> Present: David Booth <david@dbooth.org>, Sebastian Kohlmeier
> <sebastiank@allenai.org>, Lucy Lu Wang <lucyw@allenai.org>, Kyle Lo
> <kylel@allenai.org>, Jim McCusker <mccusker@gmail.com>, Scott Malec
> <sam413@pitt.edu>, Guoqian Jiang <jiang.guoqian@mayo.edu>, Todor Primov
> <todor.primov@ontotext.com>
>
> Sebastian: Allen Institute, Semantic Scholar, non-profit AI institute,
> with Lucy and Kyle. Engaged in COVID-19 because as a non-profit we could
> develop a corpus that we can make available. Created the CORD-19
> dataset. Goal: a standardized format that's easy for machines to read,
> to enable quick analysis of the literature. Working to extend it.
> Weekly updates, but want to get to daily updates. Also want to get to
> entity and relation extraction.
>
> Guoqian: Identifiers used? SHA numbers for full text, but also IDs
> linked to DOIs and PubMed IDs. Should discuss the best way to have a
> unique ID for each publication.
>
> Kyle: Added unique IDs: cord_UID. The SHA is a hash of the PDF, and
> sometimes there are multiple PDFs for a single paper.
>
> Jim: DOIs?
>
> Lucy: Some papers do not have a DOI.
>
> Jim: Building a KG using generalized tools from another project, used
> in many domains. Looking to do drug repurposing using CORD-19, using
> an extract of CORD-19. Does deep extraction of named entities and
> relationships. Uses the PROV ont and nanopublications for rich modeling
> and provenance in a probabilistic KG. Arcs in the picture are based on
> confidence level. Allows high precision on drugs that have been tested
> on melanoma before. Re-applying this to COVID-19. We focus on open
> ontologies, and are not using FHIR. Not yet published. PageRank-based
> analysis of the PubMed citation graph, to compute community trust in a
> paper.
>
> Guoqian: What ont?
>
> Jim: DrugBank mostly. Lots of targets.
>
> Kyle: Relation-entity set. Closed set?
>
> Jim: We have a drug graph and protein-protein interactions, and DrugBank
> has drug-protein interactions. Molecular interactions. CTD (Comparative
> Toxicogenomics Database); the Heng Ji Lab database started with it.
>
> Kyle: Trying to add more KB entities?
>
> Jim: Want to expand the interaction set. Also entities. We have all
> human proteins and DrugBank drugs. If you have a drug with an effect on
> a target similar to a protein in COVID-19, will there be hits, directly
> or indirectly? To do that, we also want to score it based on confidence
> in the research.
>
> Scott: My research approach is to integrate structured knowledge from
> literature or other curated sources, and combine it with observational
> data to facilitate more reliable inference. The general idea is that
> contextual info can help interpret and identify confounders. Confounders
> are common causes of the predictor and outcome. What I did with CORD-19:
> took PubMed IDs, found what machine reading had already been performed
> on them, and created a KG. Machine reading can run for months. Jim's
> work on citation analysis is cool.
> Using SemRep, developed by NLM, over titles and abstracts in PubMed.
> Used PubMed Central IDs from the metadata table: at the beginning of
> March, 31k papers, with 28k in PubMed Central. Seemed like a good place
> to start building a KG quickly, to see the big picture. Pulled 106k
> semantic predications from the 21k docs into Cytoscape, computed network
> centrality, and ranked from that. Filtered by biomedical entity types:
> diseases, syndromes, amino acids, peptides, or pharmacologic substances.
> Ranked them by centrality to understand their importance. Very
> preliminary analysis. Interested to see how I might expand on this and
> learn what others are doing.
>
> Guoqian: Can Cytoscape support RDF graphs? David: Yes. Jim: Yes, and
> you can form SPARQL queries to extract specific interactions. Not a 1:1
> mapping of RDF graph to bio network.
>
> Todor: There are different plugins; one is for a SPARQL endpoint, others
> are for other visualizations. Keep expectations low.
>
> Jim: It also includes a knowledge exploration interface, built on
> cytoscape.js, a re-implementation of Cytoscape. The implementation I'm
> using has some interface elements.
>
> Lucy: How does the Coronavirus ont relate?
>
> Guoqian: Using this ont to annotate the papers.
>
> Lucy: Where did that ont come from?
>
> Jim: Built using OBO Foundry? Guoqian: Yes.
>
> Jim: We use OBO onts. Oliver has a lot of tools for subsetting and
> extracting application ontologies.
>
> Guoqian: Also collaborating with the Cochrane PICO ontology, developing
> a COVID-19 PICO ont with specific subtypes of the high-level types, e.g.,
> subtypes of population with particular comorbidities. The ont is also
> available on GitHub.
>
> Guoqian: How to collaborate? Need a registry for KGs from this community?
>
> Lucy: Working on semantic annotation of entities and relations. Lots of
> people are doing bottom-up annotation, without a formal vocab, then
> linking to UMLS. But haven't seen a COVID-19 ont.
>
> Guoqian: Also should look at use cases that different groups have.
> Community said they want an open vocab, such as UMLS, instead of
> SNOMED CT.
>
> Lucy: Also working with a group at AWS: KB of concepts, linked to ICD-10
> and RxNorm; also lots of requests for proteins and interactions.
>
> Guoqian: Also procedure datasets.
>
> Lucy: What use cases are these projects addressing?
>
> Guoqian: For EBMonFHIR, they are focused on review of evidence and
> clinical concepts. Another team is looking at using OBO onts to analyze
> DBs to mine underlying mechanisms. Ideally we should have linkage across
> vocabularies. E.g., UMLS can link many onts. But for OBO it might be a
> challenge.
>
> Jim: From a microbio perspective, the most useful thing from this group
> would be a cross-mapping between the clinical/FHIR/SNOMED-ish world and
> the OBO bio world, with translation between the two. E.g., I use UniProt
> IDs. Is that a problem? What about drug IDs? IDs are the hardest part:
> agree on some, and provide mappings for others.
>
> Guoqian: If we can provide a list of the onts each team prefers, we can
> discuss.
>
> Lucy: Would be great to be able to share annotations. Centralized
> vocab? Central KB? Use cases are key.
>
> Scott: Mapping problems with COVID-19 are the same as other mapping
> problems. Should have a central place to share projects. Should keep
> use cases in mind.
>
> Sebastian: Please give us feedback on the dataset!
>
> Todor: Focus on specific questions that you want to answer, then map
> using common IDs to address them.
>
> Daniel: What formats? Right now we're using FHIR. Use others?
>
> Jim: identifiers.org might be useful for mapping.
>
> David: Useful to have each group present use cases and vocab.
>
> We'll meet weekly, same time, 1 hour. Each group will present their
> work in more detail, with focus on:
> what use cases they are addressing; and
> what vocabularies / ontologies they're using.
>
> Each group will present for 20 minutes, followed by 10 minutes of
> questions.
>
> ADJOURNED
Received on Tuesday, 21 April 2020 14:48:06 UTC