- From: David Booth <david@dbooth.org>
- Date: Tue, 5 May 2020 11:07:14 -0400
- To: w3c semweb HCLS <public-semweb-lifesci@w3.org>
Apologies, I thought I had sent out the announcement for today's call,
but I just discovered that it never left my Drafts folder. :( We are
rescheduling for next week instead, when Franck Michel will present his
work on the CORD-19 Named Entities Knowledge Graph (CORD19-NEKG).

David Booth

On 4/28/20 12:09 PM, David Booth wrote:
> Notes from today's call:
>
> MEETING NOTES 28-Apr-2020
> Present: David Booth, Victor Mireles, Louis Rumanes, Tom Conlin, Franck
> Michel, Gollam Rabby, Jim McCusker, Lucy Lu Wang, Sebastian Kohlmeier,
> Tomáš Kliegr
>
> Introductions
> David Booth: 10 years applying semantic web tech to healthcare and life
> sciences, working on Mayo Clinic / Johns Hopkins University collaboration.
>
> Louis Rumane: UnitedHealth Group, doing COVID research, looking at
> making a KG.
>
> Tom Conlin: Working with Melissa Haendel (Monarch Initiative).
>
> Franck: INRIA.
>
> Gollam: Prague, Univ.
>
> Jim: Research scientist at RPI, working on KG w bio.
>
> Lucy: Allen Institute, research scientist.
>
> Tomas: Assoc Prof, Prague, KG.
>
> Sebastian: Sr Mgr on CORD-19.
>
> Victor: Semantic Web Company researcher.
>
> Victor's Presentation
> Slides here:
> https://docs.google.com/presentation/d/1xaS_88sJ47iSrvv0ezOfjscIvG2VINUe7vqrUEMiaCA/edit?usp=sharing
>
>
> victor: Semantic Web Company, 40+ FTEs. Makes PoolParty. Works w
> companies in many countries. Taxonomies help with extracting entities
> from text, image search, data mgmt.
>
> victor: Developing text and data mining tools for biomed, and CORD-19.
> We don't only annotate text. What's useful about annotating text w
> entities is that we can use the knowledge about them, the simplest of
> which is encoded in SKOS, such as broader/narrower. But to do this we
> need to annotate the text with URIs, then import the relationships into
> the graph. Trying to link existing annotations w other knowledge
> sources. Ont is a simplified version of NIF: documents have sections,
> sections have annotations that are SKOS concepts.
>
> victor: So far, we've set up a pipeline that takes a document and finds
> annotations with offsets. So far imported ChEBI, GO, MeSH, HPO, but
> using them as controlled vocabs. Many terms are very specific, such as
> "COVID-19" -- not really NLP, because there are no inflections,
> plurals, etc. Output is a bunch of triples in the simple SKOS ont
> previously mentioned. Put them into GraphDB, along with the vocabs.
>
> victor: Also looked at SciBite annotations. They've done an excellent
> job annotating. They also have their own controlled vocab that is very
> good. JSON files have annotations. Put them into triples. Combining
> them w bio DBs gives a graph DB.
>
> (victor shows relationships in GraphDB viewer)
>
> victor: You can navigate the hierarchy of concepts and link them to the
> paragraphs in the CORD-19 DB.
>
> (victor shows SPARQL queries)
>
> victor: This allows us to pull up the titles and paragraphs of articles
> that mention both a kind of neoplasm and a kind of coronavirus.
>
> victor: Want to take other DBs and put them into GraphDB also. Monarch
> Initiative is putting together a KG, and also puts in SciBite.
>
> victor: Missing from both our effort and Monarch: searchability. I
> showed SPARQL queries using broader/narrower. Also need to be more
> efficient for humans, working also on faceted search. Monarch
> Initiative is very good for machine-readable stuff. Another thing
> missing: relation extraction from the text. How does a human determine
> that some text is saying that a protein interacts with another? JPL
> (Lewis Magidney?sp?) is using a Stanford NLP tool for relation extraction.
> https://github.com/nasa-jpl-cord-19/covid19-knowledge-graph
> It isn't perfect, but it indicates a relationship. Both entities are in
> GO. This adds new edges between entities. Lots of interest in this
> topic now.
>
> Franck: We're doing something pretty close to this at INRIA, looking at
> named entities, Wikidata entities, queries that gather all articles on
> cancer and any coronavirus. Another thing we're doing: in addition to
> detecting named entities, we're running other tools to identify
> arguments, claims, evidence in articles and draw a network of claims and
> evidence to see what supports the claims. Hope to publish this network
> soon as an RDF graph.
>
> victor: PubAnnotation, shown last week, showed epistemic analysis.
>
> Franck: Argument and clinical trial analysis. Query PubMed and the
> platform analyzes those articles. Want to apply them to CORD-19.
>
> Vincent: Is RDF available? victor: Will take a couple more weeks.
> Vincent: Size? victor: 20GB RDF.
>
> David: Overlap between efforts, helpful to learn about each other's work.
>
> victor: After looking at Monarch Initiative, it isn't new, names I
> recognized from the Human Phenotype initiative. Most of that summarizes
> work that others have done. FHIR DBs also have overlaps with SciBite.
>
> david: SPARQL query was valuable, but biologists need a simple UI.
>
> jim: Working on a faceted browser for various things, that can be reused.
> Based on SPARQL fragments: a property path gives certain values, here's
> how to render it. Potentially useful here. Also integrated Vega (JS
> framework for charts and visualization) into WHYIS; can plug a SPARQL
> query in and get a chart. People can share how they're exploring the graph.
> https://github.com/tetherless-world/whyis
> Faceted search is a view in WHYIS, but a lot of the capabilities are
> designed to use nanopubs.
>
> Email list for these calls:
> https://lists.w3.org/Archives/Public/public-semweb-lifesci/
>
> Franck to present next week.
>
> ADJOURNED
>
>
> On 4/27/20 4:00 PM, David Booth wrote:
>> We will use this Zoom:
>>
>> Zoom Link:
>> https://us02web.zoom.us/j/89011102533?pwd=SU9CdDYxUlRtUkNBdjFUN0x4MTRxUT09
>>
>> password: 82AY02Rt66
>>
>> Thanks,
>> David Booth
>>
>> On 4/27/20 12:04 PM, David Booth wrote:
>>> Tomorrow (Tuesday) 11am Boston time Victor Mireles will present his
>>> work on RDFizing several annotations on the CORD-19 dataset that are
>>> around in different vocabularies. Current vocabularies: Gene
>>> Ontology, ChEBI, Human Phenotype Ontology, MeSH disease.
>>>
>>> Details for joining the call will be posted in a follow-up message.
>>>
>>> Thanks,
>>> David Booth
>>>
>>> On 4/21/20 1:32 PM, David Booth wrote:
>>>> [Apologies for reaching the Google Hangout participant limit today,
>>>> and thank you to Victor Mireles-Chavez for allowing us to switch
>>>> over to his Zoom instead! I will find a better solution for next
>>>> week.]
>>>>
>>>> Below are meeting notes from today's call. If you would like to
>>>> present your work on CORD-19 semantic annotations, please email me
>>>> so that I can put you on the schedule. You do not need to have
>>>> results yet. Even if you are just starting out, it is helpful to
>>>> learn what others are doing.
>>>>
>>>> ----------------------------
>>>>
>>>> MEETING NOTES 21-Apr-2020
>>>>
>>>> Present: David Booth, Jin-Dong Kim, Víctor Mireles, Oliver
>>>> Giles, Harry Hochheiser, Franck Michel, James Malone, Kyle Lo,
>>>> Sebastian Kohlmeier, Guoqian Jiang, Gaurav Vaidya, Gollam Rabby,
>>>> Oliver, Tomas Kliegr
>>>>
>>>> Introductions
>>>>
>>>> David Booth: Many years in semantic web technology, applying it to
>>>> healthcare and life sciences for the past 10 years. Involved in
>>>> standardizing the RDF representation of HL7 FHIR:
>>>> https://www.hl7.org/fhir/rdf.html
>>>>
>>>> Gaurav: U of NC (https://renci.org/staff/gaurav-vaidya/), sem web
>>>> tech, using CORD-19, trying to annotate ont terms as part of Robokop
>>>> (https://robokop.renci.org/).
>>>>
>>>> Harry: U of Pittsburgh, involved w W3C, drug-drug interaction,
>>>> cancer information models, not actively using CORD-19.
>>>>
>>>> James Malone: CTO of SciBite in UK, provides sem enrichment tooling
>>>> to pharma, KG building. Background: applying onts to public data,
>>>> machine learning, building ontologies.
>>>>
>>>> Kyle: Researcher at Allen Institute, NLP, working on CORD-19.
>>>>
>>>> Oliver: Machine learning at SciBite w James, NLP, machine learning.
>>>>
>>>> Sebastian: Prog mgr at Allen Institute, CORD-19.
>>>>
>>>> Tomas: Working on rule learning, trying to apply it to CORD-19.
>>>>
>>>> Victor Mireles: Researcher at sem web co in Austria, looking at
>>>> annotations that others have been doing on CORD-19, trying to make
>>>> them match, and our own annotations.
>>>>
>>>> Presentation by Jin-Dong Kim
>>>>
>>>> Slides:
>>>> https://docs.google.com/presentation/d/1ynoe1Xxc_-rTiebbvvuPBQMaktK-DX87McuDVaLbI1g/edit#slide=id.g726dbf02a0_0_0
>>>>
>>>>
>>>> Jin-Dong: Tokyo, Database Center for Life Science, Japan gov funded,
>>>> bioinformatics, NLP, text mining, esp biomedical literature.
>>>>
>>>> (Jin-Dong presents his slides)
>>>>
>>>> Jin-Dong: Using multiple datasets. Multiple groups producing
>>>> annotations in isolation. PubAnnotation is a 10-year-old project to
>>>> integrate annotations to literature. Collecting annotations for
>>>> COVID-19 literature to integrate and release them for other use.
>>>> PubAnnotation is an open repo of biomed text annotations. Anyone
>>>> can submit to it. All annotations are aligned to the canonical texts.
>>>>
>>>> Jin-Dong: PubAnnotation also provides RESTful web services. Many
>>>> annotators are compatible with PubAnnotation. Also collecting manual
>>>> annotations using TextAE.
>>>>
>>>> Q: Are the annotations from a controlled vocab or ont? Jin-Dong:
>>>> Both: free text or from an ont.
>>>>
>>>> Jin-Dong: Every text span has a URL. You can see what projects
>>>> include a doc. And you can choose a span of text and see what
>>>> projects used that span.
>>>>
>>>> Q: What is a project? Jin-Dong: We collect any kind of annotations.
>>>> A project identifies the source (the people) who contributed the
>>>> annotations.
>>>>
>>>> Jin-Dong: Annotations can be accessed via a span URL. Also
>>>> converting annotations into RDF. Still experimenting. Also have a
>>>> search interface, and SPARQL queries.
>>>> https://covid19.pubannotation.org/
>>>>
>>>> Jin-Dong: Trying to add annotations for temporal expressions.
>>>>
>>>> Jin-Dong: Literature includes CORD-19 and LitCovid, from NCBI.
>>>> Uploaded all the text to PubAnnotation
>>>> (http://pubannotation.org/collections/LitCovid). Anyone can
>>>> contribute. To contribute, you can download the text, annotate it,
>>>> then create a new project and add it to the LitCovid collection and
>>>> it will appear. Open platform. Same setup for CORD-19. Received 6
>>>> contributions so far. Need to analyze them. Planning to call for
>>>> wider contributions soon, maybe next week. Plan to continuously
>>>> update.
>>>>
>>>> Guoqian: Any specific research questions using these annotations?
>>>> Particular use cases? Jin-Dong: Need to find out. Clinicians began
>>>> with manual annotations. Will figure out missing parts and try to
>>>> fill the gaps. Many annotations are concept annotations using onts
>>>> -- many similar. But we think there are still important missing
>>>> annotations, such as temporal expressions. Looking to add those.
>>>> Also quantitative trait annotations are missing. Looking for those
>>>> too.
>>>>
>>>> Q: How might these be used?
>>>>
>>>> Franck: I'm at Inria/CNRS/Univ Côte d'Azur; contacts with Inserm
>>>> (French NIH) point to the need to search literature with questions
>>>> like: "What are the papers that link Coronavirus with other diseases
>>>> like diabetes or cancer?"
>>>>
>>>> James: Released COVID-specific annotations. Pharma using them:
>>>> looking for co-risk factors, or drugs interacting. Comes down to:
>>>> want to narrow down to a set of papers to read. Anything that gets
>>>> them to the paper. Want to read the o
>>>>
>>>> Franck: Summarizing the main claim of the paper also helps to
>>>> narrow down the search.
>>>>
>>>> Victor: Drug-drug interactions. There are many other KGs; to link to
>>>> drug-drug or protein-protein interaction databases we need URIs. Can
>>>> I query PubAnnotation and get URIs from it, so I can see what drugs
>>>> are mentioned in this span? Is this supported?
>>>>
>>>> Jin-Dong: A group in China is working on annotations for drug
>>>> repurposing. I think they're using a drug ont.
>>>>
>>>> Franck: How can we consume the annotations that have been
>>>> contributed? Jin-Dong: Download in JSON or CSV, or access as RDF.
>>>>
>>>> Tomas: We detect entities, then try to do semantic extension. Would
>>>> there be a way to use this for semantic extension of entities, or to
>>>> get a list of highly specific concepts that appear in the article?
>>>> Jin-Dong: Yes, because they're in RDF, could do that. Tomas: How do
>>>> you match a doc in your DB with a doc in another DB? Jin-Dong: Every
>>>> doc is identified by a pair: DB identifier, and ID within that DB.
>>>>
>>>> Tomas: How many annotations per document on average? Jin-Dong:
>>>> Conversion is not entirely done. RDF statements only partially
>>>> done. Jin-Dong: In CORD-PICO, for 26k docs, 69k annotations for PICO.
>>>>
>>>> ADJOURNED
>>>>
>>>> -----------------------------------------------------------------------
>>>>
>>>> On 4/21/20 10:47 AM, David Booth wrote:
>>>>> Last-minute schedule change for today's call: Instead of Scott
>>>>> Malec, Jin-Dong Kim will present his work on "An open collaboration
>>>>> for richly annotating Covid-19 Literature". Slides are here:
>>>>> https://docs.google.com/presentation/d/1ynoe1Xxc_-rTiebbvvuPBQMaktK-DX87McuDVaLbI1g/edit#slide=id.g726dbf02a0_0_0
>>>>>
>>>>>
>>>>> David Booth
>>>>>
>>>>> On 4/20/20 11:56 AM, David Booth wrote:
>>>>>> Tomorrow (Tuesday) 11am Boston time Scott Malec will discuss his
>>>>>> work on computable knowledge extraction using the CORD-19 dataset
>>>>>> that was released by the Allen Institute.
>>>>>>
>>>>>> We will use this Google Hangout:
>>>>>> http://tinyurl.com/fhirrdf
>>>>>>
>>>>>> More on Scott's work:
>>>>>> https://github.com/fhircat/CORD-19-on-FHIR/wiki/CORD-19-Semantic-Annotation-Projects#project-name-cord-semantictriples
>>>>>>
>>>>>>
>>>>>> We still have time for one other presentation tomorrow about
>>>>>> CORD-19 semantic annotation. If anyone else is ready (with
>>>>>> slides) to present for 20 minutes, please let me know.
>>>>>>
>>>>>> Thanks,
>>>>>> David Booth
>>>>>>
>>>>>> -----------------------------------------------
>>>>>>
>>>>>> MEETING NOTES 7-Apr-2020
>>>>>> Present: David Booth <david@dbooth.org>, Sebastian Kohlmeier
>>>>>> <sebastiank@allenai.org>, Lucy Lu Wang <lucyw@allenai.org>, Kyle
>>>>>> Lo <kylel@allenai.org>, Jim McCusker <mccusker@gmail.com>, Scott
>>>>>> Malec <sam413@pitt.edu>, Guoqian Jiang <jiang.guoqian@mayo.edu>,
>>>>>> Todor Primov <todor.primov@ontotext.com>
>>>>>>
>>>>>> Sebastian: Allen Institute, Semantic Scholar, non-profit AI
>>>>>> institute, w Lucy and Kyle. Engaged in COVID-19 because as a
>>>>>> non-profit we could develop a corpus that we can make available.
>>>>>> Created the CORD-19 dataset. Goal: a standardized format that's
>>>>>> easy for machines to read, to enable quick analysis of the
>>>>>> literature. Working to extend it. Weekly updates, but want to get
>>>>>> to daily updates. Also want to get to entity and relation extraction.
>>>>>>
>>>>>> Guoqian: Identifiers used? SHA hashes for full text, but also
>>>>>> IDs linked to DOIs and PubMed IDs. Should discuss the best way to
>>>>>> have a unique ID for each publication.
>>>>>>
>>>>>> Kyle: Added unique IDs: cord_UID. SHA is a hash of the PDF, and
>>>>>> sometimes there are multiple PDFs for a single paper.
>>>>>>
>>>>>> Jim: DOIs?
>>>>>>
>>>>>> Lucy: Some papers do not have a DOI.
>>>>>>
>>>>>> Jim: Building a KG using generalized tools from other projects,
>>>>>> used in many domains. Looking to do drug repurposing using
>>>>>> CORD-19. Using an extract of CORD-19. Does deep extraction of
>>>>>> named entities and relationships. Using PROV ont and
>>>>>> nanopublications, for rich modeling and provenance for a
>>>>>> probabilistic KG. Arcs in picture are based on confidence level.
>>>>>> Allows high precision on drugs that have been tested on melanoma
>>>>>> before. Re-applying this to COVID-19. We focus on open
>>>>>> ontologies, and are not using FHIR. Not yet published.
>>>>>> PageRank-based analysis of the PubMed citation graph, to compute
>>>>>> community trust in a paper.
>>>>>>
>>>>>> Guoqian: What ont?
>>>>>>
>>>>>> Jim: DrugBank mostly. Lots of targets.
>>>>>>
>>>>>> Kyle: Relation-entity set. Closed set?
>>>>>>
>>>>>> Jim: We have a drug graph, protein-protein interactions, and
>>>>>> DrugBank has drug-protein interactions. Molecular interactions. CTD
>>>>>> (Comparative Toxicogenomics Database); Heng Ji Lab database
>>>>>> started with it.
>>>>>>
>>>>>> Kyle: Trying to add more KB entities?
>>>>>>
>>>>>> Jim: Want to expand the interaction set. Also entities. We have
>>>>>> all human proteins and DrugBank drugs. If you have a drug with an
>>>>>> effect on a target similar to a protein in COVID-19, will there be
>>>>>> hits, directly or indirectly? To do that, we also want to score it
>>>>>> based on confidence in the research.
>>>>>>
>>>>>> Scott: My research approach is to integrate structured knowledge
>>>>>> from literature or other curated sources, and combine it with
>>>>>> observational data to facilitate more reliable inference. General
>>>>>> idea is that contextual info can help interpret and identify
>>>>>> confounders. Confounders are common causes of the predictor and
>>>>>> outcome. What I did with CORD-19: took the PubMed IDs, found what
>>>>>> machine reading had already been performed on them, and created a
>>>>>> KG. Machine reading can run for months. Jim's work on citation
>>>>>> analysis is cool. Using SemRep, developed by NLM, over titles and
>>>>>> abstracts in PubMed. Using PubMed Central IDs from the metadata
>>>>>> table: at the beginning of March, 31k papers, with 28k in PubMed
>>>>>> Central. Seemed like a good place to start building a KG quickly,
>>>>>> to see the big picture quickly. Pulled 106k semantic predications
>>>>>> in the 21k docs, pulled them into Cytoscape and computed network
>>>>>> centrality, and ranked from that. Filtered by biomedical entities:
>>>>>> diseases, syndromes, amino acids, peptides or pharm substances.
>>>>>> Ranked them by centrality to understand their importance. Very
>>>>>> prelim analysis. Interested to see how I might expand on this and
>>>>>> learn what others are doing.
>>>>>>
>>>>>> Guoqian: Can Cytoscape support RDF graphs? David: Yes. Jim: Yes,
>>>>>> and you can form SPARQL queries to extract specific interactions.
>>>>>> Not a 1:1 mapping of RDF graph to bio network.
>>>>>>
>>>>>> Todor: There are different plugins; one is for a SPARQL endpoint,
>>>>>> others for other visualizations. Keep expectations low.
>>>>>>
>>>>>> Jim: It also includes a knowledge exploration interface, built on
>>>>>> Cytoscape.js, a re-implementation of Cytoscape. The
>>>>>> implementation I'm using has some interface elements.
>>>>>>
>>>>>> Lucy: How does the Coronavirus ont relate?
>>>>>>
>>>>>> Guoqian: Using this ont to annotate the papers.
>>>>>>
>>>>>> Lucy: Where did that ont come from?
>>>>>>
>>>>>> Jim: Built using OBO Foundry? Guoqian: Yes.
>>>>>>
>>>>>> Jim: We use OBO onts. Oliver has a lot of tools for subsetting and
>>>>>> extracting application ontologies.
>>>>>>
>>>>>> Guoqian: Also collaborating with Cochrane PICO ontology, developing
>>>>>> COVID-19 PICO ont, specific subtypes of the high-level types, e.g.,
>>>>>> subtypes of population with particular comorbidities. The ont
>>>>>> is also available on GitHub.
>>>>>>
>>>>>> Guoqian: How to collaborate? Need a registry for KGs from this
>>>>>> community?
>>>>>>
>>>>>> Lucy: Working on semantic annotation of entities and relations.
>>>>>> Lots of people are doing bottom-up annotation, without formal
>>>>>> vocab, then linking to UMLS. But haven't seen a COVID-19 ont.
>>>>>>
>>>>>> Guoqian: Also should look at use cases that different groups have.
>>>>>> Community said they want an open vocab, such as UMLS, instead of
>>>>>> SNOMED-CT.
>>>>>>
>>>>>> Lucy: Also working with a group at AWS, KB of concepts, link to
>>>>>> ICD-10 and RxNorm, also lots of requests for proteins and
>>>>>> interactions.
>>>>>>
>>>>>> Guoqian: Also procedure datasets.
>>>>>>
>>>>>> Lucy: What use cases are these projects addressing?
>>>>>>
>>>>>> Guoqian: For EBMonFHIR, they are focused on review of evidence,
>>>>>> and clinical concepts. Another team is looking at using OBO onts
>>>>>> to analyse DBs to mine underlying mechanisms. Ideally we should
>>>>>> have linkage across vocabularies. E.g., UMLS can link many onts.
>>>>>> But for OBO it might be a challenge.
>>>>>>
>>>>>> Jim: From a microbio perspective, most useful from this group
>>>>>> would be having cross mapping between the clinical/FHIR/SNOMED-ish
>>>>>> world and the OBO bio world, with translation between the two.
>>>>>> E.g., I use UniProt IDs. Is that a problem? What about drug IDs?
>>>>>> IDs are the hardest part -- agree on some, and mappings for others.
>>>>>>
>>>>>> Guoqian: If we can provide a list of the onts each team prefers,
>>>>>> we can discuss.
>>>>>>
>>>>>> Lucy: Would be great to be able to share annotations. Centralized
>>>>>> vocab? Central KB? Use cases are key.
>>>>>>
>>>>>> Scott: Mapping problems with COVID-19 are the same as other
>>>>>> mapping problems. Should have a central place to share projects.
>>>>>> Should keep use cases in mind.
>>>>>>
>>>>>> Sebastian: Please give us feedback on the dataset!
>>>>>>
>>>>>> Todor: Focus on specific questions that you want to answer, then
>>>>>> map using common IDs to address them.
>>>>>>
>>>>>> Daniel: What formats? Right now we're using FHIR. Use others?
>>>>>>
>>>>>> Jim: identifiers.org might be useful for mapping.
>>>>>>
>>>>>> David: Useful to have each group present use cases and vocab.
>>>>>>
>>>>>> We'll meet weekly, same time, 1 hour. Each group will present
>>>>>> their work in more detail, with focus on:
>>>>>> what use cases they are addressing; and
>>>>>> what vocabularies / ontologies they're using.
>>>>>>
>>>>>> Each group will present for 20 min, followed by 10 min of questions.
>>>>>>
>>>>>> ADJOURNED
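The broader/narrower queries Victor demonstrated on 28-Apr (titles and paragraphs mentioning both some kind of neoplasm and some kind of coronavirus) can be sketched in a few lines of Python. This is not his actual query: it assumes a tiny in-memory graph with hypothetical concept names (not real MeSH, ChEBI or GO identifiers) and only mimics what a SPARQL 1.1 property path such as `?c skos:broader* ex:Neoplasm` would match against the annotation graph.

```python
from collections import defaultdict

# Hypothetical skos:broader links as (narrower, broader) pairs. Names are
# illustrative stand-ins, not real vocabulary URIs.
BROADER = [
    ("melanoma", "neoplasm"),
    ("carcinoma", "neoplasm"),
    ("lung_carcinoma", "carcinoma"),
    ("sars_cov_2", "coronavirus"),
    ("mers_cov", "coronavirus"),
]

# Hypothetical paragraph annotations: paragraph id -> annotated concepts.
ANNOTATIONS = {
    "para1": {"lung_carcinoma", "sars_cov_2"},
    "para2": {"melanoma"},
    "para3": {"mers_cov", "diabetes"},
    "para4": {"coronavirus", "carcinoma"},
}


def narrower_closure(root, broader_pairs):
    """Return root plus every concept below it (every ?c with ?c skos:broader* root)."""
    children = defaultdict(set)
    for narrower, broader in broader_pairs:
        children[broader].add(narrower)
    seen, stack = {root}, [root]
    while stack:  # depth-first walk down the hierarchy
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def paragraphs_mentioning_both(kind_a, kind_b, broader_pairs, annotations):
    """Paragraphs annotated with some kind of kind_a AND some kind of kind_b."""
    a = narrower_closure(kind_a, broader_pairs)
    b = narrower_closure(kind_b, broader_pairs)
    return sorted(p for p, concepts in annotations.items() if concepts & a and concepts & b)


print(paragraphs_mentioning_both("neoplasm", "coronavirus", BROADER, ANNOTATIONS))
# → ['para1', 'para4']
```

Expanding each class down the hierarchy before intersecting is what lets a paragraph annotated only with "lung_carcinoma" still count as mentioning a neoplasm; in a triple store the same expansion is done by the transitive property path rather than by hand.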
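Scott's ranking step from the 7-Apr notes (load semantic predications into a network, compute centrality, keep only certain semantic types, rank) can be sketched the same way. The notes do not say which centrality measure Cytoscape computed, so this sketch assumes plain normalized degree centrality; the predications and type labels are hypothetical stand-ins, not real SemRep output.

```python
from collections import defaultdict

# Hypothetical SemRep-style predications (subject, predicate, object).
PREDICATIONS = [
    ("chloroquine", "TREATS", "covid19"),
    ("remdesivir", "TREATS", "covid19"),
    ("covid19", "COEXISTS_WITH", "diabetes"),
    ("chloroquine", "INTERACTS_WITH", "ace2"),
    ("ace2", "INTERACTS_WITH", "spike_protein"),
    ("hospital", "LOCATION_OF", "covid19"),
]

# Hypothetical semantic-type label for each term.
SEMANTIC_TYPE = {
    "chloroquine": "pharmacologic_substance",
    "remdesivir": "pharmacologic_substance",
    "covid19": "disease_or_syndrome",
    "diabetes": "disease_or_syndrome",
    "ace2": "amino_acid_peptide_or_protein",
    "spike_protein": "amino_acid_peptide_or_protein",
    "hospital": "health_care_related_organization",
}

# Types kept for ranking, loosely mirroring the categories named in the notes.
KEEP_TYPES = {
    "disease_or_syndrome",
    "amino_acid_peptide_or_protein",
    "pharmacologic_substance",
}


def degree_centrality(triples):
    """Normalized degree centrality on the undirected graph of predications."""
    neighbors = defaultdict(set)
    for subj, _pred, obj in triples:
        if subj != obj:  # ignore self-loops
            neighbors[subj].add(obj)
            neighbors[obj].add(subj)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def rank_terms(triples, types, keep):
    """Rank terms of the kept semantic types by centrality, ties broken by name."""
    scores = degree_centrality(triples)
    kept = [(term, s) for term, s in scores.items() if types.get(term) in keep]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


for term, score in rank_terms(PREDICATIONS, SEMANTIC_TYPE, KEEP_TYPES):
    print(f"{term}\t{score:.2f}")
```

Centrality is computed on the full graph before filtering, so an out-of-scope node such as "hospital" still contributes to its neighbours' scores but is not itself ranked; filtering first would give a different ordering.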
Received on Tuesday, 5 May 2020 15:07:34 UTC