- From: David Booth <david@dbooth.org>
- Date: Mon, 27 Apr 2020 16:00:56 -0400
- To: w3c semweb HCLS <public-semweb-lifesci@w3.org>
- Cc: Victor Mireles <victor.mireles@semantic-web.com>
We will use this zoom:

Zoom Link: https://us02web.zoom.us/j/89011102533?pwd=SU9CdDYxUlRtUkNBdjFUN0x4MTRxUT09
password: 82AY02Rt66

Thanks,
David Booth

On 4/27/20 12:04 PM, David Booth wrote:
> Tomorrow (Tuesday) 11am Boston time Victor Mireles will present his work
> on RDFizing several annotations on the CORD-19 dataset that exist in
> different vocabularies. Current vocabularies: Gene Ontology, ChEBI,
> Human Phenotype Ontology, MeSH disease.
>
> Details for joining the call will be posted in a follow-up message.
>
> Thanks,
> David Booth
>
> On 4/21/20 1:32 PM, David Booth wrote:
>> [Apologies for reaching the google hangout participant limit today,
>> and thank you to Victor Mireles-Chavez for allowing us to switch over
>> to his zoom instead! I will find a better solution for next week.]
>>
>> Below are meeting notes from today's call. If you would like to
>> present your work on CORD-19 semantic annotations, please email me so
>> that I can put you on the schedule. You do not need to have results
>> yet. Even if you are just starting out, it is helpful to learn what
>> others are doing.
>>
>> ----------------------------
>>
>> MEETING NOTES 21-Apr-2020
>>
>> Present: David Booth, Jin-Dong Kim, Víctor Mireles, Oliver Giles, Harry
>> Hochheiser, Franck Michel, James Malone, Kyle Lo, Sebastian Kohlmeier,
>> Guoqian Jiang, Gaurav Vaidya, Gollam Rabby, Oliver, Tomas Kliegr
>>
>> Introductions
>>
>> David Booth: Many years in semantic web technology, applying it to
>> healthcare and life sciences for the past 10 years. Involved in
>> standardizing the RDF representation of HL7 FHIR:
>> https://www.hl7.org/fhir/rdf.html
>>
>> Gaurav: U of NC (https://renci.org/staff/gaurav-vaidya/), sem web
>> tech, using CORD-19, trying to annotate ont terms as part of Robokop
>> (https://robokop.renci.org/).
>>
>> Harry: U of Pittsburgh, involved w W3C, drug-drug interaction, cancer
>> information models, not actively using CORD-19.
>>
>> James Malone: CTO SciBite in UK, provide sem enrichment tooling to
>> pharma, KG building. Background: applying ont to public data, machine
>> learning, building ontologies.
>>
>> Kyle: Researcher at Allen Institute, NLP, working on CORD-19.
>>
>> Oliver: Machine learning at SciBite w James, NLP, machine learning.
>>
>> Sebastian: Prog mgr at Allen Institute, CORD-19.
>>
>> Tomas: Working on rule learning, trying to apply it to CORD-19.
>>
>> Victor Mireles: Researcher at sem web co in Austria, looking at
>> annotations that others have been doing on CORD-19, trying to make
>> them match, and our own annotations.
>>
>> Presentation by Jin-Dong Kim
>>
>> Slides:
>> https://docs.google.com/presentation/d/1ynoe1Xxc_-rTiebbvvuPBQMaktK-DX87McuDVaLbI1g/edit#slide=id.g726dbf02a0_0_0
>>
>>
>> Jin-Dong: Tokyo, database center for life science, Japan gov funded,
>> bioinformatics, NLP, text mining, esp biomedical literature.
>>
>> (Jin-Dong presents his slides)
>>
>> Jin-Dong: Using multiple datasets. Multiple groups producing
>> annotations, in isolation. PubAnnotation is a 10-year-old project to
>> integrate annotations to literature. Collecting annotations for
>> COVID-19 literature to integrate and release them for other use.
>> PubAnnotation is an open repo of biomed text annotations. Anyone can
>> submit to it. All annotations are aligned to the canonical texts.
>>
>> Jin-Dong: PubAnnotation also provides RESTful web services. Many
>> annotators compatible with PubAnnotation. Also collecting manual
>> annotations using TextAE.
>>
>> Q: Are the annotations from a controlled vocab or ont? Jin-Dong: Both
>> free text or from ont.
>>
>> Jin-Dong: Every text span has a URL. You can see what projects
>> include a doc. And you can choose a span of text and see what
>> projects used that span.
>>
>> Q: What is a project? Jin-Dong: We collect any kind of annotations.
>> Project identifies the source of people who have contributed annotations.
>>
>> Jin-Dong: Annotations can be accessed via a span URL. Also converting
>> annotations into RDF. Still experimenting. Also have a search
>> interface. SPARQL queries.
>> https://covid19.pubannotation.org/
>>
>> Jin-Dong: Trying to add annotations for temporal notations.
>>
>> Jin-Dong: Literature includes CORD-19 and LitCovid, from NCBI.
>> Uploaded all the text to PubAnnotation
>> (http://pubannotation.org/collections/LitCovid) Anyone can
>> contribute. To contribute, you can download, annotate, then create a
>> new project and add it to the LitCovid collection and it will appear.
>> Open platform. Same setup for CORD-19. Received 6 contributions so
>> far. Need to analyze them. Planning to call for wider contributions
>> soon, maybe next week. Plan to continuously update.
>>
>> Guoqian: Any specific research questions using these annotations?
>> Particular use cases? Jin-Dong: Need to find out. Clinicians began
>> with manual annotations. Will figure out missing parts and try to
>> fill the gaps. Many annotations are concept annotations using ont --
>> many similar. But we think there are still important missing
>> annotations, such as temporal expressions. Looking to add those.
>> Also quantitative traits annotations are missing. Looking for those too.
>>
>> Q: How might these be used?
>>
>> Franck: I'm in Inria/CNRS/Univ Côte d'Azur, contacts with Inserm
>> (French NIH) point to the need to search literature with questions
>> like: "What are the papers that link Coronavirus with other diseases
>> like diabetes or cancer?"
>>
>> James: Released COVID-specific annotations. Pharma using them: looking
>> for co-risk factors, or drugs interacting. Comes down to: want to
>> narrow down to a set of papers to read. Anything that gets them to
>> the paper. Want to read the o
>>
>> Franck: Summarizing the main claim of the paper helps also, to narrow
>> down the search.
>>
>> Victor: Drug-drug interactions.
>> Many other KGs, to link to drug-drug
>> or protein-protein interaction databases we need URIs, so
>> pubAnnotations can query and get URIs from it, so I can see what drugs
>> are mentioned in this span. Is this supported?
>>
>> Jin-Dong: Group in China is working on annotations for drug
>> repurposing. I think they're using drug ont.
>>
>> Franck: How can we consume the annotations that have been contributed?
>> Jin-Dong: Download in JSON or CSV, or access as RDF.
>>
>> Tomas: We detect entities, then try to do semantic extension. Would
>> there be a way to use this for semantic extension of entities, or get
>> a list of highly specific concepts that appear in the article.
>> Jin-Dong: Yes, because they're in RDF, could do that. Tomas: How to
>> match doc in your DB with doc in other DB? Jin-Dong: Every doc is
>> identified by a pair: DB identifier, and ID within that DB.
>>
>> Tomas: How many annotations average per document? Jin-Dong:
>> Conversion is not entirely done. RDF statements only partially done.
>> Jin-Dong: in CORD-PICO, for 26k docs, 69k annotations for PICO.
>>
>> ADJOURNED
>>
>> -----------------------------------------------------------------------
>>
>> On 4/21/20 10:47 AM, David Booth wrote:
>>> Last minute schedule change for today's call: Instead of Scott Malec,
>>> Jin-Dong Kim will present his work on "An open collaboration for
>>> richly annotating Covid-19 Literature". Slides are here:
>>> https://docs.google.com/presentation/d/1ynoe1Xxc_-rTiebbvvuPBQMaktK-DX87McuDVaLbI1g/edit#slide=id.g726dbf02a0_0_0
>>>
>>>
>>> David Booth
>>>
>>> On 4/20/20 11:56 AM, David Booth wrote:
>>>> Tomorrow (Tuesday) 11am Boston time Scott Malec will discuss his
>>>> work on computable knowledge extraction using the CORD-19 dataset
>>>> that was released by the Allen Institute.
>>>>
>>>> We will use this google hangout:
>>>> http://tinyurl.com/fhirrdf
>>>>
>>>> More on Scott's work:
>>>> https://github.com/fhircat/CORD-19-on-FHIR/wiki/CORD-19-Semantic-Annotation-Projects#project-name-cord-semantictriples
>>>>
>>>>
>>>> We still have time for one other presentation tomorrow about CORD-19
>>>> semantic annotation. If anyone else is ready (with slides) to
>>>> present for 20 minutes, please let me know.
>>>>
>>>> Thanks,
>>>> David Booth
>>>>
>>>> -----------------------------------------------
>>>>
>>>> MEETING NOTES 7-Apr-2020
>>>> Present: David Booth <david@dbooth.org>, Sebastian Kohlmeier
>>>> <sebastiank@allenai.org>, Lucy Lu Wang <lucyw@allenai.org>, Kyle Lo
>>>> <kylel@allenai.org>, Jim McCusker <mccusker@gmail.com>, Scott Malec
>>>> <sam413@pitt.edu>, Guoqian Jiang <jiang.guoqian@mayo.edu>, Todor
>>>> Primov <todor.primov@ontotext.com>
>>>>
>>>> Sebastian: Allen Institute, Semantic Scholar, Non-profit AI
>>>> institute, w Lucy and Kyle. Engaged in COVID-19 because as
>>>> non-profit could develop a corpus that we can make available.
>>>> Created CORD-19 dataset. Goal: Standardized format that's easy for
>>>> machines to read, to enable quick analysis of the literature.
>>>> Working to extend it. Weekly updates, but want to get to daily
>>>> updates. Want to also get to entity and relation extraction.
>>>>
>>>> Guoqian: Identifiers used? SHA numbers for full text, but also IDs
>>>> linked to DOIs and Pubmed IDs. Should discuss best way to have
>>>> unique ID for publication.
>>>>
>>>> Kyle: Added unique IDs: cord_UID. SHA is a hash of PDF, and
>>>> sometimes there are multiple PDFs for a single paper.
>>>>
>>>> Jim: DOIs?
>>>>
>>>> Lucy: Some papers do not have a DOI.
>>>>
>>>> Jim: Building a KG using generalized tools from other projects,
>>>> used in many domains. Looking to do drug repurposing using CORD-19.
>>>> Using an extract of CORD-19. Does deep extraction of named entities
>>>> and relationships.
>>>> Use PROV ont and nanopublications, for rich
>>>> modeling and provenance for probabilistic KG. Arcs in picture are
>>>> based on confidence level. Allows high precision on drugs that have
>>>> been tested on melanoma before. Re-applying this to COVID-19. We
>>>> focus on open ontologies, and not using FHIR. Not yet published.
>>>> Page-rank based analysis of pubmed citation graph, to compute
>>>> community trust in a paper.
>>>>
>>>> Guoqian: What ont?
>>>>
>>>> Jim: Drugbank mostly. Lots of targets.
>>>>
>>>> Kyle: Relation-entity set. Closed set?
>>>>
>>>> Jim: We have drug graph, protein-protein interaction, and drugbank
>>>> has drug-protein interaction. Molecular interaction. CTD
>>>> Comparative Toxicogenomics Database, Heng Ji Lab database started with it.
>>>>
>>>> Kyle: Trying to add more KB entities?
>>>>
>>>> Jim: Want to expand the interaction set. Also entities. We have
>>>> all human proteins and drugbank drugs. If you have a drug with an
>>>> effect on a target similar protein in COVID-19, will there be hits,
>>>> directly or indirectly? To do that, we want to score it also based
>>>> on confidence in the research.
>>>>
>>>> Scott: My research approach is to integrate structured knowledge
>>>> from literature or other curated sources, and combine with
>>>> observational data to facilitate more reliable inference. General
>>>> idea is that contextual info can help interpret and identify
>>>> confounders. Confounders are common causes of the predictor and
>>>> outcome. What I did with CORD-19: took pubmed IDs, and found what
>>>> machine reading performed and created KG. Machine reading can run
>>>> for months. Jim's work on citation analysis is cool. Using semrep,
>>>> developed by NLM, over titles and abstracts in pubmed. Using Pubmed
>>>> central IDs from metadata table, in beginning of March, 31k papers,
>>>> with 28k in pubmed central. Seemed like a good place to start
>>>> building a KG quickly, to see the big picture quickly.
>>>> Pulled 106k
>>>> semantic predications in the 21k docs, pulled into cytoscape and
>>>> computed network centrality, and from that ranked. Filtered by
>>>> biomedical entities, diseases, syndromes, amino acids, peptides or
>>>> pharm substances. Ranked them by centrality to understand their
>>>> importance. Very prelim analysis. Interested to see how I might
>>>> expand on this and learn what others are doing.
>>>>
>>>> Guoqian: Can cytoscape support RDF graphs? David: Yes. Jim: Yes,
>>>> and you can form SPARQL queries to extract specific interactions.
>>>> Not 1:1 mapping of RDF graph to bio network.
>>>>
>>>> Todor: There are different plugins, one is SPARQL endpoint. Others
>>>> for other visualizations. Keep expectations low.
>>>>
>>>> Jim: It also includes a knowledge exploration interface, built on
>>>> cytoscape.js, a re-implementation of cytoscape. The implementation
>>>> I'm using has some interface elements.
>>>>
>>>> Lucy: How does Coronavirus ont relate?
>>>>
>>>> Guoqian: Using this ont to annotate the papers.
>>>>
>>>> Lucy: Where did that ont come from?
>>>>
>>>> Jim: Built using OBO Foundry? Guoqian: Yes.
>>>>
>>>> Jim: We use OBO ont. Oliver has a lot of tools for subsetting and
>>>> extracting for app ontologies.
>>>>
>>>> Guoqian: Also collaborating with Cochrane PICO ontology, developing
>>>> COVID-19 PICO ont, specific subtypes of the high level types, eg,
>>>> subtypes of population with particular co-morbidities. The ont is
>>>> also avail on github.
>>>>
>>>> Guoqian: How to collaborate? Need a registry for KG from this
>>>> community?
>>>>
>>>> Lucy: Working on semantic annotation of entity and rel. Lots of
>>>> people are doing bottom-up annotation, without formal vocab, then
>>>> linking to UMLS. But haven't seen COVID-19 ont.
>>>>
>>>> Guoqian: Also should look at use cases that different groups have.
>>>> Community said they want open vocab instead of SNOMED-CT, such as UMLS.
>>>>
>>>> Lucy: Also working with a group at AWS, KB of concepts, link to
>>>> ICD-10 and RXNorm, also lots of requests for protein and interactions.
>>>>
>>>> Guoqian: Also procedure datasets.
>>>>
>>>> Lucy: What use cases are these projects addressing?
>>>>
>>>> Guoqian: For EBMonFHIR, they are focused on review of evidence, and
>>>> clinical concepts. Other team looking at using OBO ont to analyse
>>>> DB to mine underlying mechanisms. Ideally we should have linkage
>>>> across vocabularies. Eg UMLS can link many ont. But for OBO it
>>>> might be a challenge.
>>>>
>>>> Jim: From microbio perspective, most useful from this group would be
>>>> having cross mapping from clinical/FHIR/SNOMED-ish world and OBO bio
>>>> world, with translation between the two. E.g. I use uniprot IDs.
>>>> Is that a problem? What about drug IDs? IDs are the hardest part
>>>> -- agree on some, and mappings for others.
>>>>
>>>> Guoqian: If we can provide a list of ont each team prefers, we can
>>>> discuss.
>>>>
>>>> Lucy: Would be great to be able to share annotations. Centralized
>>>> vocab? Central KB? Use cases are key.
>>>>
>>>> Scott: Mapping problems with COVID-19 are same as other mapping
>>>> problems. Should have a central place to share projects. Should
>>>> keep use cases in mind.
>>>>
>>>> Sebastian: Please give us feedback on the dataset!
>>>>
>>>> Todor: Focus on specific questions that you want to answer, then map
>>>> using common IDs to address them.
>>>>
>>>> Daniel: What formats? Right now we're using FHIR. Use others?
>>>>
>>>> Jim: identifiers.org might be useful for mapping.
>>>>
>>>> David: Useful to have each group present use cases and vocab.
>>>>
>>>> We'll meet weekly, same time, 1 hour. Each group will present their
>>>> work in more detail, with focus on:
>>>> what use cases they are addressing; and
>>>> what vocabularies / ontologies they're using.
>>>>
>>>> Each group will present for 20 min, followed by 10 min of questions.
>>>>
>>>> ADJOURNED
Received on Monday, 27 April 2020 20:01:12 UTC