- From: David Booth <david@dbooth.org>
- Date: Mon, 27 Apr 2020 12:04:59 -0400
- To: w3c semweb HCLS <public-semweb-lifesci@w3.org>
- Cc: Victor Mireles <victor.mireles@semantic-web.com>
Tomorrow (Tuesday) at 11am Boston time, Victor Mireles will present his work on RDFizing several existing annotations of the CORD-19 dataset that use different vocabularies. Current vocabularies: Gene Ontology, ChEBI, Human Phenotype Ontology, MeSH disease.

Details for joining the call will be posted in a follow-up message.

Thanks,
David Booth

On 4/21/20 1:32 PM, David Booth wrote:
> [Apologies for reaching the google hangout participant limit today, and
> thank you to Victor Mireles-Chavez for allowing us to switch over to his
> zoom instead!  I will find a better solution for next week.]
>
> Below are meeting notes from today's call.  If you would like to present
> your work on CORD-19 semantic annotations, please email me so that I can
> put you on the schedule.  You do not need to have results yet.  Even if
> you are just starting out, it is helpful to learn what others are doing.
>
> ----------------------------
>
> MEETING NOTES 21-Apr-2020
>
> Present: David Booth, Jin-Dong Kim, Víctor Mireles, Oliver Giles, Harry
> Hochheiser, Franck Michel, James Malone, Kyle Lo, Sebastian Kohlmeier,
> Guoqian Jiang, Gaurav Vaidya, Gollam Rabby, Oliver, Tomas Kliegr
>
> Introductions
>
> David Booth: Many years in semantic web technology, applying it to
> healthcare and life sciences for the past 10 years.  Involved in
> standardizing the RDF representation of HL7 FHIR:
> https://www.hl7.org/fhir/rdf.html
>
> Gaurav: U of NC (https://renci.org/staff/gaurav-vaidya/), sem web tech,
> using CORD-19, trying to annotate ont terms as part of Robokop
> (https://robokop.renci.org/).
>
> Harry: U of Pittsburgh, involved w W3C, drug-drug interaction, cancer
> information models, not actively using CORD-19.
>
> James Malone: CTO SciBite in UK, provide sem enrichment tooling to
> pharma, KG building.  Background: applying ont to public data, machine
> learning, building ontologies.
>
> Kyle: Researcher at Allen Institute, NLP, working on CORD-19.
>
> Oliver: Machine learning at SciBite w James, NLP, machine learning.
>
> Sebastian: Prog mgr at Allen Institute, CORD-19.
>
> Tomas: Working on rule learning, trying to apply it to CORD-19.
>
> Victor Mireles: Researcher at sem web co in Austria, looking at
> annotations that others have been doing on CORD-19, trying to make them
> match, and our own annotations.
>
> Presentation by Jin-Dong Kim
>
> Slides:
> https://docs.google.com/presentation/d/1ynoe1Xxc_-rTiebbvvuPBQMaktK-DX87McuDVaLbI1g/edit#slide=id.g726dbf02a0_0_0
>
>
> Jin-Dong: Tokyo, Database Center for Life Science, Japan gov funded,
> bioinformatics, NLP, text mining, esp biomedical literature.
>
> (Jin-Dong presents his slides)
>
> Jin-Dong: Using multiple datasets.  Multiple groups are producing
> annotations in isolation.  PubAnnotation is a 10-year-old project to
> integrate annotations to literature.  Collecting annotations for
> COVID-19 literature to integrate and release them for other use.
> PubAnnotation is an open repo of biomed text annotations.  Anyone can
> submit to it.  All annotations are aligned to the canonical texts.
>
> Jin-Dong: PubAnnotation also provides RESTful web services.  Many
> annotators are compatible with PubAnnotation.  Also collecting manual
> annotations using TextAE.
>
> Q: Are the annotations from a controlled vocab or ont?  Jin-Dong: Both:
> free text or from ont.
>
> Jin-Dong: Every text span has a URL.  You can see what projects include
> a doc.  And you can choose a span of text and see what projects used
> that span.
>
> Q: What is a project?  Jin-Dong: We collect any kind of annotations.
> A project identifies who contributed a set of annotations.
>
> Jin-Dong: Annotations can be accessed via a span URL.  Also converting
> annotations into RDF.  Still experimenting.  Also have a search
> interface.  SPARQL queries.
> https://covid19.pubannotation.org/
>
> Jin-Dong: Trying to add annotations for temporal expressions.
>
> Jin-Dong: Literature includes CORD-19 and LitCovid, from NCBI.  Uploaded
> all the text to PubAnnotation
> (http://pubannotation.org/collections/LitCovid)  Anyone can contribute.
> To contribute, you can download, annotate, then create a new project
> and add it to the LitCovid collection and it will appear.  Open
> platform.  Same setup for CORD-19.  Received 6 contributions so far.
> Need to analyze them.  Planning to call for wider contributions soon,
> maybe next week.  Plan to continuously update.
>
> Guoqian: Any specific research questions using these annotations?
> Particular use cases?  Jin-Dong: Need to find out.  Clinicians began with
> manual annotations.  Will figure out missing parts and try to fill the
> gaps.  Many annotations are concept annotations using ont -- many
> similar.  But we think there are still important missing annotations,
> such as temporal expressions.  Looking to add those.  Also quantitative
> trait annotations are missing.  Looking for those too.
>
> Q: How might these be used?
>
> Franck: I'm in Inria/CNRS/Univ Côte d'Azur; contacts with Inserm (French
> NIH) point to the need to search literature with questions like: "What
> are the papers that link Coronavirus with other diseases like diabetes
> or cancer?"
>
> James: Released COVID-specific annotations.  Pharma using them: looking
> for co-risk factors, or drugs interacting.  Comes down to: want to
> narrow down to a set of papers to read.  Anything that gets them to the
> paper.  Want to read the o
>
> Franck: Summarizing the main claim of the paper helps also, to narrow
> down the search.
>
> Victor: Drug-drug interactions.  Many other KGs.  To link to drug-drug or
> protein-protein interaction databases we need URIs.  Can we query
> PubAnnotation and get URIs from it, so that I can see what drugs are
> mentioned in a given span?  Is this supported?
>
> Jin-Dong: Group in China is working on annotations for drug repurposing.
> I think they're using drug ont.
>
> Franck: How can we consume the annotations that have been contributed?
> Jin-Dong: Download in JSON or CSV, or access as RDF.
>
> Tomas: We detect entities, then try to do semantic extension.  Would
> there be a way to use this for semantic extension of entities, or get a
> list of highly specific concepts that appear in the article?  Jin-Dong:
> Yes, because they're in RDF, could do that.  Tomas: How to match doc in
> your DB with doc in other DB?  Jin-Dong: Every doc is identified by a
> pair: DB identifier, and ID within that DB.
>
> Tomas: How many annotations average per document?  Jin-Dong: Conversion
> is not entirely done.  RDF statements only partially done.  Jin-Dong: in
> CORD-PICO, for 26k docs, 69k annotations for PICO.
>
> ADJOURNED
>
> -----------------------------------------------------------------------
>
> On 4/21/20 10:47 AM, David Booth wrote:
>> Last minute schedule change for today's call: Instead of Scott Malec,
>> Jin-Dong Kim will present his work on "An open collaboration for
>> richly annotating Covid-19 Literature".  Slides are here:
>> https://docs.google.com/presentation/d/1ynoe1Xxc_-rTiebbvvuPBQMaktK-DX87McuDVaLbI1g/edit#slide=id.g726dbf02a0_0_0
>>
>>
>> David Booth
>>
>> On 4/20/20 11:56 AM, David Booth wrote:
>>> Tomorrow (Tuesday) 11am Boston time Scott Malec will discuss his work
>>> on computable knowledge extraction using the CORD-19 dataset that was
>>> released by the Allen Institute.
>>>
>>> We will use this google hangout:
>>> http://tinyurl.com/fhirrdf
>>>
>>> More on Scott's work:
>>> https://github.com/fhircat/CORD-19-on-FHIR/wiki/CORD-19-Semantic-Annotation-Projects#project-name-cord-semantictriples
>>>
>>>
>>> We still have time for one other presentation tomorrow about CORD-19
>>> semantic annotation.  If anyone else is ready (with slides) to
>>> present for 20 minutes, please let me know.
>>>
>>> Thanks,
>>> David Booth
>>>
>>> -----------------------------------------------
>>>
>>> MEETING NOTES 7-Apr-2020
>>> Present: David Booth <david@dbooth.org>, Sebastian Kohlmeier
>>> <sebastiank@allenai.org>, Lucy Lu Wang <lucyw@allenai.org>, Kyle Lo
>>> <kylel@allenai.org>, Jim McCusker <mccusker@gmail.com>, Scott Malec
>>> <sam413@pitt.edu>, Guoqian Jiang <jiang.guoqian@mayo.edu>, Todor
>>> Primov <todor.primov@ontotext.com>
>>>
>>> Sebastian: Allen Institute, Semantic Scholar, non-profit AI
>>> institute, w Lucy and Kyle.  Engaged in COVID-19 because as a
>>> non-profit we could develop a corpus that we can make available.
>>> Created CORD-19 dataset.  Goal: Standardized format that's easy for
>>> machines to read, to enable quick analysis of the literature.
>>> Working to extend it.  Weekly updates, but want to get to daily
>>> updates.  Want to also get to entity and relation extraction.
>>>
>>> Guoqian: Identifiers used?  SHA numbers for full text, but also IDs
>>> linked to DOIs and PubMed IDs.  Should discuss best way to have
>>> unique ID for publication.
>>>
>>> Kyle: Added unique IDs: cord_UID.  SHA is a hash of PDF, and
>>> sometimes there are multiple PDFs for a single paper.
>>>
>>> Jim: DOIs?
>>>
>>> Lucy: Some papers do not have a DOI.
>>>
>>> Jim: Building a KG using generalized tools from another project,
>>> used in many domains.  Looking to do drug repurposing using CORD-19.
>>> Using an extract of CORD-19.  Does deep extraction of named entities
>>> and relationships.  Use PROV ont and nanopublications, for rich
>>> modeling and provenance for probabilistic KG.  Arcs in picture are
>>> based on confidence level.  Allows high precision on drugs that have
>>> been tested on melanoma before.  Re-applying this to COVID-19.  We
>>> focus on open ontologies, and not using FHIR.  Not yet published.
>>> Page-rank based analysis of PubMed citation graph, to compute
>>> community trust in a paper.
>>>
>>> Guoqian: What ont?
>>>
>>> Jim: DrugBank mostly.  Lots of targets.
>>>
>>> Kyle: Relation-entity set.  Closed set?
>>>
>>> Jim: We have drug graph, protein-protein interaction, and DrugBank
>>> has drug-protein interaction.  Molecular interaction.  CTD
>>> (Comparative Toxicogenomics Database), Heng Ji Lab database started
>>> with it.
>>>
>>> Kyle: Trying to add more KB entities?
>>>
>>> Jim: Want to expand the interaction set.  Also entities.  We have all
>>> human proteins and DrugBank drugs.  If you have a drug with an effect
>>> on a target similar to a protein in COVID-19, will there be hits,
>>> directly or indirectly?  To do that, we want to score it also based
>>> on confidence in the research.
>>>
>>> Scott: My research approach is to integrate structured knowledge from
>>> literature or other curated sources, and combine with observational
>>> data to facilitate more reliable inference.  General idea is that
>>> contextual info can help interpret and identify confounders.
>>> Confounders are common causes of the predictor and outcome.  What I
>>> did with CORD-19: took PubMed IDs, looked up what machine reading had
>>> already been performed on them, and created a KG.  Machine reading
>>> can run for months.  Jim's work on citation analysis is cool.  Using
>>> SemRep, developed by NLM, over titles and abstracts in PubMed.  Using
>>> PubMed Central IDs from metadata table, in beginning of March, 31k
>>> papers, with 28k in PubMed Central.  Seemed like a good place to
>>> start building a KG quickly, to see the big picture quickly.  Pulled
>>> 106k semantic predications in the 21k docs, pulled into Cytoscape and
>>> computed network centrality, and from that ranked.  Filtered by
>>> biomedical entities, diseases, syndromes, amino acids, peptides or
>>> pharm substances.  Ranked them by centrality to understand their
>>> importance.  Very prelim analysis.  Interested to see how I might
>>> expand on this and learn what others are doing.
>>>
>>> Guoqian: Can Cytoscape support RDF graphs?  David: Yes.  Jim: Yes,
>>> and you can form SPARQL queries to extract specific interactions.
>>> Not 1:1 mapping of RDF graph to bio network.
>>>
>>> Todor: There are different plugins, one is SPARQL endpoint.  Others
>>> for other visualizations.  Keep expectations low.
>>>
>>> Jim: It also includes a knowledge exploration interface, built on
>>> cytoscape.js, a re-implementation of Cytoscape.  The implementation
>>> I'm using has some interface elements.
>>>
>>> Lucy: How does Coronavirus ont relate?
>>>
>>> Guoqian: Using this ont to annotate the papers.
>>>
>>> Lucy: Where did that ont come from?
>>>
>>> Jim: Built using OBO Foundry?  Guoqian: Yes.
>>>
>>> Jim: We use OBO ont.  Oliver has a lot of tools for subsetting and
>>> extracting for app ontologies.
>>>
>>> Guoqian: Also collaborating with Cochrane PICO ontology, developing
>>> COVID-19 PICO ont, specific subtypes of the high level types, eg,
>>> subtypes of population with particular co-morbidities.  The ont is
>>> also avail on github.
>>>
>>> Guoqian: How to collaborate?  Need a registry for KG from this
>>> community?
>>>
>>> Lucy: Working on semantic annotation of entity and rel.  Lots of
>>> people are doing bottom-up annotation, without formal vocab, then
>>> linking to UMLS.  But haven't seen COVID-19 ont.
>>>
>>> Guoqian: Also should look at use cases that different groups have.
>>> Community said they want open vocab instead of SNOMED-CT, such as UMLS.
>>>
>>> Lucy: Also working with a group at AWS, KB of concepts, link to
>>> ICD-10 and RxNorm, also lots of requests for protein and interactions.
>>>
>>> Guoqian: Also procedure datasets.
>>>
>>> Lucy: What use cases are these projects addressing?
>>>
>>> Guoqian: For EBMonFHIR, they are focused on review of evidence, and
>>> clinical concepts.  Other team looking at using OBO ont to analyse DB
>>> to mine underlying mechanisms.  Ideally we should have linkage across
>>> vocabularies.  Eg UMLS can link many ont.  But for OBO it might be a
>>> challenge.
>>>
>>> Jim: From microbio perspective, most useful from this group would be
>>> having cross mapping from clinical/FHIR/SNOMED-ish world and OBO bio
>>> world, with translation between the two.  E.g. I use UniProt IDs.  Is
>>> that a problem?  What about drug IDs?  IDs are the hardest part --
>>> agree on some, and mappings for others.
>>>
>>> Guoqian: If we can provide a list of ont each team prefers, we can
>>> discuss.
>>>
>>> Lucy: Would be great to be able to share annotations.  Centralized
>>> vocab?  Central KB?  Use cases are key.
>>>
>>> Scott: Mapping problems with COVID-19 are same as other mapping
>>> problems.  Should have a central place to share projects.  Should
>>> keep use cases in mind.
>>>
>>> Sebastian: Please give us feedback on the dataset!
>>>
>>> Todor: Focus on specific questions that you want to answer, then map
>>> using common IDs to address them.
>>>
>>> Daniel: What formats?  Right now we're using FHIR.  Use others?
>>>
>>> Jim: identifiers.org might be useful for mapping.
>>>
>>> David: Useful to have each group present use cases and vocab.
>>>
>>> We'll meet weekly, same time, 1 hour.  Each group will present their
>>> work in more detail, with focus on:
>>>   what use cases they are addressing; and
>>>   what vocabularies / ontologies they're using.
>>>
>>> Each group will present for 20 min, followed by 10 min of questions.
>>>
>>> ADJOURNED
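
Editorial illustration of the centrality ranking Scott describes in the 7-Apr notes (pull semantic predications into a graph, compute network centrality, rank entities). This is only a rough sketch: the notes do not say which centrality measure was computed in Cytoscape, so degree centrality is assumed here, and the entity names and predicates in the toy data are hypothetical placeholders, not SemRep output.

```python
from collections import defaultdict


def degree_centrality(predications):
    """Degree centrality for an undirected graph built from
    (subject, predicate, object) triples. The predicate is ignored;
    only which entities are linked matters. Assumed measure."""
    neighbors = defaultdict(set)
    for subj, _pred, obj in predications:
        if subj == obj:
            continue  # skip self-loops
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Fraction of the other nodes that each node is directly linked to.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def rank_entities(predications):
    """Entities sorted by descending centrality, ties broken by name."""
    scores = degree_centrality(predications)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy predications, for illustration only.
toy = [
    ("drug_A", "TREATS", "disease_X"),
    ("drug_B", "TREATS", "disease_X"),
    ("protein_P", "ASSOCIATED_WITH", "disease_X"),
    ("drug_A", "INTERACTS_WITH", "protein_P"),
]
ranking = rank_entities(toy)
for entity, score in ranking:
    print(f"{entity}\t{score:.2f}")
```

Filtering by semantic type (diseases, amino acids, pharmacological substances, and so on), as mentioned in the notes, would be applied to the triples before ranking.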
Received on Monday, 27 April 2020 16:05:14 UTC