Re: CORD-19 semantic annotations - 11am Tuesday (Boston time) - Jin-Dong Kim (Schedule change) from Deborah L. McGuinness on 2020-04-21 (public-semweb-lifesci@w3.org from April 2020)

From: Deborah L. McGuinness <dlm@cs.rpi.edu>
Date: Tue, 21 Apr 2020 11:19:43 -0400
To: <public-semweb-lifesci@w3.org>
Message-ID: <8317a381-22e0-5e6f-c2ab-f682370a47f0@cs.rpi.edu>
Another option, given that there are slides, is just to provide an audio 
call-in number. Since you shared the ppt link, people on audio could 
follow along.

On 4/21/2020 11:12 AM, Deborah L. McGuinness wrote:
> I also got the message that the video call is full
>
> On 4/21/2020 11:10 AM, Vinh Nguyen wrote:
>> Hi David,
>>
>> I would like to join the meeting but I am unable to join the Hangout 
>> call because the video call is full with 10.
>> Can we use some other meeting platform with more connections?
>>
>> Thanks,
>> Vinh
>>
>>> On Apr 21, 2020, at 10:47 AM, David Booth <david@dbooth.org> wrote:
>>>
>>> Last minute schedule change for today's call: Instead of Scott 
>>> Malec, Jin-Dong Kim will present his work on "An open collaboration 
>>> for richly annotating Covid-19 Literature". Slides are here:
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.com_presentation_d_1ynoe1Xxc-5F-2DrTiebbvvuPBQMaktK-2DDX87McuDVaLbI1g_edit-23slide-3Did.g726dbf02a0-5F0-5F0&d=DwIDaQ&c=3buyMx9JlH1z22L_G5pM28wz_Ru6WjhVHwo-vpeS0Gk&r=ao0HdW4_BSUBmFtYkSUY5HNHmhEUEMBLFy-u4FMLkt8&m=rRtrUOuREZcGoUF677l46sKSQCe3qGQJFjlQTHpEI7k&s=TLzgoWQAHR-uMKRPHbRPpg8cDYS3XeEcAgqsAQoJYjg&e= 
>>>
>>> David Booth
>>>
>>> On 4/20/20 11:56 AM, David Booth wrote:
>>>> Tomorrow (Tuesday) 11am Boston time Scott Malec will discuss his 
>>>> work on computable knowledge extraction using the CORD-19 dataset 
>>>> that was released by the Allen Institute.
>>>> We will use this google hangout:
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__tinyurl.com_fhirrdf&d=DwIDaQ&c=3buyMx9JlH1z22L_G5pM28wz_Ru6WjhVHwo-vpeS0Gk&r=ao0HdW4_BSUBmFtYkSUY5HNHmhEUEMBLFy-u4FMLkt8&m=rRtrUOuREZcGoUF677l46sKSQCe3qGQJFjlQTHpEI7k&s=QkufOhCI2BnIKN7ZxwS0x6FmTBNAT_HXcQGGcVq-atE&e= 
>>>> More on Scott's work:
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_fhircat_CORD-2D19-2Don-2DFHIR_wiki_CORD-2D19-2DSemantic-2DAnnotation-2DProjects-23project-2Dname-2Dcord-2Dsemantictriples&d=DwIDaQ&c=3buyMx9JlH1z22L_G5pM28wz_Ru6WjhVHwo-vpeS0Gk&r=ao0HdW4_BSUBmFtYkSUY5HNHmhEUEMBLFy-u4FMLkt8&m=rRtrUOuREZcGoUF677l46sKSQCe3qGQJFjlQTHpEI7k&s=EXrxlQi3KgJkLSQL8C1tfkjnKNPy46cP4BgRxBPM-RU&e= 
>>>> We still have time for one other presentation tomorrow about 
>>>> CORD-19 semantic annotation.  If anyone else is ready (with slides) 
>>>> to present for 20 minutes, please let me know.
>>>> Thanks,
>>>> David Booth
>>>> -----------------------------------------------
>>>> MEETING NOTES 7-Apr-2020
>>>> Present: David Booth <david@dbooth.org>, Sebastian Kohlmeier 
>>>> <sebastiank@allenai.org>, Lucy Lu Wang <lucyw@allenai.org>, Kyle Lo 
>>>> <kylel@allenai.org>, Jim McCusker <mccusker@gmail.com>, Scott Malec 
>>>> <sam413@pitt.edu>, Guoqian Jiang <jiang.guoqian@mayo.edu>, Todor 
>>>> Primov <todor.primov@ontotext.com>
>>>> Sebastian: Allen Institute, Semantic Scholar, Non-profit AI 
>>>> institute, w Lucy and Kyle.  Engaged in COVID-19 because as 
>>>> non-profit could develop a corpus that we can make available.  
>>>> Created CORD-19 dataset.  Goal: Standardized format that's easy for 
>>>> machines to read, to enable quick analysis of the literature.  
>>>> Working to extend it.  Weekly updates, but want to get to daily 
>>>> updates.  Want to also get to entity and relation extraction.
>>>> Guoqian: Identifiers used?  SHA numbers for full text, but also IDs 
>>>> linked to DOIs and Pubmed IDs.  Should discuss best way to have 
>>>> unique ID for publication.
>>>> Kyle: Added unique IDs: cord_UID.  SHA is a hash of PDF, and 
>>>> sometimes there are multiple PDFs for a single paper.
>>>> Jim: DOIs?
>>>> Lucy: Some papers do not have a DOI.
>>>> Jim: Building a KG using generalized tools from another project, 
>>>> used in many domains.  Looking to do drug repurposing using 
>>>> CORD-19.  Using an extract of CORD-19. Does deep extraction of 
>>>> named entities and relationships. Use PROV ont and 
>>>> nanopublications, for rich modeling and provenance for 
>>>> probabilistic KG.  Arcs in picture are based on confidence level.  
>>>> Allows high precision on drugs that have been tested on melanoma 
>>>> before.  Re-applying this to COVID-19.  We focus on open 
>>>> ontologies, and not using FHIR. Unpublished yet.  Page-rank based 
>>>> analysis of pubmed citation graph, to compute community trust in a 
>>>> paper.
>>>> Guoqian: What ont?
>>>> Jim: Drugbank mostly.  Lots of targets.
>>>> Kyle: Relation-entity set.  Closed set?
>>>> Jim: We have drug graph, protein-protein interaction, and drugbank 
>>>> has drug-protein interaction.  Molecular interaction.  CTD 
>>>> Comparative Toxicogenomics Database, Heng Ji Lab database started with it.
>>>> Kyle: Trying to add more KB entities?
>>>> Jim: Want to expand the interaction set.  Also entities.  We have 
>>>> all human proteins and drugbank drugs.  If you have a drug with an 
>>>> effect on a target similar protein in COVID-19, will there be hits, 
>>>> directly or indirectly?  To do that, we want to score it also based 
>>>> on confidence in the research.
>>>> Scott: My research approach is to integrate structured knowledge 
>>>> from literature or other curated sources, and combine with 
>>>> observational data to facilitate more reliable inference.  General 
>>>> idea is that contextual info can help interpret and identify 
>>>> confounders.  Confounders are common causes of the predictor and 
>>>> outcome.  What I did with CORD-19: took pubmed IDs, and found what 
>>>> machine reading performed and created KG.  Machine reading can run 
>>>> for months.  Jim's work on citation analysis is cool.  Using 
>>>> semrep, developed by NLM, over titles and abstracts in pubmed.  
>>>> Using Pubmed central IDs from metadata table, in beginning of 
>>>> March, 31k papers, with 28k in pubmed central. Seemed like a good 
>>>> place to start building a KG quickly, to see the big picture 
>>>> quickly.  Pulled 106k semantic predications in the 21k docs, pulled 
>>>> into cytoscape and computed network centrality, and from that 
>>>> ranked. Filtered by biomedical entities, diseases, syndromes, amino 
>>>> acids, peptides or pharm substances.  Ranked them by centrality to 
>>>> understand their importance.  Very prelim analysis. Interested to 
>>>> see how I might expand on this and learn what others are doing.
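
The ranking step described above (build a graph from predications, compute centrality, filter by entity type, rank) can be sketched as follows. This is only a minimal sketch under stated assumptions: the notes do not say which centrality measure was used, so degree centrality stands in for it; the predication rows, type labels, and the function name `rank_by_degree_centrality` are hypothetical and are not SemRep or Cytoscape output. Filtering is applied after scoring, which is also an assumption.

```python
from collections import defaultdict

# Hypothetical predications: (subject, subject_type, predicate, object, object_type).
# These rows are invented for illustration only.
PREDICATIONS = [
    ("drug_a", "pharm_substance", "TREATS", "disease_x", "disease"),
    ("drug_a", "pharm_substance", "INTERACTS_WITH", "protein_p", "amino_acid_peptide_protein"),
    ("protein_p", "amino_acid_peptide_protein", "ASSOCIATED_WITH", "disease_x", "disease"),
    ("drug_b", "pharm_substance", "TREATS", "disease_x", "disease"),
    ("lab_method_m", "laboratory_procedure", "DIAGNOSES", "disease_x", "disease"),
]

# Hypothetical labels for the entity types that are kept in the ranking.
KEEP_TYPES = {"pharm_substance", "disease", "amino_acid_peptide_protein"}


def rank_by_degree_centrality(predications, keep_types):
    """Rank entities of the kept types by normalized degree centrality.

    The graph is treated as undirected and unweighted: two entities are
    neighbors if at least one predication links them.
    """
    neighbors = defaultdict(set)
    node_type = {}
    for subj, subj_type, _pred, obj, obj_type in predications:
        node_type[subj] = subj_type
        node_type[obj] = obj_type
        if subj != obj:  # ignore self-loops
            neighbors[subj].add(obj)
            neighbors[obj].add(subj)

    # Normalize by the maximum possible number of neighbors (n - 1).
    denom = max(len(node_type) - 1, 1)
    scores = {node: len(neighbors[node]) / denom for node in node_type}

    # Score on the full graph, then keep only the entity types of interest.
    kept = [(node, score) for node, score in scores.items()
            if node_type[node] in keep_types]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, score in rank_by_degree_centrality(PREDICATIONS, KEEP_TYPES):
        print(f"{name}\t{score:.2f}")
```

Scoring on the full graph before filtering means that links to filtered-out entity types still count toward a node's centrality; filtering first would give a different ranking.
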
>>>> Guoqian: Can cytoscape support RDF graphs?  David: Yes. Jim: Yes, 
>>>> and you can form SPARQL queries to extract specific interactions.  
>>>> Not 1:1 mapping of RDF graph to bio network.
>>>> Todor: There are different plugins, one is SPARQL endpoint. Others 
>>>> for other visualizations.  Keep expectations low.
>>>> Jim: It also includes a knowledge exploration interface, built on 
>>>> cytoscape.js, a re-implementation of cytoscape. The implementation 
>>>> I'm using has some interface elements.
>>>> Lucy: How does Coronavirus ont relate?
>>>> Guoqian: Using this ont to annotate the papers.
>>>> Lucy: Where did that ont come from?
>>>> Jim: Built using OBO foundries?  Guoqian: Yes.
>>>> Jim: We use OBO ont.  Oliver has a lot of tools for subsetting and 
>>>> extracting for app ontologies.
>>>> Guoqian: Also collaborating with Cochrane PICO ontology, developing 
>>>> COVID-19 PICO ont, specific subtypes of the high level types, eg, 
>>>> subtypes of population with particular co-morbidities.  The ont 
>>>> is also avail on github.
>>>> Guoqian: How to collaborate?  Need a registry for KG from this 
>>>> community?
>>>> Lucy: Working on semantic annotation of entity and rel. Lots of 
>>>> people are doing bottom-up annotation, without formal vocab, then 
>>>> linking to UMLS.  But haven't seen COVID-19 ont.
>>>> Guoqian: Also should look at use cases that different groups have. 
>>>> Community said they want open vocab instead of SNOMED-CT, such as 
>>>> UMLS.
>>>> Lucy: Also working with a group at AWS, KB of concepts, link to 
>>>> ICD-10 and RXNorm, also lots of requests for protein and interactions.
>>>> Guoqian: Also procedure datasets.
>>>> Lucy: What use cases are these projects addressing?
>>>> Guoqian: For EBMonFHIR, they are focused on review of evidence, and 
>>>> clinical concepts.  Other team looking at using OBO ont to analyse 
>>>> DB to mine underlying mechanisms. Ideally we should have linkage 
>>>> across vocabularies.  Eg UMLS can link many ont.  But for OBO it 
>>>> might be a challenge.
>>>> Jim: From microbio perspective, most useful from this group would 
>>>> be having cross mapping from clinical/FHIR/SNOMED-ish world and OBO 
>>>> bio world, with translation between the two. E.g. I use uniprot 
>>>> IDs.  Is that a problem?  What about drug IDs?  IDs are the hardest 
>>>> part -- agree on some, and mappings for others.
>>>> Guoqian: If we can provide a list of ont each team prefers, we can 
>>>> discuss.
>>>> Lucy: Would be great to be able to share annotations. Centralized 
>>>> vocab?  Central KB?  Use cases are key.
>>>> Scott: Mapping problems with COVID-19 are same as other mapping 
>>>> problems.  Should have a central place to share projects.  Should 
>>>> keep use cases in mind.
>>>> Sebastian: Please give us feedback on the dataset!
>>>> Todor: Focus on specific questions that you want to answer, then 
>>>> map using common IDs to address them.
>>>> Daniel: What formats?  Right now we're using FHIR.  Use others?
>>>> Jim: identifiers.org might be useful for mapping.
>>>> David: Useful to have each group present use cases and vocab.
>>>> We'll meet weekly, same time, 1 hour.  Each group will present 
>>>> their work in more detail, with focus on:
>>>> what use cases they are addressing; and
>>>> what vocabularies / ontologies they're using.
>>>> Each group will present for 20 min, followed by 10 min of questions.
>>>> ADJOURNED
>>
-- 
Deborah L. McGuinness
Tetherless World Senior Constellation Chair
Professor Computer, Cognitive, and Web Sciences
Director Rensselaer Web Science Research Center
Rensselaer Polytechnic Institute
105 8th Street
Troy, NY 12180
(v) 518 276 4404  (f) 518 276 4464
dlm@cs.rpi.edu
Received on Tuesday, 21 April 2020 15:20:12 UTC