Re: CORD-19 semantic annotations - 11am Tuesday (Boston time) - CANCELED TODAY - Rescheduled for next week

Apologies, I thought I had sent out the announcement for today's call, 
but I just discovered that it never left my Drafts folder.  :(

We are rescheduling for next week instead, when Franck Michel will 
present his work on CORD-19 Named Entities Knowledge Graph (CORD19-NEKG).

David Booth

On 4/28/20 12:09 PM, David Booth wrote:
> Notes from today's call:
> 
> MEETING NOTES 28-Apr-2020
> Present: David Booth, Victor Mireles, Louis Rumanes, Tom Conlin, Franck 
> Michel, Gollam Rabby, Jim McCusker, Lucy Wang, Sebastian Kohlmeier, 
> Tomáš Kliegr
> 
> Introductions
> David Booth: 10 years applying semantic web tech to healthcare and life 
> sciences, working on Mayo Clinic / Johns Hopkins University collaboration.
> 
> Louis Rumanes: United Health Group, doing COVID research, looking at 
> making a KG
> 
> Tom Conlin: Working with Melissa Haendel (Monarch Initiative).
> 
> Franck: INRIA
> 
> Gollam: Prague, Univ
> 
> Jim: Research sci RPI, working on KG w bio
> 
> Lucy: Allen institute, research scientist.
> 
> Tomas: Assoc Prof, Prague, KG.
> 
> Sebastian: Sr Mgr on CORD-19.
> 
> Victor: Semantic Web company researcher
> 
> Victor's Presentation
> Slides here: 
> https://docs.google.com/presentation/d/1xaS_88sJ47iSrvv0ezOfjscIvG2VINUe7vqrUEMiaCA/edit?usp=sharing 
> 
> 
> victor: Semantic Web Company, 40+ FTEs.  Makes PoolParty. Works w 
> companies in many countries.  Taxonomy helps extract entities from text. 
> Image search, data mgmt.
> 
> victor: Developing text and data mining tools for biomed, and CORD-19. 
> We don't only annotate text.  What's useful about annotating text w 
> entities is to use the knowledge, simplest is encoded in SKOS, such as 
> broader/narrower.  But to do this we need to annotate the text into 
> URIs, then import relationships into the graph.  Trying to link existing 
> annotations w other knowledge sources.  Ont is simplified version of 
> NIF: documents have sections, sections have annotations that are SKOS 
> concepts.
> 
> victor: So far, we've set up a pipeline to take a document and it finds 
> annotations with offsets.  So far imported ChEBI, GO, MeSH, HPO, but 
> using them as controlled vocab.  Many are very specific, such as 
> "COVID-19" -- not really NLP, because there are no inflections, 
> plurals, etc.  Output is a bunch of triples in the simple SKOS ont 
> previously mentioned. Put them into GraphDB, along with the vocabs.
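A minimal sketch of the controlled-vocabulary annotation step described above, assuming exact label matching (no inflections or plurals) and a toy two-term vocabulary. The URIs, the `EX` namespace and the property names (`hasAnnotation`, `beginIndex`, `endIndex`, `concept`) are hypothetical placeholders, not the ontology used in the presentation.

```python
import re

# Hypothetical mini-vocabulary: label -> concept URI (illustrative only).
VOCAB = {
    "COVID-19": "http://example.org/concept/covid19",
    "coronavirus": "http://example.org/concept/coronavirus",
}

# Hypothetical namespace for the annotation properties.
EX = "http://example.org/schema#"


def annotate(text, vocab):
    """Return sorted (start, end, uri) tuples for exact, case-insensitive label matches."""
    hits = []
    for label, uri in vocab.items():
        # Match the label only as a whole term, so "coronaviruses" is not a hit.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), uri))
    return sorted(hits)


def to_triples(section_uri, hits):
    """Turn offset annotations into (subject, predicate, object) tuples."""
    triples = []
    for start, end, concept_uri in hits:
        ann = f"{section_uri}#char={start},{end}"
        triples.append((section_uri, EX + "hasAnnotation", ann))
        triples.append((ann, EX + "beginIndex", start))
        triples.append((ann, EX + "endIndex", end))
        triples.append((ann, EX + "concept", concept_uri))
    return triples


text = "COVID-19 is caused by a coronavirus."
hits = annotate(text, VOCAB)
triples = to_triples("http://example.org/doc1#sec2", hits)
```

Each hit keeps its character offsets, so the resulting tuples could be serialized as RDF and loaded into a triple store alongside the vocabularies.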
> 
> victor: Also looked at SciBite annotations.  They've done an excellent 
> job annotating.  They also have their own controlled vocab that is very 
> good.  JSON files have annotations. Put them into triples.  Combining 
> them w bio DBs gives a graph DB.
> 
> (victor shows relationships in GraphDB viewer)
> 
> victor: you can navigate the hierarchy of concepts and link them to the 
> paragraphs in CORD-19 DB.
> 
> (victor shows SPARQL queries)
> 
> victor: This allows us to pull up the titles and paragraphs of articles 
> that both mention a kind of neoplasm and a kind of coronavirus.
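A minimal sketch of the idea behind such a query, assuming a toy hierarchy and toy paragraph annotations (all names below are hypothetical). In SPARQL the "kind of" part would typically be a transitive path over skos:broader; here the same closure is computed by hand in Python.

```python
# Hypothetical toy data: concept -> its direct skos:broader parents.
BROADER = {
    "melanoma": {"neoplasm"},
    "carcinoma": {"neoplasm"},
    "SARS-CoV-2": {"coronavirus"},
    "MERS-CoV": {"coronavirus"},
}

# Hypothetical toy data: paragraph id -> concepts annotated in that paragraph.
ANNOTATIONS = {
    "doc1#p3": {"melanoma", "SARS-CoV-2"},
    "doc2#p1": {"carcinoma"},
    "doc3#p7": {"MERS-CoV", "neoplasm"},
}


def ancestors(concept, broader):
    """Return the concept plus everything reachable by following broader links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen


def paragraphs_mentioning_both(top_a, top_b, annotations, broader):
    """Paragraphs annotated with some kind of top_a and some kind of top_b."""
    result = []
    for paragraph, concepts in annotations.items():
        closure = set()
        for concept in concepts:
            closure |= ancestors(concept, broader)
        if top_a in closure and top_b in closure:
            result.append(paragraph)
    return sorted(result)


matches = paragraphs_mentioning_both("neoplasm", "coronavirus", ANNOTATIONS, BROADER)
```

The point of the hierarchy is that a paragraph annotated only with "melanoma" still counts as mentioning a neoplasm, without the query listing every subtype.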
> 
> victor: Want to take other DBs and put them into GraphDB also.  Monarch 
> Initiative is putting together KG, and also puts in SciBite.
> 
> victor: Missing from both our effort and Monarch: searchability.  I 
> showed SPARQL queries using broader/narrower.  Also need to be more 
> efficient for humans, working also on faceted search.  Monarch 
> Initiative is very good for machine readable stuff.  Another thing 
> missing: relation extraction, from the text.  How does a human determine 
> that some text is saying that a protein interacts with another?  JPL 
> (Lewis McGibbney) is using Stanford NLP for relation extraction.
> https://github.com/nasa-jpl-cord-19/covid19-knowledge-graph
> It isn't perfect, but it indicates a relationship.  Both entities are in 
> GO.  This adds new edges between entities.  Lots of interest in this 
> topic now.
> 
> Franck: We're doing pretty close to this in INRIA, looking at named 
> entities, wikidata entities, queries that gather all articles on cancer 
> and any coronavirus.  Another thing we're doing: in addition to 
> detecting named entities, we're running other tools to identify 
> arguments, claims, evidence in articles and draw network of claims and 
> evidence to see what supports the claims.  Hope to publish this network 
> soon as RDF graph.
> 
> victor: PubAnnotation shown last week, showed epistemic analysis.
> 
> Franck: Argument, clinical trial analysis.  Query pubmed and platform 
> analyzes those articles.  Want to apply them to CORD-19.
> 
> Vincent: Is RDF available? victor: Will take a couple more weeks. 
> Vincent: Size? victor: 20GB RDF.
> 
> David: Overlap between efforts, helpful to learn about each other's work.
> 
> victor: After looking at Monarch Initiative, it isn't new, names I 
> recognized from Human Phenotype initiative.  Most of that summarizes work 
> that others have done.  FHIR DB also has overlaps with SciBite.
> 
> david: SPARQL query was valuable, but biologists need simple UI.
> 
> jim: Working on faceted browser for various things, that can be reused. 
> Based on SPARQL fragments, property path gives certain values, here's 
> how to render it.  Potentially useful here.  Also integrated WHYIS Vega 
> (JS framework for charts and visualization), can plug a SPARQL query in 
> and get a chart.  People can share how they're exploring the graph.
> https://github.com/tetherless-world/whyis
> Faceted search is a view in WHYIS, but a lot of the capabilities are 
> designed to use nanopub.
> 
> Email list for these calls: 
> https://lists.w3.org/Archives/Public/public-semweb-lifesci/
> 
> Franck to present next week.
> 
> ADJOURNED
> 
> 
> On 4/27/20 4:00 PM, David Booth wrote:
>> We will use this zoom:
>>
>> Zoom Link: 
>> https://us02web.zoom.us/j/89011102533?pwd=SU9CdDYxUlRtUkNBdjFUN0x4MTRxUT09 
>>
>> password:  82AY02Rt66
>>
>> Thanks,
>> David Booth
>>
>> On 4/27/20 12:04 PM, David Booth wrote:
>>> Tomorrow (Tuesday) 11am Boston time Victor Mireles will present his 
>>> work on RDFizing several annotations on the Cord19 dataset that are 
>>> around in different vocabularies. Current vocabularies: gene 
>>> ontology, ChEBI, human phenotype ontology, MeSH disease.
>>>
>>> Details for joining the call will be posted in a follow-up message.
>>>
>>> Thanks,
>>> David Booth
>>>
>>> On 4/21/20 1:32 PM, David Booth wrote:
>>>> [Apologies for reaching the google hangout participant limit today, 
>>>> and thank you to Victor Mireles-Chavez for allowing us to switch 
>>>> over to his zoom instead!  I will find a better solution for next 
>>>> week.]
>>>>
>>>> Below are meeting notes from today's call.  If you would like to 
>>>> present your work on CORD-19 semantic annotations, please email me 
>>>> so that I can put you on the schedule.  You do not need to have 
>>>> results yet.  Even if you are just starting out, it is helpful to 
>>>> learn what others are doing.
>>>>
>>>>                             ----------------------------
>>>>
>>>> MEETING NOTES 21-Apr-2020
>>>>
>>>> Present: David Booth, Jin-Dong Kim, Víctor Mireles, Oliver 
>>>> Giles, Harry Hochheiser, Franck Michel, James Malone, Kyle Lo, 
>>>> Sebastian Kohlmeier, Guoqian Jiang, Gaurav Vaidya, Gollam Rabby, 
>>>> Oliver, Tomas Kliegr
>>>>
>>>> Introductions
>>>>
>>>> David Booth: Many years in semantic web technology, applying it to 
>>>> healthcare and life sciences for the past 10 years.  Involved in 
>>>> standardizing the RDF representation of HL7 FHIR: 
>>>> https://www.hl7.org/fhir/rdf.html
>>>>
>>>> Gaurav: U of NC (https://renci.org/staff/gaurav-vaidya/), sem web 
>>>> tech, using CORD-19, trying to annotate ont terms as part of Robokop 
>>>> (https://robokop.renci.org/).
>>>>
>>>> Harry: U of Pittsburgh, involved w W3C, drug-drug interaction, 
>>>> cancer information models, not actively using CORD-19.
>>>>
>>>> James Malone: CTO SciBite in UK, provide sem enrichment tooling to 
>>>> pharma, KG building.  Background, applying ont to public data, 
>>>> machine learning, building ontologies.
>>>>
>>>> Kyle: Researcher at Allen Institute, NLP, working on CORD-19.
>>>>
>>>> Oliver: Machine learning at SciBite w James, NLP, machine learning.
>>>>
>>>> Sebastian: Prog mgr at Allen Institute, CORD-19.
>>>>
>>>> Tomas: Working on rule learning, trying to apply it to CORD-19.
>>>>
>>>> Victor Mireles: Researcher at sem web co in Austria, looking at 
>>>> annotations that others have been doing on CORD-19, trying to make 
>>>> them match, and our own annotations.
>>>>
>>>> Presentation by Jin-Dong Kim
>>>>
>>>> Slides: 
>>>> https://docs.google.com/presentation/d/1ynoe1Xxc_-rTiebbvvuPBQMaktK-DX87McuDVaLbI1g/edit#slide=id.g726dbf02a0_0_0 
>>>>
>>>>
>>>> Jin-Dong: Tokyo, database center for life science, Japan gov funded, 
>>>> bioinformatics, NLP, text mining, esp biomedical literature.
>>>>
>>>> (Jin-Dong presents his slides)
>>>>
>>>> Jin-Dong: Using multiple datasets.  Multiple groups producing 
>>>> annotations, isolation.  PubAnnotation is a 10-year-old project to 
>>>> integrate annotations to literature.  Collecting annotations for 
>>>> COVID-19 literature to integrate and release them for other use. 
>>>> PubAnnotation is an open repo of biomed text annotations.  Anyone 
>>>> can submit to it.  All annotations are aligned to the canonical texts.
>>>>
>>>> Jin-Dong: PubAnnotation also provides RESTful web services. Many 
>>>> annotators compatible with PubAnnotation.  Also collecting manual 
>>>> annotations using TextAE.
>>>>
>>>> Q: Are the annotations from a controlled vocab or ont?  Jin-Dong: 
>>>> Both free text or from ont.
>>>>
>>>> Jin-Dong: Every text span has a URL.  You can see what projects 
>>>> include a doc.  And you can choose a span of text and see what 
>>>> projects used that span.
>>>>
>>>> Q: What is a project?  Jin-Dong: We collect any kind of annotations. 
>>>> Project identifies the source of people who have contributed 
>>>> annotations.
>>>>
>>>> Jin-Dong: Annotations can be accessed via a span URL.  Also 
>>>> converting annotations into RDF.  Still experimenting.  Also have a 
>>>> search interface.  SPARQL queries.
>>>> https://covid19.pubannotation.org/
>>>>
>>>> Jin-Dong: Trying to add annotations for temporal notations.
>>>>
>>>> Jin-Dong: Literature includes CORD-19 and LitCovid, from NCBI. 
>>>> Uploaded all the text to PubAnnotation 
>>>> (http://pubannotation.org/collections/LitCovid).  Anyone can 
>>>> contribute. To contribute, you can download, annotate, then create 
>>>> a new project and add it to the LitCovid collection and it will 
>>>> appear. Open platform.  Same setup for CORD-19.  Received 6 
>>>> contributions so far. Need to analyze them.  Planning to call for 
>>>> wider contributions soon, maybe next week.  Plan to continuously 
>>>> update.
>>>>
>>>> Guoqian: Any specific research questions using these annotations? 
>>>> Particular use cases?  Jin-Dong: Need to find out. Clinicians began 
>>>> with manual annotations.  Will figure out missing parts and try to 
>>>> fill the gaps.  Many annotations are concept annotations using ont 
>>>> -- many similar.  But we think there are still important missing 
>>>> annotations, such as temporal expressions.  Looking to add those. 
>>>> Also quantitative traits annotations are missing.  Looking for those 
>>>> too.
>>>>
>>>> Q: How might these be used?
>>>>
>>>> Franck: I'm in Inria/CNRS/Univ Côte d'Azur, contacts with Inserm 
>>>> (French NIH) point to the need to search literature with questions 
>>>> like: "What are the papers that link Coronavirus with other diseases 
>>>> like diabetes or cancer?"
>>>>
>>>> James: Released COVID-specific annotations. Pharma using them: 
>>>> looking for co-risk factors, or drugs interacting.  Comes down to: 
>>>> want to narrow down to a set of papers to read.  Anything that gets 
>>>> them to the paper.  Want to read the o
>>>>
>>>> Franck: Summarizing the main claim of the paper helps also, to 
>>>> narrow down the search.
>>>>
>>>> Victor: Drug-drug interactions.  Many other KGs.  To link to 
>>>> drug-drug or protein-protein interaction databases we need URIs. 
>>>> Can we query PubAnnotation and get URIs from it, so I can see what 
>>>> drugs are mentioned in this span?  Is this supported?
>>>>
>>>> Jin-Dong: Group in China is working on annotations for drug 
>>>> repurposing.   I think they're using drug ont.
>>>>
>>>> Franck: How can we consume the annotations that have been 
>>>> contributed? Jin-Dong: Download in JSON or CSV, or access as RDF.
>>>>
>>>> Tomas: We detect entities, then try to do semantic extension.  Would 
>>>> there be a way to use this for semantic extension of entities, or 
>>>> get a list of highly specific concepts that appear in the article? 
>>>> Jin-Dong: Yes, because they're in RDF, could do that.  Tomas: How to 
>>>> match doc in your DB with doc in other DB?  Jin-Dong: Every doc is 
>>>> identified by a pair: DB identifier, and ID within that DB.
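A minimal sketch of matching documents across two collections by that pair, assuming each record carries the database name and the ID within it. The field names `sourcedb` and `sourceid` and the sample records are assumptions made for illustration.

```python
def doc_key(sourcedb, sourceid):
    """Normalize the (database, id) pair so trivial formatting differences do not matter."""
    return (sourcedb.strip().lower(), str(sourceid).strip())


def match_documents(ours, theirs):
    """Return the sorted (database, id) keys present in both collections."""
    our_keys = {doc_key(d["sourcedb"], d["sourceid"]) for d in ours}
    their_keys = {doc_key(d["sourcedb"], d["sourceid"]) for d in theirs}
    return sorted(our_keys & their_keys)


# Hypothetical sample records.
ours = [
    {"sourcedb": "PMC", "sourceid": "7000001"},
    {"sourcedb": "PubMed", "sourceid": 32000002},
]
theirs = [
    {"sourcedb": "pmc ", "sourceid": 7000001},
    {"sourcedb": "PubMed", "sourceid": "32000003"},
]

shared = match_documents(ours, theirs)
```

Keying on the pair rather than the bare ID avoids collisions between databases that happen to reuse the same numbers.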
>>>>
>>>> Tomas: How many annotations average per document?  Jin-Dong: 
>>>> Conversion is not entirely done.  RDF statements only partially 
>>>> done. Jin-Dong: in CORD-PICO, for 26k docs, 69k annotations for PICO.
>>>>
>>>> ADJOURNED
>>>>
>>>> -----------------------------------------------------------------------
>>>>
>>>> On 4/21/20 10:47 AM, David Booth wrote:
>>>>> Last minute schedule change for today's call: Instead of Scott 
>>>>> Malec, Jin-Dong Kim will present his work on "An open collaboration 
>>>>> for richly annotating Covid-19 Literature".  Slides are here:
>>>>> https://docs.google.com/presentation/d/1ynoe1Xxc_-rTiebbvvuPBQMaktK-DX87McuDVaLbI1g/edit#slide=id.g726dbf02a0_0_0 
>>>>>
>>>>>
>>>>> David Booth
>>>>>
>>>>> On 4/20/20 11:56 AM, David Booth wrote:
>>>>>> Tomorrow (Tuesday) 11am Boston time Scott Malec will discuss his 
>>>>>> work on computable knowledge extraction using the CORD-19 dataset 
>>>>>> that was released by the Allen Institute.
>>>>>>
>>>>>> We will use this google hangout:
>>>>>> http://tinyurl.com/fhirrdf
>>>>>>
>>>>>> More on Scott's work:
>>>>>> https://github.com/fhircat/CORD-19-on-FHIR/wiki/CORD-19-Semantic-Annotation-Projects#project-name-cord-semantictriples 
>>>>>>
>>>>>>
>>>>>> We still have time for one other presentation tomorrow about 
>>>>>> CORD-19 semantic annotation.  If anyone else is ready (with 
>>>>>> slides) to present for 20 minutes, please let me know.
>>>>>>
>>>>>> Thanks,
>>>>>> David Booth
>>>>>>
>>>>>> -----------------------------------------------
>>>>>>
>>>>>> MEETING NOTES 7-Apr-2020
>>>>>> Present: David Booth <david@dbooth.org>, Sebastian Kohlmeier 
>>>>>> <sebastiank@allenai.org>, Lucy Lu Wang <lucyw@allenai.org>, Kyle 
>>>>>> Lo <kylel@allenai.org>, Jim McCusker <mccusker@gmail.com>, Scott 
>>>>>> Malec <sam413@pitt.edu>, Guoqian Jiang <jiang.guoqian@mayo.edu>, 
>>>>>> Todor Primov <todor.primov@ontotext.com>
>>>>>>
>>>>>> Sebastian: Allen Institute, Semantic Scholar, Non-profit AI 
>>>>>> institute, w Lucy and Kyle.  Engaged in COVID-19 because as 
>>>>>> non-profit could develop a corpus that we can make available. 
>>>>>> Created CORD-19 dataset.  Goal: Standardized format that's easy 
>>>>>> for machines to read, to enable quick analysis of the literature. 
>>>>>> Working to extend it. Weekly updates, but want to get to daily 
>>>>>> updates.  Want to also get to entity and relation extraction.
>>>>>>
>>>>>> Guoqian: Identifiers used?  SHA numbers for full text, but also 
>>>>>> IDs linked to DOIs and Pubmed IDs.  Should discuss best way to 
>>>>>> have unique ID for publication.
>>>>>>
>>>>>> Kyle: Added unique IDs: cord_UID.  SHA is a hash of PDF, and 
>>>>>> sometimes there are multiple PDFs for a single paper.
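A minimal sketch of why a per-paper ID helps when the file hash is not unique per paper, assuming the hash is SHA-1 over the raw PDF bytes (an assumption; the notes only say "SHA"). The byte strings and the `cord_uid` value are made up for illustration.

```python
import hashlib


def pdf_sha1(pdf_bytes):
    """Hex SHA-1 digest of the raw file bytes."""
    return hashlib.sha1(pdf_bytes).hexdigest()


def index_by_sha(rows):
    """Build {sha: cord_uid} from (cord_uid, [sha, ...]) rows."""
    index = {}
    for cord_uid, shas in rows:
        for sha in shas:
            index[sha] = cord_uid
    return index


# Two different PDF files (say, a preprint and the published version)
# of the same paper: the hashes differ, the paper ID does not.
preprint = b"%PDF-1.4 preprint bytes"
published = b"%PDF-1.4 published bytes"
rows = [("abc12345", [pdf_sha1(preprint), pdf_sha1(published)])]

sha_to_uid = index_by_sha(rows)
```

Any annotation keyed by a file hash can then be rolled up to the paper level through the index.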
>>>>>>
>>>>>> Jim: DOIs?
>>>>>>
>>>>>> Lucy: Some papers do not have a DOI.
>>>>>>
>>>>>> Jim: Building a KG using generalized tools from another project, 
>>>>>> used in many domains.  Looking to do drug repurposing using 
>>>>>> CORD-19. Using an extract of CORD-19.  Does deep extraction of 
>>>>>> named entities and relationships.  Use PROV ont and 
>>>>>> nanopublications, for rich modeling and provenance for 
>>>>>> probabilistic KG.  Arcs in picture are based on confidence level. 
>>>>>> Allows high precision on drugs that have been tested on melanoma 
>>>>>> before.  Re-applying this to COVID-19.  We focus on open 
>>>>>> ontologies, and not using FHIR.  Not yet published. Page-rank based 
>>>>>> analysis of pubmed citation graph, to compute community trust in a 
>>>>>> paper.
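A minimal sketch of a PageRank-style score over a citation graph, assuming a plain power iteration with a damping factor of 0.85 and a three-paper toy graph. It illustrates the general technique only and is not the unpublished analysis mentioned above.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """citations maps each paper to the list of papers it cites; returns paper -> score."""
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    n = len(nodes)
    rank = {paper: 1.0 / n for paper in nodes}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all papers.
        dangling = sum(rank[p] for p in nodes if not citations.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for paper, cited in citations.items():
            for target in cited:
                # Each citing paper splits its rank equally among the papers it cites.
                new_rank[target] += damping * rank[paper] / len(cited)
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B both cite C; C cites nothing.
scores = pagerank({"A": ["C"], "B": ["C"], "C": []})
most_trusted = max(scores, key=scores.get)
```

A paper cited by many well-cited papers ends up with a higher score, which is one way to approximate community trust.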
>>>>>>
>>>>>> Guoqian: What ont?
>>>>>>
>>>>>> Jim: Drugbank mostly.  Lots of targets.
>>>>>>
>>>>>> Kyle: Relation-entity set.  Closed set?
>>>>>>
>>>>>> Jim: We have drug graph, protein-protein interaction, and drugbank 
>>>>>> has drug-protein interaction.  Molecular interaction.  CTD 
>>>>>> (Comparative Toxicogenomics Database), Heng Ji Lab database started with it.
>>>>>>
>>>>>> Kyle: Trying to add more KB entities?
>>>>>>
>>>>>> Jim: Want to expand the interaction set.  Also entities.  We have 
>>>>>> all human proteins and drugbank drugs.  If you have a drug with an 
>>>>>> effect on a target similar protein in COVID-19, will there be 
>>>>>> hits, directly or indirectly?  To do that, we want to score it 
>>>>>> also based on confidence in the research.
>>>>>>
>>>>>> Scott: My research approach is to integrate structured knowledge 
>>>>>> from literature or other curated sources, and combine with 
>>>>>> observational data to facilitate more reliable inference.  General 
>>>>>> idea is that contextual info can help interpret and identify 
>>>>>> confounders. Confounders are common causes of the predictor and 
>>>>>> outcome.  What I did with CORD-19: took pubmed IDs, and found what 
>>>>>> machine reading performed and created KG.  Machine reading can run 
>>>>>> for months.  Jim's work on citation analysis is cool.  Using 
>>>>>> semrep, developed by NLM, over titles and abstracts in pubmed. 
>>>>>> Using Pubmed central IDs from metadata table, in beginning of 
>>>>>> March, 31k papers, with 28k in pubmed central.  Seemed like a good 
>>>>>> place to start building a KG quickly, to see the big picture 
>>>>>> quickly.  Pulled 106k semantic predications in the 21k docs, 
>>>>>> pulled into cytoscape and computed network centrality, and from 
>>>>>> that ranked. Filtered by biomedical entities, diseases, syndromes, 
>>>>>> amino acids, peptides or pharm substances.  Ranked them by 
>>>>>> centrality to understand their importance.  Very prelim analysis. 
>>>>>> Interested to see how I might expand on this and learn what others 
>>>>>> are doing.
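A minimal sketch of ranking concepts by network centrality from subject-predicate-object predications, assuming degree centrality (the notes do not say which centrality measure was used) and a few made-up predications.

```python
from collections import defaultdict


def degree_centrality(predications):
    """Fraction of the other concepts each concept is directly linked to."""
    neighbors = defaultdict(set)
    for subj, _pred, obj in predications:
        if subj != obj:
            neighbors[subj].add(obj)
            neighbors[obj].add(subj)
    n = len(neighbors)
    if n < 2:
        return {concept: 0.0 for concept in neighbors}
    return {concept: len(nbrs) / (n - 1) for concept, nbrs in neighbors.items()}


def rank_concepts(predications):
    """Concepts ordered from most to least central (ties broken by name)."""
    scores = degree_centrality(predications)
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical predications in (subject, predicate, object) form.
predications = [
    ("ACE2", "INTERACTS_WITH", "Spike protein"),
    ("Spike protein", "PART_OF", "SARS-CoV-2"),
    ("SARS-CoV-2", "CAUSES", "COVID-19"),
    ("Spike protein", "ASSOCIATED_WITH", "COVID-19"),
]

ranking = rank_concepts(predications)
```

Filtering the predications by semantic type before ranking, as described above, would simply mean dropping tuples whose subject or object is outside the chosen types.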
>>>>>>
>>>>>> Guoqian: Can cytoscape support RDF graphs?  David: Yes.  Jim: Yes, 
>>>>>> and you can form SPARQL queries to extract specific interactions. 
>>>>>> Not 1:1 mapping of RDF graph to bio network.
>>>>>>
>>>>>> Todor: There are different plugins, one is SPARQL endpoint.  
>>>>>> Others for other visualizations.  Keep expectations low.
>>>>>>
>>>>>> Jim: It also includes a knowledge exploration interface, built on 
>>>>>> cytoscape.js, a re-implementation of cytoscape.  The 
>>>>>> implementation I'm using has some interface element.
>>>>>>
>>>>>> Lucy: How does Coronavirus ont relate?
>>>>>>
>>>>>> Guoqian: Using this ont to annotate the papers.
>>>>>>
>>>>>> Lucy: Where did that ont come from?
>>>>>>
>>>>>> Jim: Built using OBO Foundry?  Guoqian: Yes.
>>>>>>
>>>>>> Jim: We use OBO ont.  Oliver has a lot of tools for subsetting and 
>>>>>> extracting for app ontologies.
>>>>>>
>>>>>> Guoqian: Also collaborating with Cochrane PICO ontology, developing 
>>>>>> COVID-19 PICO ont, specific subtypes of the high level types, eg, 
>>>>>> subtypes of population with particular comorbidities.  The ont 
>>>>>> is also avail on github.
>>>>>>
>>>>>> Guoqian: How to collaborate?  Need a registry for KG from this 
>>>>>> community?
>>>>>>
>>>>>> Lucy: Working on semantic annotation of entity and rel.  Lots of 
>>>>>> people are doing bottom-up annotation, without formal vocab, then 
>>>>>> linking to UMLS.  But haven't seen COVID-19 ont.
>>>>>>
>>>>>> Guoqian: Also should look at use cases that different groups have. 
>>>>>> Community said they want open vocab instead of SNOMED-CT, such as 
>>>>>> UMLS.
>>>>>>
>>>>>> Lucy: Also working with a group at AWS, KB of concepts, link to 
>>>>>> ICD-10 and RxNorm, also lots of requests for protein and 
>>>>>> interactions.
>>>>>>
>>>>>> Guoqian: Also procedure datasets.
>>>>>>
>>>>>> Lucy: What use cases are these projects addressing?
>>>>>>
>>>>>> Guoqian: For EBMonFHIR, they are focused on review of evidence, 
>>>>>> and clinical concepts.  Other team looking at using OBO ont to 
>>>>>> analyse DB to mine underlying mechanisms.  Ideally we should have 
>>>>>> linkage across vocabularies.  Eg UMLS can link many ont.  But for 
>>>>>> OBO it might be a challenge.
>>>>>>
>>>>>> Jim: From microbio perspective, most useful from this group would 
>>>>>> be having cross mapping from clinical/FHIR/SNOMED-ish world and 
>>>>>> OBO bio world, with translation between the two.  E.g. I use 
>>>>>> uniprot IDs. Is that a problem?  What about drug IDs?  IDs are the 
>>>>>> hardest part -- agree on some, and mappings for others.
>>>>>>
>>>>>> Guoqian: If we can provide a list of ont each team prefers, we can 
>>>>>> discuss.
>>>>>>
>>>>>> Lucy: Would be great to be able to share annotations.  Centralized 
>>>>>> vocab?  Central KB?  Use cases are key.
>>>>>>
>>>>>> Scott: Mapping problems with COVID-19 are same as other mapping 
>>>>>> problems.  Should have a central place to share projects.  Should 
>>>>>> keep use cases in mind.
>>>>>>
>>>>>> Sebastian: Please give us feedback on the dataset!
>>>>>>
>>>>>> Todor: Focus on specific questions that you want to answer, then 
>>>>>> map using common IDs to address them.
>>>>>>
>>>>>> Daniel: What formats?  Right now we're using FHIR.  Use others?
>>>>>>
>>>>>> Jim: identifiers.org might be useful for mapping.
>>>>>>
>>>>>> David: Useful to have each group present use cases and vocab.
>>>>>>
>>>>>> We'll meet weekly, same time, 1 hour.  Each group will present 
>>>>>> their work in more detail, with focus on:
>>>>>> what use cases they are addressing; and
>>>>>> what vocabularies / ontologies they're using.
>>>>>>
>>>>>> Each group will present for 20 min, followed by 10 min of questions.
>>>>>>
>>>>>> ADJOURNED

Received on Tuesday, 5 May 2020 15:07:34 UTC