- From: David Booth <david@dbooth.org>
- Date: Mon, 18 May 2020 15:43:53 -0400
- To: w3c semweb HCLS <public-semweb-lifesci@w3.org>
Tomorrow (Tuesday) we will have a series of 5-minute overview
presentations by people doing semantic annotation of the CORD-19 dataset:
Daniel Stone,
Gaurav Vaidya,
Gollam Rabby,
Marcin Joachimiak,
Michael Liebman,
Tom Conlin,
David Booth.
Zoom Link:
https://us02web.zoom.us/j/83815969391?pwd=Q0k4Nm9xc3V2K0djL0FYT2JMVTJmUT09
If anyone else wishes to present their CORD-19 work, please let me know.
We will probably hold another, similar session next week or the
week after, for people who are not able to present tomorrow.
The CORD-19 dataset, released by the Allen Institute, contains
63,000 journal articles related to COVID-19.
Thanks,
David Booth
On 5/13/20 10:46 AM, David Booth wrote:
> Notes from yesterday's webinar by Franck Michel are below. Thanks to
> Victor Mireles-Chavez, a recording of the call is available at the
> following URL. Franck's presentation starts at 17:10.
>
> https://tinyurl.com/y8kmfxhe
> Recording password: 7t?N&*9+
>
> --------------------------------------------------------------
> MEETING NOTES 12-May-2020
> Present: David Booth, Victor Mireles, Franck Michel, Albert Burger,
> Daniel Stone, Deborah McGuinness, Filip, Gaurav Vaidya, Gollam Rabby,
> Louis Rumanes, Marcin Joachimiak, Michael Liebman, Subhashis Das,
> Nico, Tom Conlin, Chuming Chen
>
> Introductions
> David Booth: 10 years applying semantic web tech to healthcare and life
> sciences, working on a Mayo Clinic / Johns Hopkins University collaboration.
>
> Subhashis Das: Postdoctoral researcher at CeIC, DCU, Dublin.
> Specializes in domain ontologies and healthcare data integration.
>
> Franck's presentation
> Slides:
> https://www.dropbox.com/s/nnyg1o45f9dvimk/20200512%20Covid-on-the-Web%20-%20CORD-19%20semantic%20annotations.pdf?dl=0
>
>
> Franck: The goal is to make it easier to find and make sense of COVID-19
> literature, using both named entities and argumentative graphs. Tools:
> DBpedia Spotlight, Entity-fishing, and the BioPortal Annotator.
>
> Franck: Releasing v1.1 shortly: 54M named entities (NEs), 564k URIs.
> 30M NEs, 155,651 URIs from Wikidata
> 21M NEs, 339,990 URIs from BioPortal
> 1.8M NEs from DBpedia
> https://github.com/wimmics/cord19-nekg
> Full modelling details:
> https://github.com/Wimmics/cord19-nekg/blob/master/doc/01-data-modeling.md
> SPARQL endpoint: http://covid19.i3s.unice.fr/sparql
> Virtuoso faceted browsing: http://covid19.i3s.unice.fr:8890/fct/
> Franck: The W3C Web Annotation ontology and PROV-O are used to annotate
> articles. Each annotation points to the article and to the position
> within the article where the entity was found.
>
> Franck: We can query for the cancer entity and its subclasses or instances.
>
> Franck: Also looking at co-mentions of named entities.
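One way to picture co-mention analysis is to count, for every pair of entities, how many articles mention both. This is a hypothetical sketch, not the CORD19-NEKG code: it assumes annotations have already been reduced to (article ID, entity URI) pairs, and all IDs and `ex:` URIs are made-up placeholders.

```python
from collections import Counter
from itertools import combinations

def co_mentions(annotations):
    """Count how many articles mention each unordered pair of entities.

    annotations: iterable of (article_id, entity_uri) pairs (hypothetical format).
    """
    entities_per_article = {}
    for article_id, entity in annotations:
        # A set, so repeated mentions within one article count only once.
        entities_per_article.setdefault(article_id, set()).add(entity)

    counts = Counter()
    for entities in entities_per_article.values():
        # Sorting makes each pair order-independent: (A, B) is never also (B, A).
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts

# Hypothetical annotations; the "ex:" URIs are placeholders, not real identifiers.
annotations = [
    ("article1", "ex:SARS-CoV-2"), ("article1", "ex:ACE2"), ("article1", "ex:ACE2"),
    ("article2", "ex:SARS-CoV-2"), ("article2", "ex:ACE2"), ("article2", "ex:Lung_cancer"),
]
for pair, n in co_mentions(annotations).most_common():
    print(pair, n)
```

Counting per article rather than per mention keeps one long paper from dominating the pair counts; counting per paragraph would be an equally plausible choice.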
>
> Franck: Colleagues are also working on ACTA: A Tool for Argumentative ...
> claims/evidence. This would allow arguments/claims/evidence to be
> displayed in a graph.
>
> David: What ontology are you using to determine the subclass relations
> of cancer, for example?
> Franck: So far we use the Wikidata hierarchy. One exception: viruses in
> Wikidata are not modeled as classes, so we regenerated them as classes.
>
> Victor: Why can't DBpedia Spotlight process the full text?
> Franck: We have 54M NEs and 700M triples. We don't have enough machine
> power to process the full text.
>
> Victor: If I use your offsets, how can I be sure that they align with
> my own copy of the data?
> Franck: The offsets refer specifically to the text of the CORD-19 dataset.
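The alignment concern can be checked mechanically if each annotation stores the matched surface text alongside its offsets: a consumer verifies that the offsets select the same text in their own copy, and otherwise searches for it. A minimal sketch; the `Annotation` fields (`start`, `end`, `exact`) are hypothetical, loosely modeled on the W3C Web Annotation position and quote selectors, and offsets are assumed to be character positions with an exclusive end.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    # Hypothetical record: character offsets into a paragraph plus the matched text.
    start: int  # inclusive
    end: int    # exclusive
    exact: str  # surface form that was matched

def is_aligned(text: str, ann: Annotation) -> bool:
    """True if the offsets select exactly the recorded surface form in `text`."""
    return 0 <= ann.start <= ann.end <= len(text) and text[ann.start:ann.end] == ann.exact

def realign(text: str, ann: Annotation):
    """Fallback: return an Annotation at the first occurrence of the surface form, or None."""
    pos = text.find(ann.exact)
    return None if pos < 0 else Annotation(pos, pos + len(ann.exact), ann.exact)

original = "Patients with COVID-19 were enrolled."
mine = "  Patients with COVID-19 were enrolled."  # e.g. a copy with extra leading whitespace
ann = Annotation(14, 22, "COVID-19")
print(is_aligned(original, ann))  # True
print(is_aligned(mine, ann))      # False
print(realign(mine, ann))
```

Taking the first occurrence is ambiguous when a term appears several times in a paragraph; storing some surrounding context (as the Web Annotation quote selector does with its prefix and suffix) is one way to disambiguate.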
>
> Marcin: How are you extracting info about viral proteins? Some of them
> are polyproteins.
> Franck: We rely on the results of the tools we're using. If a protein
> is identified by those tools, then we get it. If an article mentions a
> gene name, would it show up?
>
> Marcin: There are a few of these different entity extraction efforts.
> Should we try to merge them?
>
> David: That's exactly the point of these teleconferences -- to start
> learning about each other's work and figure out how best to coordinate.
>
> Michael: We compared analysis of abstracts vs. full body text and found
> a significant difference, because the abstract is more of an
> advertisement. Also, in dealing with the full body, we found it
> necessary to parse the article and separate the sections on methods,
> results, and conclusions.
>
> Franck: My colleagues working on argumentative extraction find that
> quality varies a lot from one category of article to another. They've
> noticed (anecdotally) that clinical trials have an abstract with a few
> clear statements about results, which are relatively easy to extract,
> but this is not the case for other articles.
>
> Victor: A comment on avoiding duplication of effort: annotation takes
> considerable effort, and some teams are better prepared than others.
> It takes time, so by the time someone presents their work, others have
> already spent time doing similar work.
>
> David: We began these calls with very brief presentations by each
> participant, then switched to deeper presentations of each project.
>
> Deborah: When presenting, please say which parts of your work are ready
> for others to use.
>
> Tom: Also interested in timing: how long things took, and what went well or badly.
>
> AGREED: Next week we will do 5-minute presentations of what we're doing
> or planning.
>
> Speakers next week: Daniel, Deborah, Gaurav, Gollam, Marcin, John Z,
> Michael, Tom, David.
>
> Subhashis: not next week, but later.
>
> ADJOURNED
>
>
> On 5/11/20 12:22 PM, David Booth wrote:
>> Tomorrow (Tuesday) Franck Michel will present his work on CORD-19
>> Named Entities Knowledge Graph (CORD19-NEKG).
>>
>> Zoom Link:
>> https://us02web.zoom.us/j/83815969391?pwd=Q0k4Nm9xc3V2K0djL0FYT2JMVTJmUT09
>>
>>
>> Thanks,
>> David Booth
>>
>> On 4/28/20 12:09 PM, David Booth wrote:
>>> Notes from today's call:
>>>
>>> MEETING NOTES 28-Apr-2020
>>> Present: David Booth, Victor Mireles, Louis Rumanes, Tom Conlin,
>>> Franck Michel, Gollam Rabby, Jim McCusker, Lucy Wong, Sebastian
>>> Kohlmeier, Tomáš Kliegr
>>>
>>> Introductions
>>> David Booth: 10 years applying semantic web tech to healthcare and
>>> life sciences, working on a Mayo Clinic / Johns Hopkins University
>>> collaboration.
>>>
>>> Louis Rumane: UnitedHealth Group. Doing COVID research, looking at
>>> making a KG.
>>>
>>> Tom Conlin: Working with Melissa Haendel (Monarch Initiative).
>>>
>>> Franck: INRIA
>>>
>>> Gollam: Prague, Univ
>>>
>>> Jim: Research scientist at RPI, working on KGs with bio data.
>>>
>>> Lucy: Allen Institute, research scientist.
>>>
>>> Tomas: Assoc Prof, Prague, KG.
>>>
>>> Sebastian: Sr Mgr on CORD-19.
>>>
>>> Victor: Semantic Web Company researcher.
>>>
>>> Victor's Presentation
>>> Slides here:
>>> https://docs.google.com/presentation/d/1xaS_88sJ47iSrvv0ezOfjscIvG2VINUe7vqrUEMiaCA/edit?usp=sharing
>>>
>>>
>>> Victor: Semantic Web Company, 40+ FTEs. Makes PoolParty. Works with
>>> companies in many countries. Taxonomies help extract entities from
>>> text, and support image search and data management.
>>>
>>> Victor: Developing text and data mining tools for biomedicine and for
>>> CORD-19. We don't only annotate text. What's useful about annotating
>>> text with entities is being able to use the knowledge about them, the
>>> simplest of which is encoded in SKOS, such as broader/narrower. But to
>>> do this we need to annotate the text with URIs, then import the
>>> relationships into the graph. We are trying to link existing
>>> annotations with other knowledge sources. The ontology is a simplified
>>> version of NIF: documents have sections, and sections have annotations
>>> that are SKOS concepts.
>>>
>>> Victor: So far, we've set up a pipeline that takes a document and
>>> finds annotations with offsets. We have imported ChEBI, GO, MeSH, and
>>> HPO, but are using them as controlled vocabularies. Many terms are very
>>> specific, such as "COVID-19", so this is not really NLP: there are no
>>> inflections, plurals, etc. The output is a set of triples in the
>>> simple SKOS ontology mentioned above, which we put into GraphDB along
>>> with the vocabularies.
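Matching against a controlled vocabulary with no handling of inflections or plurals can be sketched as a whole-token, longest-label-first lookup that reports offsets. This is a hypothetical illustration, not the Semantic Web Company pipeline: the labels and `ex:` concept IDs are placeholders rather than real ChEBI/GO/MeSH/HPO identifiers, and matching is case-sensitive.

```python
import re

def build_matcher(vocab):
    """Compile one regex matching any vocabulary label as a whole token.

    vocab: dict mapping label -> concept ID (hypothetical format).
    Longer labels come first in the alternation, so "SARS-CoV-2 infection"
    wins over "SARS-CoV-2" at the same position.
    """
    labels = sorted(vocab, key=len, reverse=True)
    # Lookarounds instead of \b, so labels ending in digits or hyphens still work.
    pattern = r"(?<!\w)(?:" + "|".join(map(re.escape, labels)) + r")(?!\w)"
    return re.compile(pattern)

def annotate(text, vocab, matcher):
    """Return (start, end, label, concept) tuples; end is exclusive."""
    return [(m.start(), m.end(), m.group(0), vocab[m.group(0)])
            for m in matcher.finditer(text)]

# Hypothetical vocabulary with placeholder concept IDs.
vocab = {
    "COVID-19": "ex:C1",
    "SARS-CoV-2": "ex:C2",
    "SARS-CoV-2 infection": "ex:C3",
    "fever": "ex:C4",
}
matcher = build_matcher(vocab)
text = "SARS-CoV-2 infection causes COVID-19; fevers and fever are common."
for hit in annotate(text, vocab, matcher):
    print(hit)
```

Note that "fevers" is not matched: exact matching only finds the listed label, which is why it works well for specific terms like "COVID-19" and poorly for general words with inflected forms.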
>>>
>>> Victor: We also looked at the SciBite annotations. They've done an
>>> excellent job annotating, and they also have their own controlled
>>> vocabulary that is very good. The annotations come as JSON files, which
>>> we converted into triples. Combining them with bio DBs gives a graph DB.
>>>
>>> (Victor shows relationships in the GraphDB viewer)
>>>
>>> Victor: You can navigate the hierarchy of concepts and link them to
>>> the paragraphs in the CORD-19 dataset.
>>>
>>> (Victor shows SPARQL queries)
>>>
>>> Victor: This allows us to pull up the titles and paragraphs of
>>> articles that mention both a kind of neoplasm and a kind of coronavirus.
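The neoplasm-and-coronavirus query depends on following narrower links transitively from two starting concepts, then keeping the paragraphs annotated with at least one concept from each set. The demo used SPARQL over GraphDB; the Python below is only a hypothetical in-memory sketch of the same logic, with made-up concept names and paragraph IDs.

```python
from collections import deque

def narrower_closure(narrower, root):
    """All concepts reachable from root via narrower links, including root (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        for child in narrower.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def paragraphs_mentioning_both(annotations, narrower, concept_a, concept_b):
    """Paragraph IDs annotated with some concept under concept_a and some under concept_b."""
    under_a = narrower_closure(narrower, concept_a)
    under_b = narrower_closure(narrower, concept_b)
    return sorted(
        pid for pid, concepts in annotations.items()
        if concepts & under_a and concepts & under_b
    )

# Toy hierarchy (concept -> narrower concepts) and paragraph annotations; all hypothetical.
narrower = {
    "Neoplasm": ["Carcinoma", "Lymphoma"],
    "Carcinoma": ["Lung carcinoma"],
    "Coronavirus": ["SARS-CoV", "SARS-CoV-2", "MERS-CoV"],
}
annotations = {
    "p1": {"Lung carcinoma", "SARS-CoV-2"},
    "p2": {"Lymphoma"},
    "p3": {"MERS-CoV", "Fever"},
    "p4": {"Carcinoma", "MERS-CoV"},
}
print(paragraphs_mentioning_both(annotations, narrower, "Neoplasm", "Coronavirus"))  # ['p1', 'p4']
```

In SPARQL 1.1 the same traversal is typically written with a property path such as `skos:broader*`, letting the triple store compute the closure instead of the client.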
>>>
>>> Victor: We want to put other DBs into GraphDB as well. The Monarch
>>> Initiative is putting together a KG, and is also including SciBite.
>>>
>>> Victor: Missing from both our effort and Monarch: searchability. I
>>> showed SPARQL queries using broader/narrower, but search also needs
>>> to be efficient for humans, so we are also working on faceted search.
>>> The Monarch Initiative is very good for machine-readable data. Another
>>> thing missing: relation extraction from the text. How does a human
>>> determine that some text is saying that one protein interacts with
>>> another? JPL (Lewis Magidney, sp?) is using Stanford NLP for
>>> relation extraction.
>>> https://github.com/nasa-jpl-cord-19/covid19-knowledge-graph
>>> It isn't perfect, but it indicates a relationship. Both entities are
>>> in GO. This adds new edges between entities. There is a lot of
>>> interest in this topic now.
>>>
>>> Franck: We're doing something pretty close to this at INRIA, looking
>>> at named entities, Wikidata entities, and queries that gather all
>>> articles on cancer and any coronavirus. Another thing we're doing: in
>>> addition to detecting named entities, we're running other tools to
>>> identify arguments, claims, and evidence in articles, and to draw a
>>> network of claims and evidence showing what supports the claims. We
>>> hope to publish this network soon as an RDF graph.
>>>
>>> Victor: PubAnnotation, shown last week, showed epistemic analysis.
>>>
>>> Franck: Argument and clinical trial analysis: you query PubMed and the
>>> platform analyzes those articles. We want to apply it to CORD-19.
>>>
>>> Vincent: Is the RDF available? Victor: It will take a couple more weeks.
>>> Vincent: Size? Victor: 20GB of RDF.
>>>
>>> David: There is overlap between the efforts, so it is helpful to learn
>>> about each other's work.
>>>
>>> Victor: After looking at the Monarch Initiative: it isn't new; I
>>> recognized names from the Human Phenotype initiative. Most of it
>>> summarizes work that others have done. The FHIR DB also overlaps
>>> with SciBite.
>>>
>>> David: The SPARQL query was valuable, but biologists need a simple UI.
>>>
>>> Jim: Working on a reusable faceted browser for various things. It is
>>> based on SPARQL fragments: a property path gives certain values, plus
>>> a description of how to render them. Potentially useful here. We also
>>> integrated Vega (a JS framework for charts and visualization) into
>>> WHYIS, so you can plug in a SPARQL query and get a chart. People can
>>> share how they're exploring the graph.
>>> https://github.com/tetherless-world/whyis
>>> Faceted search is a view in WHYIS, but a lot of the capabilities are
>>> designed to use nanopublications.
>>>
>>> Email list for these calls:
>>> https://lists.w3.org/Archives/Public/public-semweb-lifesci/
>>>
>>> Franck to present next week.
>>>
>>> ADJOURNED
Received on Monday, 18 May 2020 19:44:07 UTC