Re: FHIR RDF - 11am (Boston) Thur July 22 - UNC's work on FHIR RDF; Objectives for R5 from David Booth on 2021-07-23 (public-semweb-lifesci@w3.org from July 2021)

From: David Booth <david@dbooth.org>
Date: Thu, 22 Jul 2021 20:33:59 -0400
To: "its@lists.hl7.org" <its@lists.HL7.org>, w3c semweb HCLS <public-semweb-lifesci@w3.org>
Message-ID: <5f0dafbd-a304-9c79-4dbf-ee407141189f@dbooth.org>
Minutes from today's teleconference are here:
https://www.w3.org/2021/07/22-hcls-minutes.html
and also below in plain text.

David Booth

--------------------------------------------------

Attendees

    Present
           Brad Simons, Darrell Woelk, David Booth, Emily Pfaff,
           Gaurav Vaidya, Gerhard Kober, Gopi Chandrasekharan,
           Guoqian Jiang, Sajjad Hussein, Samson Tu

    Chair
           David Booth

    Scribe
           dbooth

Contents

     1. Introductions
     2. UNC work with FHIR RDF -- Emily Pfaff
     3. Desired R5 features
     4. Summary of action items

Meeting minutes

   Introductions

    Samson: Bio ontologies

    Guoqian: Mayo Clinic, FHIRCat project.

   UNC work with FHIR RDF -- Emily Pfaff

    Emily's slides: 
https://lists.w3.org/Archives/Public/www-archive/2021Jul/att-0005/5-15-RDF_FHIR.pptx

    emily: Assistant Prof UNC Chapel Hill, clinical informatics
    background. Used FHIR RDF in two projects.

    emily: Joining FHIR data with external datasets.
    … Main interest in computable phenotyping -- identify cohorts
    of patients based on inclusion/exclusion criteria.
    Traditionally those criteria are defined by clinicians, but
    computable phenotyping means translating those criteria into
    computable code to identify massive numbers of candidates
    quickly.
    … But there are big differences between datasets from different
    institutions.
    … In an ont-perfect world, I could pull all patients related to
    COVID-19, but that's not current reality.
    … What would it take to write one script that would work in
    multiple institutions and result in consistent and accurate
    cohort?
    … This is where I brought in FHIR RDF and SNOMED and other data
    models.
    … Without ont we need to use exact code matches. This is
    especially bad with ICD-9.
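
    The contrast between exact code matching and hierarchy-aware
    matching can be sketched as below. This is only an illustration:
    the codes, the PARENT table and the patient records are all
    hypothetical, and a real project would take the hierarchy from an
    ontology such as SNOMED rather than a hand-written dictionary.

```python
# Hypothetical mini-hierarchy: child code -> parent code.
# These labels are placeholders, not real SNOMED or ICD content.
PARENT = {
    "viral_pneumonia": "pneumonia",
    "bacterial_pneumonia": "pneumonia",
    "pneumonia": "lung_disease",
}


def ancestors(code):
    """Return the code itself plus every ancestor reachable via PARENT."""
    found = {code}
    while code in PARENT:
        code = PARENT[code]
        found.add(code)
    return found


def exact_cohort(records, target):
    """Patients whose recorded code is literally the target code."""
    return {pid for pid, code in records if code == target}


def inferred_cohort(records, target):
    """Patients whose recorded code is the target or one of its descendants."""
    return {pid for pid, code in records if target in ancestors(code)}


# Hypothetical (patient id, recorded code) pairs.
records = [
    ("p1", "pneumonia"),
    ("p2", "viral_pneumonia"),
    ("p3", "bacterial_pneumonia"),
    ("p4", "asthma"),
]

exact = exact_cohort(records, "pneumonia")
inferred = inferred_cohort(records, "pneumonia")
print(sorted(exact), sorted(inferred))
```

    With exact matching only the patient coded with the target itself
    is found; walking the hierarchy also picks up patients coded with
    more specific terms.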

    emily: Opioid triplestore project investigated patterns among
    patients that had surgery and got opioids. Wanted to find out
    what might lead to dependence on opioids.
    … Med data is very hard to deal with in EHRs. The EHR mainly
    only tells you that a med was prescribed, not if the pt picked
    it up or even took it or how much.
    … We tried to join insurance data to at least find out if the
    pt picked up the prescription. Also wanted to find out if any
    pts got their meds from outside UNC network.
    … But ins data looks very different from EHR data. Brought in
    FHIR RDF.
    … Selected a cohort, converted to FHIR R4, then converted that
    to RDF.
    … Then did the same thing w ins data. Needed a custom
    conversion to FHIR (using python).
    … Also linked in some public data, such as unemployment data in
    the pt's region.
    … This allowed us to find hundreds more patients than we
    otherwise could.
    … Determined that it was worth the effort.
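
    The idea of joining EHR, insurance and public data once they share
    a triple representation can be sketched roughly as follows. This
    is a minimal sketch, not the project's pipeline: triples are plain
    Python tuples rather than real FHIR RDF, and every identifier and
    predicate (pt:, rx:, ex:prescribed, ex:filled, ex:livesIn) is
    hypothetical.

```python
# Triples as (subject, predicate, object) tuples. All names are made up
# for illustration and do not follow the FHIR RDF vocabulary.
ehr = {
    ("pt:1", "ex:prescribed", "rx:opioid"),
    ("pt:2", "ex:prescribed", "rx:opioid"),
}
insurance = {
    ("pt:1", "ex:filled", "rx:opioid"),
    ("pt:3", "ex:filled", "rx:opioid"),  # fill with no matching EHR prescription
}
public = {
    ("pt:1", "ex:livesIn", "region:A"),  # region-level data could hang off region:A
}

# Once everything is triples, combining sources is a set union.
graph = ehr | insurance | public


def subjects(graph, predicate, obj):
    """All subjects that have the given predicate/object pair."""
    return {s for s, p, o in graph if p == predicate and o == obj}


prescribed = subjects(graph, "ex:prescribed", "rx:opioid")
filled = subjects(graph, "ex:filled", "rx:opioid")

picked_up = prescribed & filled      # prescribed in the EHR and filled per claims
never_filled = prescribed - filled   # prescribed, but no claim for a fill
outside_only = filled - prescribed   # filled, but not prescribed in this EHR

print(sorted(picked_up), sorted(never_filled), sorted(outside_only))
```

    A triplestore with SPARQL would express the same joins as graph
    patterns; the set operations here only stand in for that.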

    sajjad: What was the value prop to going into RDF?

    emily: Because we wanted to bring in public datasets, in
    addition to the ins and EHR data, RDF gave us a common
    denominator.

    brad: I don't think you could have done this without RDF,
    because you need inference also.

    darrell: Did you use SPARQL? Emily: Yes.

    darrell: Papers? Emily: Yes, in submission, but will also share
    the submitted work.

    sajjad: When converting to RDF, inference comes to mind. For
    public datasets, for maintainability, this was done under R4,
    when changes come you'll need some tuning. But also the public
    datasets might make changes also. Scalability? Maintenance
    issues?

    emily: Don't know yet, because this was a one-time pilot.
    … We tried to build the triplestore in a way that allowed us to
    easily rebuild it.

    sajjad: Another motivating factor for RDF, might be that you
    could use sameAs relations instead of refurbishing the whole
    thing.

    guoqian: You have FHIR RDF and non-FHIR RDF data. How did you
    link them?

    emily: Had to make custom predicates to link them. When we did
    the project I was very new to RDF. Needed a relation between
    census tract and the county where the pt lives. Some datasets
    are at either level. Wanted to infer county based on census
    tract, so we built custom relationships.
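
    One way such a tract-to-county relationship could be derived is
    sketched below. The minutes do not say how the custom predicates
    were built; this sketch assumes tracts are identified by the
    11-digit US Census GEOID (2-digit state, 3-digit county, 6-digit
    tract), so the county is the first five digits. The predicates
    ex:livesInTract and ex:livesInCounty and the example tract number
    are hypothetical.

```python
def county_fips(tract_geoid):
    """Return the 5-digit state+county prefix of an 11-digit tract GEOID."""
    if len(tract_geoid) != 11 or not tract_geoid.isdigit():
        raise ValueError("expected an 11-digit census tract GEOID")
    return tract_geoid[:5]


def infer_county_triples(triples):
    """For each (patient, ex:livesInTract, tract) triple, derive a county triple."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate == "ex:livesInTract":
            geoid = obj.split(":", 1)[1]  # strip the hypothetical "tract:" prefix
            inferred.add((subject, "ex:livesInCounty", "county:" + county_fips(geoid)))
    return inferred


# Illustrative data only; the tract number is made up.
triples = {("pt:1", "ex:livesInTract", "tract:37135000100")}
county_triples = infer_county_triples(triples)
print(county_triples)
```

    Datasets published at county level can then be joined through the
    derived county node, and tract-level datasets through the tract
    node.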

    darrell: HL7 defined CQL language that maps to FHIR, for
    quality measures. Wondering if they can be expressed in FHIR
    RDF.

    guoqian: Previously looked at translating CQL to SPARQL.

    Paper from Guoqian on CQL and SPARQL:
    [7]http://www.swat4ls.org/wp-content/uploads/2017/11/SWAT4LS-2017_paper_40.pdf

       [7] http://www.swat4ls.org/wp-content/uploads/2017/11/SWAT4LS-2017_paper_40.pdf

    emily: Second project was an extension of the first.
    … Wanted to see if adding SNOMED ont would help.
    … Looked at depression and rheumatoid arthritis. Wanted to
    compare the coverage of computable phenotype defined by ICD-10
    codes vs using SNOMED or HPO ont.
    … But we could not interview patients to find out their actual
    results, so we ended up measuring the degree of overlap.
    … But SNOMED to ICD-10 mapping files are not machine
    processable.
    … Also, for ICD-10 there are a lot of required digits, but
    SNOMED puts in xx?, which will not directly match.
    … We ended up removing the concept mappings because we couldn't
    use them, and used the xx? rules.
    … Ended up having a lot of SNOMED codes mapped from a single
    ICD-10 code.
    … Used OWL reasoning, added HPO ontology, then ran SPARQL
    queries. Some queries looked for exact matches, others used
    inference.
    … SNOMED and ICD-10 have very different ideas of what codes
    constituted depression. Best cohort might be the superset of
    cohorts found by both techniques.
    … For rheum arth we discovered an error in the SNOMED ont, due
    to a missing subclass relation. They were missing knees,
    wrists, hips and ankles!
    … Overall utility of this work is to help inform researchers
    about their choice of codes to use.
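
    Two steps from this discussion, expanding a wildcard-style map
    target against concrete ICD-10 codes and measuring the overlap
    between two cohorts, might look roughly like the sketch below. It
    assumes that a trailing "?" in a map target stands for exactly one
    required character, which is a guess at what the "xx?" rules mean.
    The code strings and patient ids are placeholders, not an excerpt
    from a real map or dataset, and Jaccard similarity is just one
    possible overlap measure.

```python
from fnmatch import fnmatchcase


def expand(pattern, icd10_codes):
    """Return the codes matching a pattern in which '?' is one character."""
    return sorted(code for code in icd10_codes if fnmatchcase(code, pattern))


def jaccard(a, b):
    """Overlap of two cohorts: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Placeholder code strings used only to exercise the pattern matching.
codes_in_data = ["M05.70", "M05.71", "M05.8", "M06.00"]
matched = expand("M05.7?", codes_in_data)

# Hypothetical cohorts found by two different code sets.
icd_cohort = {"p1", "p2", "p3"}
snomed_cohort = {"p2", "p3", "p4"}

overlap = jaccard(icd_cohort, snomed_cohort)
superset = icd_cohort | snomed_cohort  # union of both techniques

print(matched, overlap, sorted(superset))
```

    One pattern can expand to several concrete codes, which is
    consistent with the many-to-one mappings mentioned above.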

    guoqian: For triplestore, what codes do you use?

    emily: We had to use the mappings, because most EHRs do not use
    SNOMED -- they use ICD-10. But that's a huge limitation,
    because the phenotype can only be as good as the ICD-10 code.

   Desired R5 features

    [8]https://github.com/w3c/hcls-fhir-rdf/issues/69

       [8] https://github.com/w3c/hcls-fhir-rdf/issues/69

    brad: the long property names were a problem for us, because we
    could not go beyond 4 levels.

    [9]https://github.com/w3c/hcls-fhir-rdf/issues/75

       [9] https://github.com/w3c/hcls-fhir-rdf/issues/75

    david: Harold drafted a list of things we may want to change in
    FHIR RDF R5, but he's on vacation and I have not been able to
    find it!
    ADJOURNED
Received on Friday, 23 July 2021 00:34:12 UTC