RE: Project with D2R from Kashyap, Vipul on 2008-06-25 (public-hcls-coi@w3.org from April to June 2008)

From: Kashyap, Vipul <VKASHYAP1@PARTNERS.ORG>
Date: Tue, 24 Jun 2008 22:38:02 -0400
To: "John Madden" <madden.jf@gmail.com>, "Eric Prud'hommeaux" <eric@w3.org>
Cc: <public-hcls-coi@w3.org>
Message-ID: <DBA3C02EAD0DC14BBB667C345EE2D12402E8C0D1@PHSXMB20.partners.org>
Hi John,

Thanks for putting out this interesting description. Some comments/questions are
included inline below.

> Our use case is also Clinical Trials. The scenario we're targeting is,
> you have one or more Clinical Trials Management Organizations (CTMOs),
> each managing one or more clinical trials, enrolling patients at
> multiple clinical sites. At any given site, multiple trials and
> multiple CTMOs may be active simultaneously or sequentially. A given
> patient may participate simultaneously or sequentially in more than
> one trial.

An interesting use case that is relevant to the COI project is when a patient
participating in a clinical trial visits his physician.

> Each trial has its own data requirements, that consists about 80% of  
> participant data that will get prospectively collected in the course  
> of the trial using dedicated, trial-specific data entry instruments  
> (trial-specific forms, screens, etc.). But about 20% of the data is  
> retrospective patient data needed to establish baseline demographics  
> and "pre-existing conditions" (which is clinical trials lingo for  
> "what other diseases/conditions does the patient have?") etc. For a  
> given patient, this data would be pretty much the same from trial to  
> trial.

How much of this data is potentially retrievable from an EMR system?

> This retrospective data "lives" in existing electronic medical records
> stored in existing clinical data stores (almost always relational DBs
> that are the backends to the clinic/hospital electronic medical record
> (EMR) application/system). Mostly, these have proprietary table
> designs that are specific to the EMR vendor. Although these designs
> may sometimes exploit standard coding systems as table keys, just as
> often they use entirely vendor/system-local keys.

If the data in an EMR is not reused or shared with the CTMS (and vice versa),
what could be the potential value of this data living in the EMR?

> But wouldn't it be much nicer if a trial administrator for a given
> CTMO could treat all the site-specific databases as if they were a
> single large database with a common "virtual schema", and then he/she
> could formulate the ETL scenario just once (it wouldn't be ETL anymore
> then- it would just be a query). Even better, if all the CTMO's could
> form a consortium and agree to treat all the site-specific databases
> according to a single, shared "virtual schema", they all could benefit.
> 
> Important: This schema would cover only items of specific interest in
> this application, namely (a) demographic information and (b)
> pre-existing conditions. Thus, it would be what Vipul called a "niche
> ontology".

And more importantly, this wouldn't necessarily enable interoperability between
the EMR and CTMS for a wide variety of applications.

> But what to do when the SPARQL hits the clinical sites? You install at
> each participating site a D2R server that listens for queries. For
> each site you have created a D2R mapping file covering RDF-to-SQL
> translation over the fairly limited shared vocabulary of interest
> (demographics, maybe 50 concepts; pre-existing conditions, maybe 100
> concepts). Creation of the mapping file is manual, labor-intensive and
> unique to each site, but only needs to be done once.

The catch is that for any realistic scenario, these mappings can be complex
and, as we discussed today, may involve one-to-many vocabulary mappings.
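
To make the one-to-many point concrete, here is a minimal sketch in Python.
It is not D2R mapping syntax, just an illustration of what a site mapping
has to do when one shared concept corresponds to several local codes. The
concept names, local codes, and table/column names are all hypothetical.

```python
# Hypothetical mapping for ONE site: each shared-vocabulary concept may
# correspond to several vendor/system-local codes (one-to-many).
SITE_CODE_MAP = {
    "DiabetesMellitus": ["DM1", "DM2", "DIAB-UNSPEC"],
    "Hypertension": ["HTN"],
}


def concept_to_sql(concept, table="problem_list",
                   code_col="local_code", patient_col="patient_id"):
    """Build a parameterized SQL query for one shared concept.

    Returns (sql, params). The table and column names are assumptions
    standing in for whatever the site's EMR schema actually uses.
    """
    codes = SITE_CODE_MAP.get(concept)
    if not codes:
        raise KeyError(f"no local mapping for concept {concept!r}")
    # One placeholder per local code, so a single concept expands
    # into an IN (...) list over all of its local equivalents.
    placeholders = ", ".join("?" for _ in codes)
    sql = (f"SELECT DISTINCT {patient_col} FROM {table} "
           f"WHERE {code_col} IN ({placeholders})")
    return sql, list(codes)
```

Even in this toy form, the mapping table has to be curated by hand for
each site, which is where most of the effort would go.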

> Another problem: Some of the pre-existing conditions data is not
> actually going to be represented in granular form in most clinical
> DBs; it will more often be embedded someplace in text blobs (e.g.
> problem lists, discharge summaries, etc.). So you may need to perform
> some tricks in formulating your SQL queries by incorporating some text
> searching, pipelining in some natural language processing steps, or
> worst case maintaining auxiliary full-text indexes on the object
> database.

The above is very ambitious and beyond the capability of NLP engines.
However, there is significant value to be gained by looking at structured data
such as labs, medications, vital signs, etc.
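
As a rough illustration of "structured data first", the sketch below (Python
with an in-memory SQLite database) looks up coded entries and only then falls
back to a crude keyword match over free text. The keyword match is plain
substring search, not NLP, and the table names, column names, codes, and
sample rows are all made up for the example.

```python
import sqlite3


def find_patients(conn, codes, keywords):
    """Return {patient_id: source}, where source is 'structured' or 'text'.

    Structured (coded) hits take precedence; free-text hits are only
    recorded for patients not already found via a code.
    """
    found = {}
    placeholders = ", ".join("?" for _ in codes)
    for (pid,) in conn.execute(
        "SELECT DISTINCT patient_id FROM problem_list "
        f"WHERE local_code IN ({placeholders})",
        list(codes),
    ):
        found[pid] = "structured"
    for kw in keywords:
        # Crude substring match; real text blobs would need much more care.
        for (pid,) in conn.execute(
            "SELECT DISTINCT patient_id FROM clinical_notes "
            "WHERE note_text LIKE ?",
            (f"%{kw}%",),
        ):
            found.setdefault(pid, "text")
    return found


def demo():
    """Build a tiny hypothetical database and run the lookup."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE problem_list (patient_id INTEGER, local_code TEXT);
        CREATE TABLE clinical_notes (patient_id INTEGER, note_text TEXT);
        INSERT INTO problem_list VALUES (1, 'DM2'), (2, 'HTN');
        INSERT INTO clinical_notes VALUES
            (3, 'History of type 2 diabetes, diet controlled.'),
            (1, 'Diabetes follow-up visit.');
    """)
    return find_patients(conn, ["DM1", "DM2"], ["diabetes"])
```

Tagging each hit with its source lets a reviewer treat the free-text matches
with more suspicion than the coded ones.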

Cheers,

---Vipul

Received on Wednesday, 25 June 2008 02:38:59 UTC