W3C

Drug Safety and Efficacy Note on CDISC's Study Data Tabulation Model (SDTM)

W3C HCLS Interest Group Note 01 11 2007

This version:
http://www.w3.org/2001/sw/hcls/NOTE_DSE_20071108
Latest version:
http://www.w3.org/2001/sw/hcls/NOTE_DSE_20071108
Editors:
Eric Neumann, Clinical Semantics Group
Authors and Contributors:
see Acknowledgments

Abstract

CDISC's Study Data Tabulation Model (SDTM) is used to define the study components in terms of domains and observations for a given clinical trial study. However, using it for sets of biomarkers that serve to define surrogate endpoints and/or evidence of mechanism is currently either not possible or not well described. We intend to propose an augmentation of the SDTM model using RDF-OWL that will support the inclusion of biomarker data and genotyping from subjects, associated with known mechanisms and endpoint descriptors.

Task Force Charge

This HCLSIG task force focuses on the topic of “applying semantics to R&D Informatics efforts in support Drug Safety and Efficacy” within clinical trials, as well as post-market surveillance. We also intend to demonstrate how Semantic Web standards can be applied to related issues in the near term. Specifically, the task force focuses on the following areas for scenarios and activities:

  * Identify and address challenges and needs regarding Biomarkers and Pharmacogenomics, in coordination with FDA guidelines
  * Semantic applications around Drug Safety: Signals and Notification
  * Possible applications of the Semantic Web in Clinical Trial planning, management, analysis, and reporting (e.g., EDC and EHR Single-Source, data security, integrity)
  * Facilitating electronic submissions as per the Common Technical Document (eCTD) specifications (http://www.fda.gov/cder/guidance/7087rev.htm)

X Use Cases document to illustrate, in detail, the techniques XX provides for associating documents with appropriate instructions for extracting any embedded data.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is an Interest Group Note, developed by the Healthcare and Life Sciences Interest Group.

As of the publication of this Interest Group Note the HCLS Interest Group has completed work on this document. Changes from the previous Working Draft are indicated in a log of changes. Comments on this document may be sent to public-semweb-lifesci@w3.org. Further discussion on this material may be sent to the Semantic Web Interest Group mailing list, semantic-web@w3.org (also with public archive).

Publication as an Interest Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


Introduction

This HCLSIG task force (DSE) focuses on the topic of “applying semantics to R&D Informatics efforts in support Drug Safety and Efficacy” within clinical trials, as well as post-market surveillance. We also intend to demonstrate how Semantic Web standards can be applied to related issues in the near term. The specific areas for scenarios and activities are those listed under Task Force Charge above [HCLS].

Background

Digital data from both Non-clinical (animal) and Clinical Studies needs to be organized according to the following areas:

The tabular model proposed by SDTM allows observation forms and codes to be defined, but several factors constrain its wider use. Specifically, it needs a more precise way of describing codes (à la URIs) and of supporting optional and required extensions that depend on certain classes of studies. SDTM needs to be extended using a flexible model to incorporate key elements of translational medicine. This means that the inclusion of biomarker and genotype information must be addressed both efficiently (multiple sets of diverse measurements per subject per study) and scientifically (molecular, mechanistic, and phenotypic associations).
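
As a rough illustration of what describing codes with URIs could look like, the following Python sketch maps the sex text codes that appear in the N3 examples of this Note to URIs. The namespace string and the local names other than Female are assumptions made for illustration only; they are not published NCI or CDISC identifiers.

```python
# Hypothetical namespace, used only for illustration.
NCI = "http://nci.nih.gov/cadsr/vocabulary#"

# Assumed mapping from SDTM-style text codes to URI local names.
SEX_CODE_LOCAL_NAMES = {
    "MALE": "Male",
    "FEMALE": "Female",
    "UNKNOWN": "Unknown",
    "INTERSEX": "Intersex",
}


def sex_code_uri(text_code):
    """Return the URI for a sex text code, ignoring case and surrounding spaces."""
    key = text_code.strip().upper()
    if key not in SEX_CODE_LOCAL_NAMES:
        raise ValueError(f"unknown sex code: {text_code!r}")
    return NCI + SEX_CODE_LOCAL_NAMES[key]
```

With such a mapping, two datasets that spell the code differently ('FEMALE', 'Female') would still resolve to the same identifier, and an unrecognised code is rejected rather than silently stored as a string.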

Task Objectives

The objectives focused on a few key items related to the SDTM model and possible extensions to it:

  * Develop and document Scenarios for some of the areas identified above
  * Identify and validate some initial Best Practices for handling safety and efficacy information through semantics, incorporating current vocabulary conventions
  * Create one or more public Semantic Web-based Demonstrations (see Clinical Trial Demo)
  * Coordinate and collaborate with relevant organizations, possibly CDISC, ICH, HL7-RCRIM, EMEA, FDA, and NCI-caBIG

Converting ODM/XML to RDF wholesale may not be the right way to approach the problem of addressing SDTM elements. Rather than embedding data, metadata, codelists, and definitions in a single study, the study should instead use references to shared metadata and definitions.

Rationale

The use of information to improve the development of Efficacious and Safe Drugs rests on the proper and timely utilization of diverse information sets, and on the adoption of, and compliance with, well-defined policies. As information becomes more diverse and policies more central to the pharmaceutical industry, the development of information systems that are better suited to handle multiple information types (data and ontologies) while complying with defined policies (rules and actions) will become essential. Semantic Web technology standards offer potential solutions for:

  * Aggregating Study Datasets around Biomarkers (and following eCTD guidelines)
  * Enhancing management of non-clinical and clinical controlled vocabularies, which will certainly keep expanding and evolving (adaptability)
  * Providing fast access to current safety information through semantic-enabled channels (Pharmacovigilance)
  * Applying Rules, Integrity, and Security in support of policy compliance and management (HIPAA, CFR21Part11 and Sarbanes-Oxley)

Use-Case Context

The Study Data Tabulation Model (SDTM) is used to define the study components in terms of domains and observations for a given clinical trial study. However, using it for sets of biomarkers that serve to define surrogate endpoints and/or evidence of mechanism is not currently possible. We intend to propose an augmented SDTM model using RDF-OWL that will support the inclusion of biomarker data from subjects, associated with known mechanisms and endpoint descriptors.

Scenario 1: Genetic Diagnostic CT

  1. Phase IIb Clinical Trial Design that includes Amplichip CYP 450 and Colon Cancer Diagnostic Chips, used to identify:
     * which CYP alleles are present for drug metabolism; used to screen candidates for placement in different CT arms
     * linkage analysis for colon cancer contributors; for population segmentation and responder analyses
  2. Candidate samples (for CYP-allele based recruitment) at last candidate screening visit
  3. Study Samples at second visit taken for colon marker analysis and possible later genome-wide analysis
  4. EDC on a CT study that will use genotype (biomarkers) to track potential tox signals
  5. Raw data SDTM bundling and linking of observations with genotype (from GeneChip data)
  6. Analysis of SDTM data between interventions and outcomes (and generation of ADaM); statistics, correlation, and association
  7. Final bundling of analyzed data and CT interpretations
  8. Storage of Clinical findings for clinical mining by other investigations

Example in N3 of SDTM Model with context extensions

The following examples are a work in progress (collaborative whiteboard) showing how to define and organize clinical data à la the SDTM model using an RDF approach. N3 is used here to make editing and comprehension easier. Some basic syntactical rules are reviewed here:


	@prefix cdisc: <http://www.cdisc.org/sdtm/vocab#> .
	@prefix dse: <http://www.w3.org/2001/sw/hcls/dse#> .
	@prefix nci: <http://nci.nih.gov/cadsr/vocabulary#> .
	@prefix nist: <http://nist.gov/units#> .
	@prefix time: <http://www.w3.org/2006/time#> .

	#  Sex Text Code: 'MALE', 'FEMALE', 'UNKNOWN', 'Intersex'

	<http://clinic.com/study/T2271>   
	            a cdisc:Study ;
	            cdisc:subject <http://clinic.com/study/T2271/subject/S83221> ;
	            cdisc:subject <http://clinic.com/study/T2271/subject/S74343> ;
	        ...   .

	<http://clinic.com/study/T2271/subject/S83221> 
	            a cdisc:Subject ;
	            nci:sex_code   nci:Female ;
	     #  here I assume cdisc:Diastolic_BP is a subproperty of cdisc:VSTest
	            cdisc:observation <http://clinic.com/study/T2271/subject/S83221/observation/O6622> ;
	            cdisc:observation <http://clinic.com/study/T2271/subject/S83221/observation/O6561> ;
	    ...   .

	<http://clinic.com/study/T2271/subject/S83221/observation/O6622>
	        a cdisc:Diastolic_BP ;  
	        cdisc:obs_context  cdisc:patient_lying ;
	        cdisc:obs_value  "98" ;
	        cdisc:obs_units  nist:mmHg .

	<http://clinic.com/study/T2271/subject/S83221/observation/O6561>
	        a cdisc:Pulse ;         
	        cdisc:obs_context  cdisc:patient_lying ;
	        cdisc:obs_value  "64";
	        cdisc:obs_units  nist:bpm .	

Bundling Diastolic_BP, Systolic_BP, and Pulse together under one observation could be done in the following way...


	<http://clinic.com/study/T2271/subject/S83221/observation/O6622>
	        a cdisc:Vital_sign ;   # cdisc:Vital_sign is a subclass of cdisc:Observation
	        cdisc:obs_context  [ cdisc:position cdisc:patient_lying ; cdisc:note cdisc:patient_fainted ; cdisc:patient_status cdisc:non_critical ] ;

	        cdisc:diastolic [ a cdisc:Diastolic_BP ;
	           cdisc:obs_value  "98" ;
	           cdisc:obs_units  nist:mmHg
	        ] ;
	        cdisc:systolic [ a cdisc:Systolic_BP ;
	           cdisc:obs_value  "152" ;
	           cdisc:obs_units  nist:mmHg
	        ] ;
	        cdisc:pulse [ a cdisc:Pulse ;
	           cdisc:obs_value  "64" ;
	           cdisc:obs_units  nist:bpm
	        ] .
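
The bundling pattern can also be generated programmatically. The following Python sketch is one possible way to do so, not a proposed tool: it renders a single vital-signs observation with nested measurements as N3 text. The function name and its parameters are hypothetical, and the prefixed names passed in are assumed to be declared as in the examples above.

```python
def bundle_vital_signs(obs_uri, position, measurements):
    """Render one cdisc:Vital_sign observation with nested measurements as N3.

    measurements is a non-empty list of (property, class, value, unit)
    tuples, where property, class and unit are prefixed names and value
    is a plain string.
    """
    if not measurements:
        raise ValueError("at least one measurement is required")
    head = [
        f"<{obs_uri}>",
        "    a cdisc:Vital_sign ;",
        f"    cdisc:obs_context [ cdisc:position {position} ] ;",
    ]
    parts = []
    for prop, cls, value, unit in measurements:
        # Each measurement becomes a blank node; no '.' is allowed inside [ ].
        parts.append(
            f"    {prop} [ a {cls} ;\n"
            f'        cdisc:obs_value "{value}" ;\n'
            f"        cdisc:obs_units {unit}\n"
            f"    ]"
        )
    # Measurements are separated by ';' and the statement ends with '.'.
    return "\n".join(head) + "\n" + " ;\n".join(parts) + " ."
```

Calling it with the three measurements above (diastolic "98", systolic "152", pulse "64") yields one statement whose nested blank nodes mirror the hand-written example.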
	
Example Based on simulated Clinical Data from Stephen Dobson

	<http://clinic.com/study/T2271/subject/4183542663506> 
	            a cdisc:Subject ;
	            nci:sex_code   nci:Female ;
	            cdisc:treatment <http://clinic.com/study/T2271/subject/4183542663506/observation/O2241> ;
	            cdisc:vitalSigns <http://clinic.com/study/T2271/subject/4183542663506/observation/O6561> ;
	            cdisc:adverseEvent <http://clinic.com/study/T2271/subject/4183542663506/observation/O6622> .


	# ROUTE  DRGGROUP  DOSE  pid            treatment             tpfday   tptday
	# IV     B         7 MG  4183542663506  7mg then 14mg SEMWEB  6/11/84  7/11/84
	<http://clinic.com/study/T2271/subject/4183542663506/observation/O2241>
	        a cdisc:Treatment ;   # cdisc:Treatment is a subclass of cdisc:Observation
	        cdisc:design_arm  <http://clinic.com/study/T2271/treated_B/double_dose> ;
	        dse:route cdisc:IV_route ;
	        dse:drug_group "B" ;
	        cdisc:dose "7" ;
	        cdisc:dose_units nist:mg ;
	        cdisc:treatment "7mg then 14mg SEMWEB" ;
	        cdisc:first_date "6/11/84" ;
	        cdisc:term_date "7/11/84" ;

	...   .   # How best to define Treatments and Experimental Design? Using cdisc:design_arm to link back to design graph?


	# VTLTEXT                       VTLRES  VISIT_ID  pid            collday  related
	# Standing Diastolic BP (mmHg)  75      BASELINE  4183542663506  6/11/84  1
	<http://clinic.com/study/T2271/subject/4183542663506/observation/O6561>
	        a cdisc:Vital_sign ;   # cdisc:Vital_sign is a subclass of cdisc:Observation
	        cdisc:visit_id cdisc:BASELINE ;
	        cdisc:visit_date "6/11/84" ;
	        dse:obs_context  [ cdisc:position cdisc:patient_standing ] ;

	        cdisc:diastolic [ a cdisc:StandingDiastolic_BP ;  
	           dse:vtltext  "Standing Diastolic BP (mmHg)" ;
	           dse:related_measure  "1" ;
	           cdisc:obs_value  "75" ;
	           dse:obs_units  nist:mmHg
	        ] .


	# pid            AEFDAY  AETDAY  AESEV  AESEVT    AESER  AESERT  PREFTEXT          BODYTEXT
	# 4183542663506  6       9       2      MODERATE  2      NO      ABDOMEN ENLARGED  BODY AS A WHOLE
	<http://clinic.com/study/T2271/subject/4183542663506/observation/O6622>
	        a cdisc:Adverse_Event ;   # cdisc:Adverse_Event is a subclass of cdisc:Observation
	        cdisc:visit_id cdisc:BASELINE ;
	        time:first_date "6" ;
	        time:term_date "9" ;
	        time:duration_days "2" ;
	        dse:severity AE:MODERATE ;
	        dse:rating "2" ;
	        dse:RT "NO" ;
	        dse:prefText "ABDOMEN ENLARGED" ;
	        dse:bodyText "BODY AS A WHOLE" ;
	        dse:obs_context  [ cdisc:position cdisc:patient_standing ] .
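
The hand conversion from tabular rows to N3 shown above could be automated. The following Python sketch, using only the standard library, illustrates the idea for the vital-signs row. It assumes the source data is tab-delimited with the column names shown in the comments; the helper names, the URI layout and the choice of properties simply mirror the whiteboard examples and are not an agreed vocabulary.

```python
import csv
import io

# Header and row taken from the simulated vital-signs data above,
# re-expressed as tab-delimited text.
SAMPLE = (
    "VTLTEXT\tVTLRES\tVISIT_ID\tpid\tcollday\trelated\n"
    "Standing Diastolic BP (mmHg)\t75\tBASELINE\t4183542663506\t6/11/84\t1\n"
)


def rows_from_tsv(text):
    """Parse tab-delimited text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def literal(value):
    """Quote a string as an N3 literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def vital_sign_to_n3(row, study_uri, obs_id, prop="cdisc:diastolic"):
    """Render one vital-signs row as a single N3 statement.

    prop names the (assumed) property linking the observation to the
    nested measurement; it defaults to the one used in the example.
    """
    obs_uri = f"{study_uri}/subject/{row['pid']}/observation/{obs_id}"
    return "\n".join([
        f"<{obs_uri}>",
        "    a cdisc:Vital_sign ;",
        f"    cdisc:visit_id cdisc:{row['VISIT_ID']} ;",
        f"    cdisc:visit_date {literal(row['collday'])} ;",
        f"    {prop} [",
        f"        dse:vtltext {literal(row['VTLTEXT'])} ;",
        f"        dse:related_measure {literal(row['related'])} ;",
        f"        cdisc:obs_value {literal(row['VTLRES'])}",
        "    ] .",
    ])
```

For example, `vital_sign_to_n3(rows_from_tsv(SAMPLE)[0], "http://clinic.com/study/T2271", "O6561")` produces a statement whose subject URI is built from the row's pid, so the observation stays linked to the subject that references it.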
		

Discussion regarding CDISC SDTM Code lists

Below, in the related resources section, two examples are attached showing what the current XML output looks like from CDISC's usage of NCI caDSR for the so-called SDTM Controlled Terminologies. These include the permissible values as strings to be incorporated in SDTM datasets, e.g.:

It is important to recognise the different approaches in 1) the CDISC SDTM standard, 2) the NCI Thesaurus, and 3) what I would like to see as Observation Types Ontologies (see more details below), and how to relate these to existing terminologies such as LOINC codes and Clinical Findings in SNOMED CT.