Fwd: SDTM and RDF issues and ideas

Pharmacogenomics attachments have been uploaded to the DSE WIKI.

Eric


Begin forwarded message:


	From: Kerstin.L.Forsberg@astrazeneca.com
	Date: November 13, 2006 2:31:13 PM EST
	To: wayne.kubick@lincolntechnologies.com, eneumann@teranode.com
	Cc: Bo.H.Andersson@astrazeneca.com
	Subject: SDTM and RDF issues and ideas

	Hi Wayne and Eric,
	last week Wayne and I met F2F for some other discussions, but we managed
	to have some quick words around SDTM and RDF. 
	I promised Wayne to pull together some of the issues and ideas directly or
	indirectly related to our semantic approach for clinical data. If I
	understood Wayne correctly, Eric and Wayne will meet F2F soon so Eric can
	give you a much more in-depth view of what we are trying to achieve from a more
	scientific perspective.

	Some background
	CDISC provides "... a general framework for describing the organization of
	information collected during human and animal studies. The model is built
	around the concept of observations, which consist of discrete pieces of
	information collected during a study."  So far, the model has been applied
	to 20+ different so-called domains for groups of observations like Vital
	Signs, Microbiology and Pharmacokinetics, with more to come. With the fundamental
	model behind SDTM and SEND, data representing observations on subjects in
	clinical studies can be specified even though CDISC has not yet provided a
	data exchange domain for it.  
	CDISC has so far only considered implementing the fundamental model as
	tabulated datasets for exchange per clinical study (in SAS format, and
	later on in XML using CDISC's general messaging format ODM).
	In the Drug Safety and Efficacy task force we plan to put a semantic
	web perspective on the model and propose an open-ended RDF implementation to
	ensure that data representing observations can be recombined across clinical
	studies.
	References:
	Task force on Drug Safety and Efficacy (DSE), part of W3C's interest
	group for Health Care and Life Science
	http://esw.w3.org/topic/HCLSIG/Drug_Safety_and_Efficacy
	Eric's slide at the International Semantic Web conference last week
	http://esw.w3.org/topic/HCLS/ISWC/Workshop/Abstracts?action=AttachFile&do=get&target=DSE+Summary.ppt
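
	As a rough illustration of what "recombining" observations across
	studies could mean in an RDF setting, here is a minimal sketch in plain
	Python. The namespaces, the predicate names (study, test, value) and the
	study and subject identifiers are all hypothetical; nothing here is
	defined by CDISC or W3C.

```python
# Hypothetical namespaces; none of these URIs are defined by CDISC or W3C.
EX = "http://example.org/clinical/"
VOC = "http://example.org/vocab#"


def triples_for(study, subject, seq, test, value):
    """Return (subject, predicate, object) triples for one observation."""
    obs = f"{EX}{study}/VS/{subject}/{seq}"
    return {
        (obs, VOC + "study", EX + study),
        (obs, VOC + "test", VOC + test),
        (obs, VOC + "value", str(value)),
    }


# Triples from two separate studies are simply unioned into one graph.
graph = triples_for("STUDY01", "S1", 1, "DIABP", 80)
graph |= triples_for("STUDY02", "S9", 4, "DIABP", 95)

# Find every diastolic blood pressure observation, regardless of study.
diabp = sorted(
    s for (s, p, o) in graph if p == VOC + "test" and o == VOC + "DIABP"
)
print(diabp)
```

	Because every statement stands on its own, pooling studies is a set
	union rather than a table-by-table merge, provided the studies share
	identifiers for tests and other concepts.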

	Unique identifiers
	A key requirement of the semantic approach is to assign and use global,
	persistent and unique identifiers for information resources, key business
	entities and conceptual resources. This is one reason for my many emails to
	Margaret Haber and Bron Kisler asking for URIs to be assigned for the
	information artefacts, such as the data elements and value domains and their
	different versions, published in the CDISC context of NCI's caDSR.
	Also, I think it would be interesting to discuss what a sensible
	identification scheme could look like for individual observation records as
	our main type of information resources. SDTM's approach is to identify them
	per study, dataset, subject and sequence number.
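
	One possible starting point for that discussion, as a minimal sketch
	only: turn the four SDTM keys into path segments of a URI. The base URI
	and the function name below are hypothetical and not assigned by CDISC,
	NCI or W3C.

```python
from urllib.parse import quote

# Hypothetical base URI, not assigned by CDISC, NCI or W3C.
BASE = "http://example.org/clinical/"


def observation_uri(study: str, dataset: str, subject: str, seq: int) -> str:
    """Build a URI for one observation record from its SDTM-style keys."""
    parts = [study, dataset, subject, str(seq)]
    # Percent-encode each key so that characters such as "/" or spaces
    # cannot spill over into a neighbouring path segment.
    return BASE + "/".join(quote(p, safe="") for p in parts)


print(observation_uri("STUDY01", "VS", "SUBJ-0001", 3))
```

	Such a scheme only stays globally unique and persistent if study
	identifiers themselves are unique across sponsors, which is one of the
	points that would need discussion.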

	Types of Observations ontologies?
	The SDTM is not a standard specifying the types of clinical observations to
	be collected, derived, exchanged etc. in clinical studies. Instead it lists a
	set of permissible qualifiers and lists of terms, so-called __TESTCD, to
	indicate preferred names for the different types (see also below) per
	domain.
	Would it make sense to go beyond such basic lists of terms and specify for
	example an ontology for Pharmacogenomics Results as "prescriptive metadata"
	? 
	Wayne, what is the status of the PG domain? I have an old proposal and some
	examples from last year but can't find any new information on it.
	 <<CDISC-PG-Domain-V4-OCT-3-2005.doc>>  <<CDISC PG-Testdata Findings
	10_03_2005_JH.xls>> 

	Non drug related Interventions and Events with non-categorical values 
	We had problems when we applied SDTM's model/variables for the general
	classes to new domains. The current general model/variables do not support
	non drug related Interventions (e.g. Surgery), nor do they support Events
	with non-categorical values (e.g. Bleeding).
	In my discussions with Julie E. and Dan G. the focus has been on the
	forthcoming Surgery domain. However, it is important to note that the
	problem is more general, and the alternative to "throw more things in the
	SUPPQUAL sink" is not a feasible solution. Instead I think the more
	extensible solution that has been adopted for observations classified as
	Findings would also be applicable to observations classified as Events and
	Interventions. The value of a Finding can be either continuous or
	categorical, and the SDTM defined result variables, together with the
	codelist ref in define.xml, can be used to exchange both kinds of values and
	their value domains (i.e. unit for continuous values and controlled
	terminology for categorical values).
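
	To make the idea concrete, here is a minimal sketch of a result value
	that is either continuous (number plus unit) or categorical (term plus
	codelist), attached to an observation of any class. All class names,
	field names and the example codelist name are hypothetical; they are
	not SDTM variables.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ContinuousValue:
    value: float
    unit: str        # value domain of a continuous value


@dataclass(frozen=True)
class CategoricalValue:
    term: str
    codelist: str    # controlled terminology the term is taken from


@dataclass(frozen=True)
class Observation:
    obs_class: str   # "Findings", "Events" or "Interventions"
    name: str
    result: Union[ContinuousValue, CategoricalValue]


def describe(obs: Observation) -> str:
    """Render an observation with its value and value domain."""
    r = obs.result
    if isinstance(r, ContinuousValue):
        return f"{obs.obs_class}: {obs.name} = {r.value} {r.unit}"
    return f"{obs.obs_class}: {obs.name} = {r.term} (codelist {r.codelist})"


# An Event with a non-categorical value, the case described above.
volume = Observation("Events", "Bleeding", ContinuousValue(250.0, "mL"))
# The same kind of Event with a categorical value.
grade = Observation("Events", "Bleeding", CategoricalValue("MODERATE", "SEVERITY"))
print(describe(volume))
print(describe(grade))
```

	The point of the sketch is that the result structure does not depend on
	the observation class, so Events and Interventions could reuse what
	already works for Findings.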


	__TESTCD, what should they actually refer to?

	Attached you will find two slides describing the super-/sub-concept
	structures in NCI Thesaurus for the concepts that have been assigned as
	CDISC's names for Vital Signs. 
	 <<CDISC's VSTEST Names in NCI's caDSR and EVS.ppt>> 

	These slides show that the Vital Signs concepts do have different
	super-concepts in NCI Thesaurus. For example:
	- The VSTEST Name PULSE has been aligned to the new concept Pulse Rate
	as a subconcept of Observation, instead of using the existing concept Pulse
	as a subconcept of Personal Attribute.
	- The VSTEST Name DIABP, in contrast, has been aligned to the existing concept
	Diastolic Pressure as a subconcept of Personal Attribute and not to a new
	subconcept of Observation.

	What do we actually want to refer to with VSTEST Names, Lab Test Codes as
	well as other __TESTCDs?

	1) To the real world act of observing for example diastolic blood pressure?

	2) To the real world phenomenon of for example blood pressure that exists
	entirely independently of instruments or units of measure or acts of
	observation?

	3) To the structure of the information recorded during an action of
	observing for example diastolic blood pressure?
	3a) The single variable for the observed value as it was recorded
	originally on a CRF page.
	3b) The specification for each type of observation that combines the data
	element for the observation value together with data elements for
	observation timing, the related qualifiers and required reference
	information that makes the diastolic blood pressure value "interpretable and
	interoperable cross studies and over time". 
	3c) SDTM's standardised data structures for dataset records for exchange of
	groups of different types of observations per study. 

	4) To the concept that we have an agreed upon term and definition of, and
	how we lexically relate this term to others (broader/narrower terms,
	synonyms etc.). 

	(See an interesting posting on the relationship between these different
	aspects in the HL7 Watch Blog
	http://hl7-watch.blogspot.com/2006/02/is-there-difference-between-person-and.html
	and a paper on the different "worlds" of Concept Systems, Ontologies
	and Information Models
	http://ontology.buffalo.edu/concepts/ConceptsandOntologies.pdf )






Eric Neumann, PhD
co-chair, W3C Healthcare and Life Sciences,
and Senior Director Product Strategy
Teranode Corporation
411 1st Avenue South, Suite 700
Seattle, WA 98104
+1 (781)856-9132
www.teranode.com 

Received on Friday, 17 November 2006 03:24:14 UTC