
Re: Aggregating Epidemiological Study Findings

From: Steven Bedrick <bedricks@ohsu.edu>
Date: Wed, 30 Nov 2011 09:51:43 -0800
To: Jodi Schneider <jodi.schneider@deri.org>
CC: "Bulusu, Vijay" <Vijay.Bulusu@pfizer.com>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
Message-ID: <D1C263CB-7081-4B88-809C-5C4A3CBE4B68@ohsu.edu>
> It seems like a reasonable use case!

I'm actually working on a project that does basically this (indexing and aggregating data abstracted from single-visit and multi-site epi studies), and I agree that it's a great use case. Right now we're using a relational data model, but I'm firmly convinced that doing this the "Right Way" would require the flexibility and richness of an ontology, simply because the data is so complex, both in how it should be represented and in how it gets aggregated. I'm working with a team of systematic reviewers and epidemiologists, and so far pretty much every data element we've added to the system has some bearing on whether and how one aggregates the data. That means any system trying to do this in a generalizable way also has to have some way of encoding *that* knowledge (e.g., "data points with attributes X, Y, and Z are aggregated thusly, whereas data points with X, Y, and A are aggregated in some other way").
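To make the "encoding *that* knowledge" idea concrete, here is a minimal Python sketch of attribute-driven aggregation. It is only an illustration, not the system described in this message: the `DataPoint` fields, the attribute labels, the two pooling functions, and the contents of `RULES` are all hypothetical, and real reviews would pick pooling methods on statistical grounds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class DataPoint:
    value: float       # the reported estimate
    sample_size: int   # number of participants behind the estimate
    measure: str       # hypothetical label, e.g. "proportion" or "mean"
    design: str        # hypothetical label, e.g. "cohort" or "case-control"


def size_weighted_average(points: List[DataPoint]) -> float:
    """Pool estimates, weighting each one by its sample size."""
    total = sum(p.sample_size for p in points)
    return sum(p.value * p.sample_size for p in points) / total


def unweighted_average(points: List[DataPoint]) -> float:
    """Pool estimates, giving every data point equal weight."""
    return sum(p.value for p in points) / len(points)


Aggregator = Callable[[List[DataPoint]], float]

# Hypothetical rule table: which attribute combination maps to which
# aggregation method. The pairings are placeholders, not recommendations.
RULES: Dict[Tuple[str, str], Aggregator] = {
    ("proportion", "cohort"): size_weighted_average,
    ("mean", "case-control"): unweighted_average,
}


def aggregate(points: List[DataPoint]) -> float:
    """Aggregate points only if they share attributes and a rule exists."""
    keys = {(p.measure, p.design) for p in points}
    if len(keys) != 1:
        raise ValueError("refusing to pool data points with differing attributes")
    key = keys.pop()
    if key not in RULES:
        raise ValueError(f"no aggregation rule for {key}")
    return RULES[key](points)
```

The point of the table is that the "whether" (mixed attributes or a missing rule raise an error) and the "how" (which function is called) both live in data rather than in scattered conditionals. An ontology would presumably express the same knowledge declaratively.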

> It seems like you'd need to identify the factors of interest--e.g. disease, selection criteria, research questions--and aggregate on those. Someone who actually does metaanalyses would be more aware of what factors are relevant/important.


In case anybody out there is thinking of doing this, here's some of what we store in our system (in addition to the data elements you've identified above, all of which are also relevant and included in our system):
	- study design (which we model using several different attributes: prospective/retrospective, case-control/cohort, etc.)
	- study setting (geographic location, hospital/outpatient, some data about who the study population was (military, pediatric, etc.), plus a fair bit of domain-specific stuff related to the particular medical topic we're working with)
	- observation time points, both fixed ("we measured the prevalence of symptom X at Y days") and date ranges ("we measured the incidence of disease X between Y and Z days"), sometimes reported as means with standard deviations or confidence intervals instead of explicit time points
	- whether a given observation was a mean, a proportion, or something else; whether it came with confidence intervals; and sometimes the sample size (sometimes broken out by treatment and control group status, often for multiple treatment groups)
	- etc. etc. etc... and down the rabbit hole we go, and thus far I've only talked about the different kinds of metadata we have to store, not even about the data itself that we wanted to aggregate! :-)
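For anyone who wants a starting point, the metadata listed above could be sketched roughly as follows in Python. This is an assumption-laden illustration, not the relational schema actually in use: every class name, field name, and example value here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class StudyDesign:
    timing: str      # hypothetical values: "prospective" or "retrospective"
    structure: str   # hypothetical values: "case-control", "cohort", ...


@dataclass
class StudySetting:
    location: str       # geographic location
    care_setting: str   # hypothetical values: "hospital" or "outpatient"
    population: str     # hypothetical values: "military", "pediatric", ...


@dataclass
class ObservationTime:
    start_day: float
    end_day: Optional[float] = None   # None means a fixed time point
    spread: Optional[float] = None    # e.g. an SD when time is reported as a mean

    @property
    def is_range(self) -> bool:
        """True when the observation covers a date range, not a single point."""
        return self.end_day is not None


@dataclass
class Observation:
    kind: str                 # "mean", "proportion", or something else
    value: float
    when: ObservationTime
    design: StudyDesign
    setting: StudySetting
    confidence_interval: Optional[Tuple[float, float]] = None
    # Sample sizes keyed by group name, since studies may report
    # several treatment groups alongside a control group.
    sample_sizes: Dict[str, int] = field(default_factory=dict)
```

Even this small sketch needs optional fields almost everywhere, which hints at why a fixed relational schema gets awkward quickly for this kind of data.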

In spite of all of this, it's a really great domain to be working in: there's a ton of low-hanging data management fruit out there for systematic reviewers. The ones I work with (at a major evidence-based practice center that does tons of AHRQ and USPSTF reviews) basically live in EndNote and Microsoft Word, and use those as their data management platform. Coming up with tools to help them work more effectively is really satisfying. I wish I'd had a camera running the day I told them that they could all use the system to enter data simultaneously (instead of having to keep track of who had the EndNote file open at any given time). They were like kids at Christmas...

-SB
Received on Wednesday, 30 November 2011 17:52:17 GMT
