how SHRINE appeased IRBs from Eric Prud'hommeaux on 2012-02-06 (public-semweb-lifesci@w3.org from February 2012)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 6 Feb 2012 10:10:07 -0500
To: public-semweb-lifesci@w3.org
Message-ID: <20120206151006.GA29133@w3.org>

it's challenging to get IRB approval to poke at patient data; here's how I2B2's SHRINE project managed to get IRB approval on their early demo:
[[
1 There would be no central database. Each hospital would own and manage its data locally and have a local principal investigator responsible for the database.
2 The prototype would only be available for a limited time, after which all data would be destroyed.
3 The local databases at each hospital would include only old data from 2006. After a one-time load, the data would not be refreshed.
4 All patients whose data would be used in the prototype received a HIPAA privacy notice that allows their personal health information to be used for research that has been reviewed and approved by an IRB.
5 The prototype would only allow queries that return aggregate counts of clinical data, such as the total number of patients with diabetes at each health center. No identified data or data collected as part of a research study would be included in this demo.
6 The prototype would obfuscate the aggregate counts by adding a small random number. Thus, the user would see an approximate count of the number of matching patients, not the exact count. To make it more difficult for the user to guess the actual number, the prototype would “lock” the user's account if the same query was run multiple times in the same day.
7 If a hospital returned less than ten patients in a query, then “less than 10” would be presented rather than the actual count.
8 An audit of all queries would be logged.
9 In addition to an overall principal investigator for the SHRINE prototype, each hospital would have a local PI who would be responsible for his or her hospital's patient data.

and

1 Individual hospitals could remove their databases from the prototype at any time.
2 Hospitals would not be identified by name in the demo. Instead, the labels “hospital 1”, “hospital 2”, “hospital 3” would be used.
3 For each query, the aggregate counts would be displayed in a random order so that “hospital 1”, for example, would refer to a different institution each time.
4 The aggregate counts would be multiplied by a scale-factor that is inversely proportional to the number of patients at the hospital. Otherwise, PHS, which includes both BWH and MGH would return aggregate counts that were roughly twice as big on average as the other two hospitals.
5 The counts from the three health centers would be displayed simultaneously instead of one at a time in the order in which they are returned by the hospitals. Otherwise, the speed of a local hospital's database, which is dependent on many factors such as the amount of data and types of servers, could be used to identify the health center from which an aggregate count came.
]] — <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2744712/?tool=pubmed>

pretty much all of that could apply to our federation demos. (or we could write SPARQL wrappers around SHRINE query endpoints.)
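
for a sense of how little code the count-hiding rules need, here is a rough Python sketch of the quoted noise, "less than 10", and shuffled-label rules. Everything specific in it is an assumption rather than something taken from SHRINE: the names obfuscate_count and present_results, the noise range MAX_NOISE (the paper only says "a small random number"), and the choice to also report "less than 10" when the noisy value falls below the threshold. The scale-factor and same-day lockout rules are left out.

```python
import random

SMALL_COUNT_THRESHOLD = 10  # from the quoted "less than 10" rule
MAX_NOISE = 3               # assumption: the quote does not give the noise range


def obfuscate_count(true_count, rng):
    """Return the value a user would see for one hospital's aggregate count."""
    if true_count < SMALL_COUNT_THRESHOLD:
        return "less than 10"
    noisy = true_count + rng.randint(-MAX_NOISE, MAX_NOISE)
    # Assumption: never display a noisy value that dips below the threshold.
    if noisy < SMALL_COUNT_THRESHOLD:
        return "less than 10"
    return noisy


def present_results(counts_by_site, rng):
    """Hide site identity: obfuscate, shuffle, then label 'hospital 1', 'hospital 2', ..."""
    shown = [obfuscate_count(count, rng) for count in counts_by_site.values()]
    rng.shuffle(shown)  # a fresh order per query, so labels do not track a site
    return {f"hospital {i}": value for i, value in enumerate(shown, start=1)}


if __name__ == "__main__":
    # Hypothetical per-site counts for a single query.
    print(present_results({"site_a": 3, "site_b": 512, "site_c": 1040}, random.Random()))
```

a federation demo could apply something like present_results to the per-endpoint counts before anything reaches the user, which keeps the exact counts and the site names on the server side.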
--
-ericP

Received on Monday, 6 February 2012 15:11:03 UTC