Re: how SHRINE appeased IRBs (HIPAA rules) from M. Scott Marshall on 2012-02-07 (public-semweb-lifesci@w3.org from February 2012)

From: M. Scott Marshall <mscottmarshall@gmail.com>
Date: Tue, 7 Feb 2012 22:16:24 +0100
To: Sivaram Arabandi <sivaram.arabandi@gmail.com>
Cc: David Booth <david@dbooth.org>, "Eric Prud'hommeaux" <eric@w3.org>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>, john wilbanks <wilbanks@gmail.com>
Message-ID: <CACHzV2OrsxpGRX=gwJTU9xRcenut6MD+s53iMLEah_QNbsnuxw@mail.gmail.com>
Thanks Eric, David, Sivaram,
[CC'd John Wilbanks]

Good stuff! The 'Consent to Research' work is essential to progress
toward clarity and transparency in an area of (global) uncertainty:
patient data use. I think that John Wilbanks/Sage Bionetworks[1] are
doing great work. I understood from John that he/they have also
lobbied the FDA to free up Comparator Arm Data, see [2]. See also [3],
[4].

Sivaram - 'Consent to Research' is also working on Portable Legal
Consent (PLC). I suppose that, in some cases, a PLC could make a
further project-specific IRB approval unnecessary. Much like Creative
Commons license classifications, PLC has the potential to lower the
costs of data sharing and increase transparency by creating
recognizable and understood consent policies (i.e. reusable and
well-documented consent policies).

Something I've had on my mind since Mark Wilkinson described it at a
talk at NCBO in 2010: How to represent consent policies and their data
sharing implications (Mark described an ontological application) in
such a way that they can be implemented in SemTech and even delivered
on the Web, such as for a personalized consent form where a patient
can interactively explore the theoretical consequences of various
policies and restrictions.

Here's a scruffy attempt to elaborate using Michael Hausenblas's
diagram (via Danny Ayers):
<Start Scruffy Explanation>

https://plus.google.com/u/0/112609322932428633493/posts/d1oxJ8Nk6a2

To translate the diagram above into the domain of patient data, you
could replace slices/columns with those that represent patient
consent, patient data attributes, IRB guidelines, U.S. policies
(HIPAA), EU policies, etc., then use SPARQL and/or OWL to determine
whether the governing constraints are being met by some part of the
graph. In the context of access control to patient data, that means
articulating concepts across patient data attributes, consent, and
departmental, institutional, state, national, federal or EU patient
data and privacy policies and laws. That would enable automated access
control, reasoning about whether the constraints of the various
policies are met, etc.
<End Scruffy Explanation>
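
A rough Python sketch of the layered check described above, in case it
helps make the idea concrete. It is only an illustration: a plain set
of triples stands in for an RDF graph, and each policy layer is a small
function standing in for a SPARQL ASK query or an OWL constraint. All
names here (consentsTo, approvesPurpose, DirectIdentifier, the layer
functions) are made up for the example and do not come from any real
vocabulary or from the actual text of HIPAA.

```python
from typing import Callable, Dict, List, NamedTuple, Set, Tuple

# A toy "graph": a set of (subject, predicate, object) triples.
Triple = Tuple[str, str, str]
Graph = Set[Triple]


class Request(NamedTuple):
    """A request to use one attribute of one patient's data for a purpose."""
    patient: str
    attribute: str
    purpose: str


def consent_layer(g: Graph, req: Request) -> bool:
    # Hypothetical rule: the patient must have consented to this purpose.
    return (req.patient, "consentsTo", req.purpose) in g


def irb_layer(g: Graph, req: Request) -> bool:
    # Hypothetical rule: an IRB must have approved this purpose.
    return ("irb", "approvesPurpose", req.purpose) in g


def hipaa_layer(g: Graph, req: Request) -> bool:
    # Toy stand-in for a privacy rule: direct identifiers are never released.
    return (req.attribute, "isA", "DirectIdentifier") not in g


# One entry per "slice" of the diagram; more layers (EU, state, ...) could
# be added in the same way.
LAYERS: Dict[str, Callable[[Graph, Request], bool]] = {
    "patient consent": consent_layer,
    "IRB": irb_layer,
    "HIPAA": hipaa_layer,
}


def unmet_layers(g: Graph, req: Request) -> List[str]:
    """Return the names of the layers whose constraint is NOT satisfied."""
    return [name for name, check in LAYERS.items() if not check(g, req)]
```

Access would be granted only when unmet_layers() comes back empty, and
a non-empty result tells the patient or researcher which layer blocked
the request, which is the kind of feedback an interactive consent form
would need.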

-Scott

[1] http://weconsent.us/faq#how_is_sage_bionetworks_related
[2] http://www.w3.org/2011/05/HCLSIGUseCases#comp
[3] http://del-fi.org/consent
[4] http://sagecongress.org/WP/2011agenda/groupd/

On Tue, Feb 7, 2012 at 3:40 AM, Sivaram Arabandi
<sivaram.arabandi@gmail.com> wrote:
> Thanks Eric, this is good stuff - something that all of us that have
> dealt with IRB have experienced. Especially painful when multiple
> institutions were involved and often needed multiple data sharing
> agreements to be put in place.
>
> As if this was not enough, if the initial aggregate results were found
> to be useful and the researcher wanted to move forward, then a further
> project specific IRB was necessary for getting the data.
>
> Hoping that things will improve.
>
> Sivaram
> ______________________
> Sivaram Arabandi, MD, MS
>
> Sent from my iPad
>
> On Feb 6, 2012, at 11:04 AM, David Booth <david@dbooth.org> wrote:
>
>> This is an excellent illustration of how damaging HIPAA is to research
>> efforts, and how important it will be to develop *standard* ways to
>> conform to HIPAA requirements and make research data more usable.
>>
>> For example, instead of every hospital or research project creating its
>> own custom consent form (for authorizing the collection of personal
>> health data), we need to *standardize* these consent forms in exactly
>> the way that Creative Commons began standardizing licenses, so that data
>> can be used more broadly and tools can automatically determine what data
>> can be used for what purposes.  John Wilbanks is working on this with
>> the Consent to Research project:
>> http://weconsent.us/
>>
>> David
>>
>> On Mon, 2012-02-06 at 10:10 -0500, Eric Prud'hommeaux wrote:
>>> it's challenging to get IRB approval to poke at patient data; here's how I2B2's SHRINE project managed to get IRB approval on their early demo:
>>> [[
>>>    1 There would be no central database. Each hospital would own and manage its data locally and have a local principal investigator responsible for the database.
>>>    2 The prototype would only be available for a limited time, after which all data would be destroyed.
>>>    3 The local databases at each hospital would include only old data from 2006. After a one-time load, the data would not be refreshed.
>>>    4 All patients whose data would be used in the prototype received a HIPAA privacy notice that allows their personal health information to be used for research that has been reviewed and approved by an IRB.
>>>    5 The prototype would only allow queries that return aggregate counts of clinical data, such as the total number of patients with diabetes at each health center. No identified data or data collected as part of a research study would be included in this demo.
>>>    6 The prototype would obfuscate the aggregate counts by adding a small random number. Thus, the user would see an approximate count of the number of matching patients, not the exact count. To make it more difficult for the user to guess the actual number, the prototype would “lock” the user's account if the same query was run multiple times in the same day.
>>>    7 If a hospital returned less than ten patients in a query, then “less than 10” would be presented rather than the actual count.
>>>    8 An audit of all queries would be logged.
>>>    9 In addition to an overall principal investigator for the SHRINE prototype, each hospital would have a local PI who would be responsible for his or her hospital's patient data.
>>>
>>> and
>>>
>>>    1 Individual hospitals could remove their databases from the prototype at any time.
>>>    2 Hospitals would not be identified by name in the demo. Instead, the labels “hospital 1”, “hospital 2”, “hospital 3” would be used.
>>>    3 For each query, the aggregate counts would be displayed in a random order so that “hospital 1”, for example, would refer to a different institution each time.
>>>    4 The aggregate counts would be multiplied by a scale-factor that is inversely proportional to the number of patients at the hospital. Otherwise, PHS, which includes both BWH and MGH would return aggregate counts that were roughly twice as big on average as the other two hospitals.
>>>    5 The counts from the three health centers would be displayed simultaneously instead of one at a time in the order in which they are returned by the hospitals. Otherwise, the speed of a local hospital's database, which is dependent on many factors such as the amount of data and types of servers, could be used to identify the health center from which an aggregate count came.
>>> ]] — <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2744712/?tool=pubmed>
>>>
>>> pretty much all of that could apply to our federation demos. (or we could write SPARQL wrappers around SHRINE query endpoints.)
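
To make the count-obfuscation rules quoted above concrete, here is a
minimal Python sketch of three of them: adding a small random number to
each aggregate count, reporting "less than 10" for small counts, and
showing the results under anonymous labels in a random order. This is
not SHRINE's code. The noise range (max_noise=3), the function names,
and the choice to also suppress a noisy value that falls below ten are
all assumptions made for illustration; the scale factor, the account
locking and the audit log are left out.

```python
import random
from typing import List, Tuple

LOW_COUNT_THRESHOLD = 10  # counts under ten are shown as "less than 10"


def obfuscate_count(true_count: int, rng: random.Random, max_noise: int = 3) -> str:
    """Return the string a user would see for one hospital's count."""
    if true_count < LOW_COUNT_THRESHOLD:
        return "less than 10"
    # Assumed noise model: a uniform integer in [-max_noise, +max_noise].
    noisy = true_count + rng.randint(-max_noise, max_noise)
    # Assumption: a noisy value under the threshold is suppressed as well,
    # so a displayed number never falls below ten.
    if noisy < LOW_COUNT_THRESHOLD:
        return "less than 10"
    return str(noisy)


def present_results(true_counts: List[int], rng: random.Random) -> List[Tuple[str, str]]:
    """Obfuscate each count, shuffle, then attach anonymous labels."""
    displays = [obfuscate_count(c, rng) for c in true_counts]
    # Shuffle before labelling so "hospital 1" can refer to a different
    # institution on every query.
    rng.shuffle(displays)
    return [(f"hospital {i}", d) for i, d in enumerate(displays, start=1)]
```

For example, present_results([4, 120, 57], random.Random()) gives three
labelled rows, one of which reads "less than 10", with the other two
within 3 of the true counts and in an unpredictable order.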
>>
>> --
>> David Booth, Ph.D.
>> http://dbooth.org/
>>
>> Opinions expressed herein are those of the author and do not necessarily
>> reflect those of his employer.
Received on Tuesday, 7 February 2012 21:22:16 UTC