Re: June Change Proposal: text on de-identification

Apologies for my whack mail client's formatting woes.

On 06/23/2013 11:27 PM, Dan Auerbach wrote:
> I propose the following for either a two-state de-identification
> regime, or "yellow" if we have three states:
> Normative text:
> Data can be considered de-identified if it has been deleted, modified,
> aggregated, anonymized or otherwise manipulated in order to achieve a
> reasonable level of justified confidence that the data cannot
> reasonably be used to infer information about, or otherwise be linked
> to, a particular user, user agent, or device.
> Non-normative text:
> Example 1. Hashing a pseudonym such as a cookie string does NOT
> provide sufficient de-identification for an otherwise rich data set
> such as a browsing history, since there are many ways to re-identify
> individuals based on pseudonymous data.
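
Example 1 is easy to demonstrate. Below is a minimal Python sketch that assumes a hypothetical log of (cookie, URL) pairs with made-up data. SHA-256 is deterministic, so the hashed cookie is still a stable per-user key. Anyone who can tie a single URL to a person (here, a hypothetical personal blog) recovers that person's whole history. The function names are illustrative only and do not come from any real system.

```python
import hashlib
from collections import defaultdict

# Hypothetical browsing log of (cookie_id, url) pairs; invented data.
LOG = [
    ("cookie-7f3a", "https://news.example/politics"),
    ("cookie-7f3a", "https://clinic.example/appointments"),
    ("cookie-7f3a", "https://alice-smith.example/my-blog"),
    ("cookie-91bc", "https://news.example/sports"),
    ("cookie-91bc", "https://shop.example/cart"),
]

def pseudonymize(cookie_id: str) -> str:
    """Replace a cookie string with its SHA-256 hex digest."""
    return hashlib.sha256(cookie_id.encode("utf-8")).hexdigest()

def hashed_histories(log):
    """Group URLs by hashed cookie: the supposedly 'de-identified' data set."""
    histories = defaultdict(list)
    for cookie_id, url in log:
        histories[pseudonymize(cookie_id)].append(url)
    return dict(histories)

def reidentify(histories, known_url):
    """Given one URL an attacker can tie to a person (e.g. their own blog),
    return that person's full browsing history, or None if no match."""
    for urls in histories.values():
        if known_url in urls:
            return urls
    return None

if __name__ == "__main__":
    data = hashed_histories(LOG)
    # The hash is deterministic, so all of a user's records still share one key.
    print(reidentify(data, "https://alice-smith.example/my-blog"))
```

A salted or keyed hash would not change this. The records would still be linked to each other, and linkage is exactly what Example 1 is warning about.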
> Example 2. In many cases, keeping only high-level aggregate data, such
> as the total number of visitors to a website each day broken down by
> country (discarding data from countries without many visitors) would
> be considered sufficiently de-identified.
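
Here is a rough Python sketch of the Example 2 style of aggregation. The threshold MIN_VISITORS = 3 is a hypothetical placeholder, because the proposal only says "without many visitors". The visit tuples and the function name are also made up. Visitor IDs are used only to count distinct visitors per day and country, and they are not kept in the output.

```python
from collections import defaultdict

# Hypothetical minimum number of distinct visitors for a cell to be reported.
MIN_VISITORS = 3

def daily_country_counts(visits, min_visitors=MIN_VISITORS):
    """Aggregate (day, country, visitor_id) tuples into distinct-visitor
    counts per (day, country), dropping cells below min_visitors.
    Visitor IDs are used only for counting and do not appear in the output."""
    visitors = defaultdict(set)
    for day, country, visitor_id in visits:
        visitors[(day, country)].add(visitor_id)
    return {
        cell: len(ids)
        for cell, ids in visitors.items()
        if len(ids) >= min_visitors
    }

# Invented sample data.
VISITS = [
    ("2013-06-23", "US", "v1"),
    ("2013-06-23", "US", "v2"),
    ("2013-06-23", "US", "v3"),
    ("2013-06-23", "US", "v1"),  # repeat visit, counted once
    ("2013-06-23", "IS", "v4"),  # too few visitors: suppressed
]

if __name__ == "__main__":
    print(daily_country_counts(VISITS))  # {('2013-06-23', 'US'): 3}
```

The threshold is applied to each (day, country) cell separately. Which value counts as "many" is left open, as it is in the proposal.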
> Example 3. Deleting data is always a safe and easy way to achieve
> de-identification.
> Remark 1. De-identification is a property of data. If data can be
> considered de-identified according to the “reasonable level of
> justified confidence” clause of (1), then no data manipulation process
> needs to take place in order to satisfy the requirements of (1).
> Remark 2. There is a diversity of techniques being researched and
> developed to de-identify data sets, and companies are encouraged to
> explore and innovate new approaches to fit their needs.
> Remark 3. It is a best practice for companies to perform “penetration
> testing” by having an expert with access to the data attempt to
> re-identify individuals or disclose attributes about them. The expert
> need not actually identify or disclose the attribute of an individual,
> but if the expert demonstrates how this could plausibly be achieved by
> joining the data set against other public data sets or private data
> sets accessible to the company, then the data set in question should
> no longer be considered sufficiently de-identified and changes should
> be made to provide stronger anonymization for the data set.
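
The join described in Remark 3 can be sketched as a simple linkage test. This is a minimal Python illustration. It assumes hypothetical quasi-identifier columns (ZIP code, birth year, gender) that appear both in the released data and in some auxiliary data set the tester can reach. All records and names below are invented. A released record that matches exactly one auxiliary record is a plausible re-identification, and under Remark 3 that would mean the data set is not sufficiently de-identified.

```python
from collections import defaultdict

# Hypothetical quasi-identifiers assumed present in both data sets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")

def qi_key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)

def linkage_test(released, auxiliary):
    """Return (released_record, auxiliary_record) pairs where the released
    record matches exactly one auxiliary record on the quasi-identifiers."""
    index = defaultdict(list)
    for aux in auxiliary:
        index[qi_key(aux)].append(aux)
    hits = []
    for rec in released:
        candidates = index.get(qi_key(rec), [])
        if len(candidates) == 1:
            hits.append((rec, candidates[0]))
    return hits

# Invented example data: a "de-identified" release and a named auxiliary list.
RELEASED = [
    {"zip": "94110", "birth_year": 1980, "gender": "F", "visited": "clinic.example"},
    {"zip": "94110", "birth_year": 1975, "gender": "M", "visited": "news.example"},
]
AUXILIARY = [
    {"name": "Person A", "zip": "94110", "birth_year": 1980, "gender": "F"},
    {"name": "Person B", "zip": "94110", "birth_year": 1975, "gender": "M"},
    {"name": "Person C", "zip": "94110", "birth_year": 1975, "gender": "M"},
]

if __name__ == "__main__":
    for rec, person in linkage_test(RELEASED, AUXILIARY):
        print(person["name"], "visited", rec["visited"])  # Person A visited clinic.example
```

In this toy data only the first released record links uniquely. The second matches two people, so this particular join does not single it out, but a real test would also try other columns and other auxiliary sources.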

Received on Monday, 24 June 2013 06:36:37 UTC