- From: Dan Auerbach <dan@eff.org>
- Date: Sun, 23 Jun 2013 23:36:07 -0700
- To: public-tracking@w3.org
- Message-ID: <51C7E8D7.50601@eff.org>
Apologies for my whack mail client's formatting woes.

On 06/23/2013 11:27 PM, Dan Auerbach wrote:
> I propose the following for either a two state de-identification
> regime, or "yellow" if we have three states.
>
> Normative text:
>
> Data can be considered de-identified if it has been deleted, modified,
> aggregated, anonymized or otherwise manipulated in order to achieve a
> reasonable level of justified confidence that the data cannot
> reasonably be used to infer information about, or otherwise be linked
> to, a particular user, user agent, or device.
>
> Non-normative text:
>
> Example 1. Hashing a pseudonym such as a cookie string does NOT
> provide sufficient de-identification for an otherwise rich data set
> such as a browsing history, since there are many ways to re-identify
> individuals based on pseudonymous data.
>
> Example 2. In many cases, keeping only high-level aggregate data, such
> as the total number of visitors of a website each day broken down by
> country (discarding data from countries without many visitors) would
> be considered sufficiently de-identified.
>
> Example 3. Deleting data is always a safe and easy way to achieve
> de-identification.
>
> Remark 1. De-identification is a property of data. If data can be
> considered de-identified according to the “reasonable level of
> justified confidence” clause of (1), then no data manipulation process
> needs to take place in order to satisfy the requirements of (1).
>
> Remark 2. There are a diversity of techniques being researched and
> developed to de-identify data sets, and companies are encouraged to
> explore and innovate new approaches to fit their needs.
>
> Remark 3. It is a best practice for companies to perform “penetration
> testing” by having an expert with access to the data attempt to
> re-identify individuals or disclose attributes about them. The expert
> need not actually identify or disclose the attribute of an individual,
> but if the expert demonstrates how this could plausibly be achieved by
> joining the data set against other public data sets or private data
> sets accessible to the company, then the data set in question should
> no longer be considered sufficiently de-identified and changes should
> be made to provide stronger anonymization for the data set.
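
To make Example 2 concrete, here is a minimal sketch of per-day, per-country visitor counts that discards countries without many visitors. It is an illustration only and not part of the proposed text: the function name `aggregate_visits`, the record layout, and the threshold of 50 are all assumptions chosen for the example. A count threshold by itself does not establish the "reasonable level of justified confidence" the normative text asks for; it only shows the shape of the aggregation.

```python
from collections import Counter

# Hypothetical cutoff: a (day, country) group with fewer distinct visitors
# than this is dropped. The value 50 is an assumption, not from the proposal.
MIN_VISITORS = 50


def aggregate_visits(visits, min_visitors=MIN_VISITORS):
    """Count distinct visitors per (day, country), dropping small groups.

    `visits` is an iterable of (day, country, visitor_id) tuples.
    Only counts are returned; visitor ids never appear in the output.
    """
    seen = set()
    counts = Counter()
    for day, country, visitor_id in visits:
        key = (day, country, visitor_id)
        if key not in seen:  # count each visitor once per day and country
            seen.add(key)
            counts[(day, country)] += 1
    # Discard groups that are too small to report.
    return {group: n for group, n in counts.items() if n >= min_visitors}


if __name__ == "__main__":
    sample = [("2013-06-23", "US", f"u{i}") for i in range(60)]
    sample += [("2013-06-23", "IS", "a"), ("2013-06-23", "IS", "b")]
    print(aggregate_visits(sample))
```

In this sample the two visitors from "IS" fall below the cutoff, so that group is left out and only the "US" count is kept.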
Received on Monday, 24 June 2013 06:36:37 UTC