- From: Dan Auerbach <dan@eff.org>
- Date: Sun, 23 Jun 2013 23:27:27 -0700
- To: public-tracking@w3.org
- Message-ID: <51C7E6CF.10806@eff.org>
I propose the following for either a two-state de-identification regime, or for "yellow" if we have three states.

Normative text:

Data can be considered de-identified if it has been deleted, modified, aggregated, anonymized or otherwise manipulated in order to achieve a reasonable level of justified confidence that the data cannot reasonably be used to infer information about, or otherwise be linked to, a particular user, user agent, or device.

Non-normative text:

Example 1. Hashing a pseudonym such as a cookie string does NOT provide sufficient de-identification for an otherwise rich data set such as a browsing history, since there are many ways to re-identify individuals based on pseudonymous data.

Example 2. In many cases, keeping only high-level aggregate data, such as the total number of visitors of a website each day broken down by country (discarding data from countries without many visitors), would be considered sufficiently de-identified.

Example 3. Deleting data is always a safe and easy way to achieve de-identification.

Remark 1. De-identification is a property of data. If data can be considered de-identified according to the “reasonable level of justified confidence” clause of (1), then no data manipulation process needs to take place in order to satisfy the requirements of (1).

Remark 2. There is a diversity of techniques being researched and developed to de-identify data sets, and companies are encouraged to explore and innovate new approaches to fit their needs.

Remark 3. It is a best practice for companies to perform “penetration testing” by having an expert with access to the data attempt to re-identify individuals or disclose attributes about them. The expert need not actually identify or disclose the attribute of an individual, but if the expert demonstrates how this could plausibly be achieved by joining the data set against other public data sets or private data sets accessible to the company, then the data set in question should no longer be considered sufficiently de-identified and changes should be made to provide stronger anonymization for the data set.
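
To make Example 2 concrete, here is a minimal sketch in Python of the kind of aggregation it describes: daily visitor counts per country, with low-traffic countries discarded. The function name, the record layout, and the threshold of 50 are my own assumptions for illustration and are not part of the proposed text. Whether any particular threshold yields a "reasonable level of justified confidence" would still need to be argued for the data set at hand.

```python
from collections import Counter

# Hypothetical cutoff: the proposal does not specify what "many visitors" means.
MIN_VISITORS = 50


def aggregate_daily_visits(records, min_visitors=MIN_VISITORS):
    """Reduce raw visit records to per-day, per-country visitor counts.

    `records` is an iterable of (date, country, visitor_id) tuples
    (an assumed layout). Visitor identifiers are used only to count
    distinct visitors and do not appear in the result.
    """
    # Deduplicate so a visitor with several page views counts once per day.
    unique_visits = {(date, country, visitor_id)
                     for date, country, visitor_id in records}

    # Count distinct visitors in each (date, country) cell.
    counts = Counter((date, country) for date, country, _ in unique_visits)

    # Discard cells with too few visitors, as in Example 2.
    return {cell: n for cell, n in counts.items() if n >= min_visitors}
```

Only the returned counts would be retained; the raw records, including the identifiers, would be deleted afterwards, which is what Example 3 describes.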
Received on Monday, 24 June 2013 06:27:55 UTC