Re: June Change Proposal: text on de-identification from Dan Auerbach on 2013-06-24 (public-tracking@w3.org from June 2013)

From: Dan Auerbach <dan@eff.org>
Date: Sun, 23 Jun 2013 23:36:07 -0700
To: public-tracking@w3.org
Message-ID: <51C7E8D7.50601@eff.org>

Apologies for my whack mail client's formatting woes.

On 06/23/2013 11:27 PM, Dan Auerbach wrote:
> I propose the following for either a two-state de-identification
> regime, or "yellow" if we have three states.
>
> Normative text:
>
> Data can be considered de-identified if it has been deleted, modified,
> aggregated, anonymized or otherwise manipulated in order to achieve a
> reasonable level of justified confidence that the data cannot
> reasonably be used to infer information about, or otherwise be linked
> to, a particular user, user agent, or device.
>
>
> Non-normative text:
>
> Example 1. Hashing a pseudonym such as a cookie string does NOT
> provide sufficient de-identification for an otherwise rich data set
> such as a browsing history, since there are many ways to re-identify
> individuals based on pseudonymous data.
>
> Example 2. In many cases, keeping only high-level aggregate data, such
> as the total number of visitors of a website each day broken down by
> country (discarding data from countries without many visitors), would
> be considered sufficiently de-identified.
>
> Example 3. Deleting data is always a safe and easy way to achieve
> de-identification.
>
> Remark 1. De-identification is a property of data. If data can be
> considered de-identified according to the “reasonable level of
> justified confidence” clause of (1), then no data manipulation process
> needs to take place in order to satisfy the requirements of (1).
>
> Remark 2. There is a diversity of techniques being researched and
> developed to de-identify data sets, and companies are encouraged to
> explore and innovate new approaches to fit their needs.
>
> Remark 3. It is a best practice for companies to perform “penetration
> testing” by having an expert with access to the data attempt to
> re-identify individuals or disclose attributes about them. The expert
> need not actually identify or disclose the attribute of an individual,
> but if the expert demonstrates how this could plausibly be achieved by
> joining the data set against other public data sets or private data
> sets accessible to the company, then the data set in question should
> no longer be considered sufficiently de-identified and changes should
> be made to provide stronger anonymization for the data set.
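
To make Examples 1 and 2 above concrete, here is a minimal Python sketch. It is
only an illustration, not part of the proposed text. The function names, the
(day, country, cookie) record layout, and the MIN_VISITORS threshold are all
assumptions; the proposal does not specify any of them. The sketch also drops
individual day/country cells that fall below the threshold, which is one
possible reading of "discarding data from countries without many visitors".

```python
import hashlib
from collections import Counter

# Hypothetical threshold; the proposal does not say what "many visitors" means.
MIN_VISITORS = 100


def hash_cookie(cookie: str) -> str:
    """Example 1: hashing maps a cookie to another stable pseudonym.

    The same cookie always yields the same digest, so every record for
    that cookie remains linkable to every other record for it.
    """
    return hashlib.sha256(cookie.encode("utf-8")).hexdigest()


def daily_counts_by_country(visits, min_visitors=MIN_VISITORS):
    """Example 2: keep only per-day, per-country visitor totals.

    `visits` is an iterable of (day, country, cookie) tuples (an assumed
    layout). Cookies are used only to count distinct visitors and are not
    retained; cells with fewer than `min_visitors` visitors are discarded.
    """
    unique = {(day, country, cookie) for day, country, cookie in visits}
    counts = Counter((day, country) for day, country, _ in unique)
    return {key: n for key, n in counts.items() if n >= min_visitors}
```

With 150 distinct cookies from "US" and 2 from "LI" on the same day, and the
default threshold, only the ("day", "US") total would survive. By contrast,
hash_cookie() leaves a full browsing history grouped under one stable value,
which is why Example 1 says hashing alone is not sufficient.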

Received on Monday, 24 June 2013 06:36:37 UTC