Re: June Change Proposal: text on de-identification

Hi Dan,

I've moved ISSUE-188 to the Compliance June product; I believe that existing issue closely tracks the topic of this change.

I've set up a wiki page for this proposal: http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Deidentification
The wiki page also has the text from the editors' draft, for easier comparison.

Thanks,
Nick

On Jun 23, 2013, at 11:27 PM, Dan Auerbach <dan@eff.org> wrote:

> I propose the following for either a two-state de-identification
> regime, or "yellow" if we have three states.
>
> Normative text:
>
> Data can be considered de-identified if it has been deleted,
> modified, aggregated, anonymized or otherwise manipulated in
> order to achieve a reasonable level of justified confidence
> that the data cannot reasonably be used to infer information
> about, or otherwise be linked to, a particular user, user
> agent, or device.
>
> Non-normative text:
>
> Example 1. Hashing a pseudonym such as a cookie string does NOT
> provide sufficient de-identification for an otherwise rich data
> set such as a browsing history, since there are many ways to
> re-identify individuals based on pseudonymous data.
>
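A minimal Python sketch of why Example 1 holds, using made-up records and SHA-256 as a stand-in for whatever hash a company might pick: the hash is deterministic, so every record carrying the same cookie gets the same digest. The digest is simply a new pseudonym that still ties a user's whole history together, and anyone who knows a cookie value can recompute it. The record format and the names `RECORDS`, `hash_pseudonym`, and `group_histories` are hypothetical.

```python
import hashlib
from collections import defaultdict

# Hypothetical browsing-history records: (cookie_id, url). Illustrative data only.
RECORDS = [
    ("cookie-abc123", "https://example.com/news"),
    ("cookie-abc123", "https://example.com/health/diabetes"),
    ("cookie-xyz789", "https://example.com/sports"),
    ("cookie-abc123", "https://example.org/jobs?city=springfield"),
]


def hash_pseudonym(cookie_id: str) -> str:
    """Replace a cookie string with its SHA-256 hex digest (a deterministic mapping)."""
    return hashlib.sha256(cookie_id.encode("utf-8")).hexdigest()


def group_histories(records):
    """Group URLs by hashed pseudonym, as anyone holding the hashed data set could."""
    histories = defaultdict(list)
    for cookie_id, url in records:
        histories[hash_pseudonym(cookie_id)].append(url)
    return dict(histories)


if __name__ == "__main__":
    histories = group_histories(RECORDS)
    # Each hashed key still links all of one browser's visits together.
    for key, urls in histories.items():
        print(key[:12], len(urls), "visits")
    # Knowing a cookie value is enough to recompute its digest and pull its history.
    print(histories[hash_pseudonym("cookie-abc123")])
```

Using a keyed hash (e.g. HMAC with a secret key) would stop outsiders from recomputing digests, but the mapping is still consistent within the data set, so the linkage across records remains.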
> Example 2. In many cases, keeping only high-level aggregate
> data, such as the total number of visitors to a website each day
> broken down by country (discarding data from countries without
> many visitors), would be considered sufficiently de-identified.
>
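One way to read Example 2, sketched in Python under stated assumptions: raw visit logs are reduced to distinct-visitor counts per day and country, and any country whose count falls below a cutoff is dropped rather than kept. The log format, the function name `daily_country_counts`, and the cutoff `MIN_VISITORS = 100` are hypothetical; the proposal does not put a number on "without many visitors".

```python
from collections import defaultdict

# Hypothetical cutoff; the proposal only says "countries without many visitors".
MIN_VISITORS = 100


def daily_country_counts(visits, min_visitors=MIN_VISITORS):
    """Reduce raw visits to per-day, per-country distinct-visitor totals.

    visits: iterable of (day, country, visitor_id) tuples (hypothetical log format).
    Groups with fewer than min_visitors distinct visitors are discarded.
    """
    visitors = defaultdict(set)
    for day, country, visitor_id in visits:
        visitors[(day, country)].add(visitor_id)  # count each visitor once per day/country
    return {
        key: len(ids)
        for key, ids in visitors.items()
        if len(ids) >= min_visitors
    }


if __name__ == "__main__":
    # Synthetic log: 150 distinct visitors from DE and 3 from IS on one day.
    log = [("2013-06-24", "DE", f"v{i}") for i in range(150)]
    log += [("2013-06-24", "IS", f"w{i}") for i in range(3)]
    print(daily_country_counts(log))  # → {('2013-06-24', 'DE'): 150}
```

Visitor IDs are used only transiently to count distinct visitors; what would be retained is the small table of counts, with the low-traffic country removed.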
> Example 3. Deleting data is always a safe and easy way to
> achieve de-identification.
>
> Remark 1. De-identification is a property of data. If data can
> be considered de-identified according to the “reasonable level
> of justified confidence” clause of (1), then no data
> manipulation process needs to take place in order to satisfy the
> requirements of (1).
>
> Remark 2. A diversity of techniques is being researched and
> developed to de-identify data sets, and companies are
> encouraged to explore these and to develop innovative new
> approaches that fit their needs.
>
> Remark 3. It is a best practice for companies to perform
> “penetration testing” by having an expert with access to the
> data attempt to re-identify individuals or disclose attributes
> about them. The expert need not actually identify an individual
> or disclose an attribute, but if the expert demonstrates how
> this could plausibly be achieved by joining the data set against
> other public data sets, or private data sets accessible to the
> company, then the data set in question should no longer be
> considered sufficiently de-identified, and changes should be
> made to provide stronger anonymization for the data set.
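A rough Python sketch of the kind of join Remark 3 describes, assuming the released data set and an auxiliary data set share a few quasi-identifier columns (here the hypothetical `zip`, `birth_year`, and `gender`). The tester indexes the auxiliary data by those columns and flags every released record that matches exactly one named person. The function name `unique_matches`, the columns, and the records are all illustrative, and the sketch covers only re-identification, not attribute disclosure.

```python
from collections import defaultdict

# Hypothetical quasi-identifier columns assumed to appear in both data sets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def unique_matches(released, auxiliary, keys=QUASI_IDENTIFIERS):
    """Join `released` (no names) against `auxiliary` (has names) on `keys`.

    Returns (released_record, name) pairs where exactly one auxiliary person
    shares the record's quasi-identifier values: a plausible re-identification.
    """
    # Index the auxiliary data set by its quasi-identifier tuple.
    index = defaultdict(list)
    for person in auxiliary:
        index[tuple(person[k] for k in keys)].append(person["name"])

    hits = []
    for record in released:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a single candidate means the record is singled out
            hits.append((record, names[0]))
    return hits


if __name__ == "__main__":
    # Synthetic data: the "released" rows carry no names.
    released = [
        {"zip": "94110", "birth_year": 1980, "gender": "F", "visited": "clinic.example"},
        {"zip": "94110", "birth_year": 1975, "gender": "M", "visited": "news.example"},
    ]
    auxiliary = [
        {"name": "Alice", "zip": "94110", "birth_year": 1980, "gender": "F"},
        {"name": "Bob", "zip": "94110", "birth_year": 1975, "gender": "M"},
        {"name": "Carl", "zip": "94110", "birth_year": 1975, "gender": "M"},
    ]
    print([name for _, name in unique_matches(released, auxiliary)])  # → ['Alice']
```

Under Remark 3, even one such match would suggest the data set is not yet sufficiently de-identified.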

Received on Monday, 24 June 2013 07:38:24 UTC