- From: Nicholas Doty <npdoty@w3.org>
- Date: Mon, 24 Jun 2013 00:38:14 -0700
- To: Dan Auerbach <dan@eff.org>
- Cc: public-tracking@w3.org
- Message-Id: <B3811990-C39A-417B-9294-5ADDB9AEB7F2@w3.org>
Hi Dan,

I've moved ISSUE-188 to the Compliance June product; I believe that existing issue closely tracks the topic of this change. I've set up a wiki page for this proposal:

http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Deidentification

The wiki page also has the text from the editors' draft, for easier comparison.

Thanks,
Nick

On Jun 23, 2013, at 11:27 PM, Dan Auerbach <dan@eff.org> wrote:

> I propose the following for either a two state de-identification
> regime, or "yellow" if we have three states.
>
> Normative text:
>
> Data can be considered de-identified if it has been deleted,
> modified, aggregated, anonymized or otherwise manipulated in
> order to achieve a reasonable level of justified confidence
> that the data cannot reasonably be used to infer information
> about, or otherwise be linked to, a particular user, user
> agent, or device.
>
> Non-normative text:
>
> Example 1. Hashing a pseudonym such as a cookie string does NOT
> provide sufficient de-identification for an otherwise rich data
> set such as a browsing history, since there are many ways to
> re-identify individuals based on pseudonymous data.
>
> Example 2. In many cases, keeping only high-level aggregate
> data, such as the total number of visitors of a website each day
> broken down by country (discarding data from countries without
> many visitors) would be considered sufficiently de-identified.
>
> Example 3. Deleting data is always a safe and easy way to
> achieve de-identification.
>
> Remark 1. De-identification is a property of data. If data can
> be considered de-identified according to the “reasonable level
> of justified confidence” clause of (1), then no data
> manipulation process needs to take place in order to satisfy the
> requirements of (1).
>
> Remark 2. There are a diversity of techniques being researched
> and developed to de-identify data sets, and companies are
> encouraged to explore and innovate new approaches to fit their
> needs.
>
> Remark 3. It is a best practice for companies to perform
> “penetration testing” by having an expert with access to the
> data attempt to re-identify individuals or disclose attributes
> about them. The expert need not actually identify or disclose
> the attribute of an individual, but if the expert demonstrates
> how this could plausibly be achieved by joining the data set
> against other public data sets or private data sets accessible
> to the company, then the data set in question should no longer
> be considered sufficiently de-identified and changes should be
> made to provide stronger anonymization for the data set.
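
For illustration only, the kind of aggregation described in Example 2 of the quoted proposal might be sketched as follows. This is a minimal sketch, not part of the proposal: the function name `aggregate_daily_visits`, the `(day, country, visitor_id)` record layout, and the `min_visitors` threshold are all assumptions made for the example, and the proposal does not say how many visitors count as "many". Passing a threshold like this does not by itself establish the "reasonable level of justified confidence" that the normative text requires.

```python
from collections import defaultdict


def aggregate_daily_visits(visits, min_visitors=100):
    """Count distinct visitors per (day, country), dropping small groups.

    `visits` is an iterable of (day, country, visitor_id) tuples.
    `min_visitors` is an assumed suppression threshold; the quoted
    proposal does not specify a value.
    """
    unique_visitors = defaultdict(set)
    for day, country, visitor_id in visits:
        unique_visitors[(day, country)].add(visitor_id)

    # Keep only the counts; the visitor identifiers are discarded.
    # Groups below the threshold are dropped entirely rather than reported.
    return {
        key: len(ids)
        for key, ids in unique_visitors.items()
        if len(ids) >= min_visitors
    }
```

Only the per-day, per-country totals leave the function; the identifiers used to count distinct visitors are not retained in the output.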
Received on Monday, 24 June 2013 07:38:24 UTC