- From: Rigo Wenning <rigo@w3.org>
- Date: Mon, 24 Sep 2012 19:38:20 +0200
- To: "Bellovin, Steven M." <sbellovin@ftc.gov>
- Cc: "public-tracking@w3.org" <public-tracking@w3.org>, Ed Felten <ed@felten.com>
Steven,

I come from the "Dossiers" as written down by Westin. There are various
reasons we care about those dossiers: social control is one,
discrimination is another (the latter falls rather under consumer
protection today). The scenario I think we are addressing here is
different from the typical de-anonymization scenario, first and foremost
because all participants in the networked scenario (ecommerce,
advertisement and the like) have clear information anyway. So if I were
a malicious actor, I would claim to follow DNT and not do it, instead of
pouring money into de-anonymizing data: just store your data and
anonymize a copy that you can show. My second argument is that the
typical data DNT is concerned with is marketing data. This kind of
profile information is normally quickly outdated, and investing heavily
to get at outdated details is not a very promising investment. (I still
believe the marketeers overestimate the value of all this data.)

But your concern is very real and very serious. One could go into all
those archives and de-anonymize them to find every action of a single
very important person, in order to discredit and influence that person.
But IMHO, DNT is the wrong tool to prevent any of that. I would hope
that we carry this very important discussion, about how anonymous things
are and how to find a measure for it, further. But I would also like to
re-route us into PING, the Privacy Interest Group.

As a last aspect, let me report that I already had a dispute with
Mozelle Thompson in 2003, and with half of the European DPAs, about the
danger of merely requiring something (here, unlinkable data) without
describing it with technical precision. This will IMHO trigger a
situation where it is commercially much more viable to run the risk of
non-compliance than to invest in the right technology. We need to give
reasonable guidance that makes compliance predictable. And that is why I
think we should specify a method to remove the personal context from
data, rather than define "unlinkable data" and say that everything else
is an issue. I haven't verified whether the current definition of
removing the links to persons fits our needs and would encourage
everybody to evaluate it. But I would be against a definition of
unlinkable data in the Specification. I know that Hansen/Pfitzmann
discussed this issue in the IETF and that we did some research. There is
still lots to do in this area.
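To give a flavor of what "finding a measure for it" could look like,
here is a minimal sketch in Python (my own toy illustration, not Slim's
script; the field names and records are made up) that computes how many
bits of identifying entropy a combination of quasi-identifiers still
carries in a supposedly anonymized dataset:

    import math
    from collections import Counter

    def identifying_bits(records, fields):
        """Shannon entropy (in bits) of the quasi-identifier
        combination: how sharply knowing these fields partitions
        the dataset."""
        combos = Counter(tuple(r[f] for f in fields) for r in records)
        n = len(records)
        return -sum((c / n) * math.log2(c / n) for c in combos.values())

    # Made-up records, for illustration only.
    records = [
        {"zip": "75001", "birth_year": 1970, "sex": "f"},
        {"zip": "75001", "birth_year": 1970, "sex": "m"},
        {"zip": "75002", "birth_year": 1981, "sex": "f"},
        {"zip": "75002", "birth_year": 1981, "sex": "f"},
    ]

    print(identifying_bits(records, ["zip"]))                      # 1.0
    print(identifying_bits(records, ["zip", "birth_year", "sex"])) # 1.5

The closer the result gets to log2(number of records), the closer the
"anonymous" records are to being unique within the dataset, and hence to
being linkable again. That is exactly the kind of scientific dispute a
normative definition of "unlinkable" would import.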
 -- Rigo

On Monday 24 September 2012 12:35:00 Bellovin, Steven M. wrote:
> Let me rephrase the question slightly: what is your threat model?
> Who is trying to obtain what, and what are they willing to spend?
>
> Allan Schiffman expressed it very nicely some years ago
> (http://marginalguesswork.blogspot.com/2004/07/instant-immortality.html):
> "Amateurs study cryptography; professionals study economics." How much
> effort do you think various people will put into linking --
> deanonymizing -- data? Unsalted hashes are, as noted, pretty trivial
> to invert in many cases of interest here. Salted hashes or encrypted
> PII? Who will hold the salt or key? I assume we're not worried about
> special operations forces making midnight raids on data centers -- but
> how many {dollars, euros, yen, zorkmids} is a reidentified record
> worth? That translates very directly to how many microseconds of
> compute time it's worth to make the effort. Or -- suppose that you
> have a file with 10,000,000 unlinkable records. What is it worth to
> reidentify all 10,000,000? 1,000,000 random records? 100,000 random
> records? One particular one that you think may be of interest for
> some particular reason? What other resources can the adversary bring
> to bear?
>
> Are folks' mental models of the threat that different?
>
> -----Original Message-----
> From: Rigo Wenning [mailto:rigo@w3.org]
> Sent: Sunday, September 23, 2012 3:32 PM
> To: public-tracking@w3.org
> Cc: Ed Felten
> Subject: Re: definition of "unlinkable data" in the Compliance spec
>
> Ed,
>
> On Thursday 13 September 2012 17:03:09 Ed Felten wrote:
> > I have several questions about what this means.
> > (A) Why does the definition talk about a process of making data
> > unlinkable, instead of directly defining what it means for data to
> > be unlinkable? Some data needs to be processed to make it
> > unlinkable, but some data is unlinkable from the start. The
> > definition should speak to both, even though
> > unlinkable-from-the-start data hasn't gone through any kind of
> > process. Suppose FirstCorp collects data X; SecondCorp collects
> > X+Y but then runs a process that discards Y to leave it with only
> > X; and ThirdCorp collects X+Y+Z but then minimizes away Y+Z to end
> > up with X. Shouldn't these three datasets be treated the same --
> > because they are the same X -- despite having been through
> > different processes, or no process at all?
>
> For the data protection people like me, unlinkable data is not within
> the scope of data protection measures, or "privacy" if you want. It
> is therefore rather natural for our Specifications to talk only about
> linkable data, meaning data linked to a person, and to address only
> what to do with that linkable data and its link to a person. This may
> encompass a definition of what makes data "linkable", but it would go
> too far to define what is "unlinkable". Having done research on
> "unlinkable" data (Slim Trabelsi/SAP has created a nice script to
> determine the entropy allowing for de-anonymization), I think a
> definition of "unlinkable data" would import that scientific dispute
> into the Specification. I would not really like that to happen, as it
> would mean another point of endless debate. You can see this already
> happening in the thread following your message.
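P.S. To make your point about unsalted hashes concrete, here is a
minimal sketch of the dictionary attack, assuming a record that stores
an unsalted SHA-256 of an e-mail address (the addresses are invented for
the example):

    import hashlib

    def sha256_hex(s):
        return hashlib.sha256(s.encode("utf-8")).hexdigest()

    # The "anonymized" record keeps only the hash...
    stored = sha256_hex("alice@example.com")

    # ...but the space of plausible identifiers is enumerable: hash
    # every candidate and compare.
    candidates = ["bob@example.com", "carol@example.com",
                  "alice@example.com"]
    for c in candidates:
        if sha256_hex(c) == stored:
            print("re-identified:", c)
            break

Scale the candidate list up to every address a data broker already
holds and the cost per inversion is microseconds, which is exactly the
economics argument. A salted hash only helps as long as the salt stays
secret.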
Received on Monday, 24 September 2012 17:38:58 UTC