Re: definition of "unlinkable data" in the Compliance spec

From: Rigo Wenning <rigo@w3.org>
Date: Mon, 24 Sep 2012 19:38:20 +0200
To: "Bellovin, Steven M." <sbellovin@ftc.gov>
Cc: "public-tracking@w3.org" <public-tracking@w3.org>, Ed Felten <ed@felten.com>
Message-ID: <1432385.E8IxxKWhG4@hegel.sophia.w3.org>

My starting point is the "dossiers" as described by Westin. There are
various reasons we care about those dossiers. Social control is
one, discrimination is another. Today, the latter falls more under
consumer protection.

The scenario I think we are addressing here is different from the
typical de-anonymization scenario, first and foremost because all
participants in the networked scenario (e-commerce, advertising and
the like) already have the information in the clear anyway. So if I
were a malicious actor, I would claim to follow DNT and simply not do
it, instead of trying to pour money into de-anonymizing data: just
store your data and anonymize a copy that you can show.

My second argument is that the typical data DNT is concerned with
is marketing data. This kind of profile information is normally
quickly outdated, and spending a lot to get at outdated details
is not a very promising investment. (I still believe the marketers
overestimate the value of all this data.)

But your concern is very real and very serious. One could go through
all those archives and de-anonymize them to find every action of a
single very important person in order to discredit and influence that
person. But IMHO, DNT is the wrong tool to prevent any of that.

I would hope that we carry this very important discussion further:
how anonymous things are, and how to find a measure for it. But I
also would like to re-route it to the PING, the Privacy Interest Group.
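
One common way to put a number on "how anonymous things are" is to look, for each record, at the size of its anonymity set (the k of k-anonymity) and at how many bits of identifying information its attribute values carry. The Python sketch below assumes records have already been reduced to tuples of quasi-identifiers; the records, field choices and function names are hypothetical, and this is not the Slim Trabelsi/SAP entropy script referred to in this thread.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical record shape: only the quasi-identifiers left after direct
# identifiers (names, e-mail addresses, cookie IDs) have been removed.
Record = Tuple[str, ...]


def min_anonymity_set(records: List[Record]) -> int:
    """Smallest number of records sharing identical values (the k in k-anonymity)."""
    return min(Counter(records).values())


def identifying_bits(records: List[Record], record: Record) -> float:
    """Surprisal of one record's values: log2(N / size of its anonymity set).

    0 bits: the record looks like every other record in the dataset.
    log2(N) bits: the record is unique in the dataset.
    """
    return math.log2(len(records) / Counter(records)[record])


def dataset_entropy(records: List[Record]) -> float:
    """Shannon entropy, in bits, of the distribution of quasi-identifier tuples."""
    n = len(records)
    return -sum((c / n) * math.log2(c / n) for c in Counter(records).values())


if __name__ == "__main__":
    # Hypothetical (postcode, birth year, browser) tuples, for illustration only.
    data = [
        ("75001", "1970", "Firefox"),
        ("75001", "1970", "Firefox"),
        ("75001", "1982", "Chrome"),
        ("06560", "1982", "Chrome"),
    ]
    print(min_anonymity_set(data))          # 1: at least one record is unique
    print(identifying_bits(data, data[0]))  # 1.0
    print(identifying_bits(data, data[3]))  # 2.0
    print(dataset_entropy(data))            # 1.5
```

Such a measure only says how distinguishable records are; deciding which k or how many bits count as "unlinkable" is precisely the kind of threshold that would be open to dispute.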

As a last point, let me report that I already had a dispute with
Mozelle Thompson in 2003, and with half of the European DPAs, about
the danger of merely requiring something (here, unlinkable data)
without describing it with technical precision. IMHO this will
trigger a situation where it is commercially much more viable to run
the risk of non-compliance than to invest in the right technology. We
need to give reasonable guidance that makes compliance predictable.
That's why I think we should specify a method to remove the personal
context from data rather than define "unlinkable data" and say that
everything else is an issue. I haven't verified whether the current
definition of removing the links to persons fits our needs, and would
encourage everybody to evaluate it. But I would be against a
definition of unlinkable data in the Specification.

I know that Hansen/Pfitzmann were discussing this issue in the IETF
and that we did some research. There is still a lot to do in this



On Monday 24 September 2012 12:35:00 Bellovin, Steven M. wrote:
> Let me rephrase the question slightly: what is your threat model? 
> Who is trying to obtain what, and what are they willing to spend?
> Allan Schiffman expressed it very nicely some years ago
> (http://marginalguesswork.blogspot.com/2004/07/instant-immortality.html):
> "Amateurs study cryptography; professionals study
> economics."  How much effort do you think various people will put
> into linking -- deanonymizing -- data?    Unsalted hashes are, as
> noted, pretty trivial to invert in many cases of interest here. 
> Salted hashes or encrypted PII?  Who will hold the salt or key? 
> I assume we're not worried about special operations forces making
> midnight raids on data centers -- but how many {dollars, euros,
> yen, zorkmids} is a reidentified record worth?  That translates
> very directly to how many microseconds of compute time it's worth
> to make the effort.  Or -- suppose that you have a file with
> 10,000,000 unlinkable records.  What is it worth to reidentify
> all 10,000,000?  1,000,000 random records? 100,000 random records?
> One particular one that you think may be of interest for some
> particular reason?  What other resources can the adversary bring 
> to bear?
> Are folks' mental models of the threat that different?
> -----Original Message-----
> From: Rigo Wenning [mailto:rigo@w3.org]
> Sent: Sunday, September 23, 2012 3:32 PM
> To: public-tracking@w3.org
> Cc: Ed Felten
> Subject: Re: definition of "unlinkable data" in the Compliance spec
> Ed,
> On Thursday 13 September 2012 17:03:09 Ed Felten wrote:
> > I have several questions about what this means.
> > (A) Why does the definition talk about a process of making data
> > unlinkable, instead of directly defining what it means for data
> > to be unlinkable?  Some data needs to be processed to make it
> > unlinkable, but some data is unlinkable from the start.  The
> > definition should speak to both, even though
> > unlinkable-from-the-start data hasn't gone through any kind of
> > process.  Suppose FirstCorp collects data X; SecondCorp collects
> > X+Y but then runs a process that discards Y to leave it with only
> > X; and ThirdCorp collects X+Y+Z but then minimizes away Y+Z to
> > end up with X.  Shouldn't these three datasets be treated the
> > same--because they are the same X--despite having been through
> > different processes, or no process at all?
> For the data protection people like me, unlinkable data is not
> part of the scope of data protection measures, or of "privacy" if
> you want. It is therefore rather natural to only talk about linkable
> data in our Specifications, meaning data linked to a person, and to
> only address what to do with that linkable data and its link to a
> person. This may encompass a definition of what makes data
> "linkable". But it would go too far to define what's
> "unlinkable". We have done research on data being "unlinkable"
> (Slim Trabelsi/SAP has created a nice script to determine the
> entropy allowing for de-anonymization), and a definition of
> "unlinkable data" would import that scientific dispute into the
> Specification. I would not really like that to happen, as it would
> mean another point of endless debate. You can see this already
> happening in the thread following your message.
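
Steven's point that unsalted hashes are "pretty trivial to invert", and his question of who holds the salt or key, can be illustrated with a short Python sketch. It assumes a hypothetical identifier space of 100,000 five-digit customer numbers; real identifiers such as e-mail addresses or phone numbers come from larger but often still enumerable spaces. The keyed variant uses HMAC-SHA256 with a secret key (in effect, a secret salt); a random per-record salt stored next to the hash would not stop an attacker from enumerating one record, it would only rule out a single precomputed table for all records. All names here are illustrative.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Dict, Optional

# Hypothetical identifier space: 5-digit customer numbers "00000".."99999".
DOMAIN = [f"{n:05d}" for n in range(100_000)]


def unsalted(identifier: str) -> str:
    """Plain SHA-256 of the identifier, as in a naively "anonymized" export."""
    return hashlib.sha256(identifier.encode()).hexdigest()


def keyed(identifier: str, key: bytes) -> str:
    """HMAC-SHA256 under a secret key held by whoever produced the export."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()


def build_table(hash_fn: Callable[[str], str]) -> Dict[str, str]:
    """Hash every possible identifier once; the cost is shared by all records."""
    return {hash_fn(candidate): candidate for candidate in DOMAIN}


def reidentify(digest: str, table: Dict[str, str]) -> Optional[str]:
    """Look a digest up in the precomputed table; None if it is not found."""
    return table.get(digest)


if __name__ == "__main__":
    target = "04217"
    key = secrets.token_bytes(32)

    # Outsider vs. unsalted hashes: one table reverses every record.
    print(reidentify(unsalted(target), build_table(unsalted)))  # 04217

    # Outsider without the key: a table built under any other key does not match.
    wrong_key = secrets.token_bytes(32)
    print(reidentify(keyed(target, key), build_table(lambda c: keyed(c, wrong_key))))  # None

    # Key holder: can rebuild the table and reverse every record, so the
    # protection is only as strong as the control over the key.
    print(reidentify(keyed(target, key), build_table(lambda c: keyed(c, key))))  # 04217
```

Building a table costs one hash per possible identifier, independent of whether the goal is one record of interest or all 10,000,000, which is where the economics in Steven's question come in.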
Received on Monday, 24 September 2012 17:38:58 UTC
