- From: David Singer <singer@apple.com>
- Date: Fri, 13 Aug 2010 13:21:40 -0700
- To: Richard Barnes <richard.barnes@gmail.com>
- Cc: public-privacy@w3.org
Thank you for the excellent summary. You said it much better than I did.

One glimmer of light I see here is, alas, only a glimmer. Can there be standards of anonymization that take into account the ability to re-identify? That is, could we require that companies releasing data do so in a way that prevailing practices cannot re-identify? This, alas, is a moving target (unlike the glimmer of light at the end of a tunnel :-().

Thanks again,

On Aug 13, 2010, at 12:07, Richard Barnes wrote:

> David,
>
> In principle, I think you're exactly right that re-identification can be a big problem, especially with the rich data sets that many organizations are collecting nowadays. (Our position paper [1] touches on this issue briefly, in a slightly different context.)
>
> As I understand it, however (and I'm certainly not an expert), the challenge for making and implementing policy on re-identification is that the mathematics are a little subtle and very dependent on the types of data and the underlying population distributions. There's a fairly large body of work on how to do anonymization in specific domains (e.g., the techniques applied at the Census Bureau [2]), but I'm not aware of a methodology general enough to cover the diversity of data collected by entities on the Web. (Again, not an expert!)
>
> The additional challenge, given the availability of some public data sets, is that it's not always possible for the maintainer of a data set to know what additional data a recipient might combine with it. A demographics provider such as Feeva may only provide information to ZIP code granularity, but if a third-party analyst also knows a user's gender and date of birth, then you're back in the classical re-identification regime.
>
> I'm not sure that all this means it's completely impossible to have any policies about re-identification, but you might have to constrain the scope of what you try to achieve. The fusion problem, in particular, seems kind of insurmountable to me.
>
> --Richard
>
> [1] <http://www.w3.org/2010/api-privacy-ws/papers/privacy-ws-35.pdf>
> [2] <http://lehd.did.census.gov/led/datatools/onthemap3.html>
>
> On Aug 11, 2010 2:50 PM, "David Singer" <singer@apple.com> wrote:
>
> This is a 'discussion point'... I'm not even sure I can express it very well, but I think it's worth raising.
>
> Imagine I interact with a web service, and my agreement with them is that any data collected 'about' me is anonymized, so that I am not personally identifiable in the database of records they build. They respect that agreement, but make the database available for analysis, etc.
>
> But now, as we know, people are getting very good at re-identification. Clearly I don't like it if someone says "I'm 95% sure that the guy who bought these five books is that Dave Singer who attends the W3C". I'd like to say "not only must my records be anonymized, but re-identification should not occur either".
>
> But this flies directly in the face of a very long-established principle: that the analysis and drawing of conclusions from public data is a legitimate, indeed even intended, use of that public data. Setting that rule would also drive re-identification "underground": people would still do it, they just wouldn't publish the results, which is *worse*.
>
> The best I can think of is to make sure any policy or rule about disclosure or warning applies to personally identifiable data *whether the identification was original or deductive*, but that doesn't feel ideal. In particular, the party doing the analysis may have no link (business relationship, etc.) with me at all. How would they disclose to me that they have deduced identifiable data? And what incentive would they have to do that, anyway?
>
> Thoughts?
>
> David Singer
> Multimedia and Software Standards, Apple Inc.
David Singer
Multimedia and Software Standards, Apple Inc.
Received on Friday, 13 August 2010 20:22:14 UTC