anonymous or no?

This is a 'discussion point'. I'm not even sure I can express it very well, but I think it is worth raising.

Imagine I interact with a web service and my agreement with them is that any data that is collected 'about' me is anonymized, so that I am not personally identifiable in the database of records they build. They respect that agreement, but make the database available to others for analysis etc.

But now, as we know, people are getting very good at re-identification. Clearly I don't like it if someone says "I'm 95% sure that the guy who bought these five books is that Dave Singer who attends the W3C". I'd like to say "not only must my records be anonymized, but re-identification should not occur either".

But this flies directly in the face of a very long-established principle, that the analysis and drawing of conclusions from public data is a legitimate, indeed even intended, usage of that public data.  And setting that rule would also drive re-identification "underground" -- people would still do it, they just wouldn't publish the results, which is *worse*.

The best I can think of is to make sure any policy/rule about disclosure/warning applies to personally identifiable data *whether the identification was original or deductive*, but it doesn't feel ideal. In particular, the party doing the analysis may have no link (business relationship etc.) with me at all. How would they disclose to me that they have deduced identifiable data? What incentive would they have to do that, anyway?

Thoughts?

David Singer
Multimedia and Software Standards, Apple Inc.

Received on Wednesday, 11 August 2010 18:50:14 UTC