Re: anonymous or no?

Thank you for the excellent summary.  You said it much better than I did.

One glimmer of light I see here is, alas, only a glimmer.  Can there be standards of anonymization that take into account the ability to re-identify, i.e., that require companies releasing data to do so in a way that prevailing practices cannot re-identify?  This, alas, is a moving target (unlike the glimmer of light at the end of a tunnel :-().

Thanks again,

On Aug 13, 2010, at 12:07, Richard Barnes wrote:

> David,
> In principle, I think you're exactly right that re-identification can
> be a big problem, especially with the rich data sets that many
> organizations are collecting nowadays.  (Our position paper [1]
> touches on this issue briefly, in a slightly different context.)
> As I understand it, however, (and I'm certainly not an expert) the
> challenge for making/implementing policy with regard to
> re-identification is that the mathematics are a little subtle and very
> dependent on the types of data and the underlying population
> distributions.  There's a fairly large body of work on how to do
> anonymization in specific domains (e.g., the techniques applied at the
> Census Bureau [2]), but I'm not aware of a general enough methodology
> to cover the diversity of data collected by entities in the Web.
> (Again, not an expert!)
> The additional challenge given the availability of some public data
> sets is that it's not always possible for the maintainer of a data set
> to know what additional data a recipient might combine with that data
> set.  A demographics provider such as Feeva may only provide
> information to ZIP code granularity, but if a third party analyst also
> knows a user's gender and date of birth, then you're back in the
> classical re-identification regime.
> I'm not sure that all this means that it's completely impossible to
> have any policies about re-identification, but you might have to
> constrain the scope of what you try to achieve.  The fusion problem,
> in particular, seems kind of insurmountable to me.
> --Richard
> [1] <>
> [2] <>
> On Aug 11, 2010 2:50 PM, "David Singer" <> wrote:
> This is a 'discussion point'...I'm not even sure I can express it very
> well, but I think it worth raising.
> Imagine I interact with a web service and my agreement with them is
> that any data that is collected 'about' me is anonymized, so that I am
> not personally identifiable in the database of records they build.
> They respect that agreement, but make the database available for
> analysis etc.
> But now, as we know, people are getting very good at
> re-identification.  Clearly I don't like it if someone says "I'm 95%
> sure that the guy who bought these five books is that Dave Singer who
> attends the W3C".  I'd like to say "not only must my records be
> anonymized, but re-identification should not occur either".
> But this flies directly in the face of a very long-established
> principle, that the analysis and drawing of conclusions from public
> data is a legitimate, indeed even intended, usage of that public data.
> And setting that rule would also drive re-identification
> "underground" -- people would still do it, they just wouldn't publish
> the results, which is *worse*.
> The best I can think of is to make sure any policy/rule about
> disclosure/warning applies to personally identifiable data *whether the
> identification was original or deductive*, but it doesn't feel
> ideal.  In particular, the party doing the analysis may have no link
> (business relationship etc.) with me at all. How would they disclose
> to me that they have deduced identifiable data? Under what incentive
> would they do that, anyway?
> Thoughts?
> David Singer
> Multimedia and Software Standards, Apple Inc.

David Singer
Multimedia and Software Standards, Apple Inc.

Received on Friday, 13 August 2010 20:22:14 UTC