Re: anonymous or no? from David Singer on 2010-08-13 (public-privacy@w3.org from July to September 2010)

From: David Singer <singer@apple.com>
Date: Fri, 13 Aug 2010 13:21:40 -0700
To: Richard Barnes <richard.barnes@gmail.com>
Cc: public-privacy@w3.org
Message-Id: <8962DA1F-6CD4-4336-A4EB-C40528201B1F@apple.com>
Thank you for the excellent summary.  You said it much better than I did.

One glimmer of light I see here is, alas, only a glimmer.  Can there be standards of anonymization that take into account the ability to re-identify?  That is, standards requiring companies that release data to do so in a way that prevailing re-identification practices cannot undo?  This, alas, is a moving target (unlike the glimmer of light at the end of a tunnel :-().
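
As a rough illustration of what such a standard might measure, here is a minimal sketch in the spirit of k-anonymity: count how many records share each combination of attributes, and treat a group of size 1 as a re-identification candidate. The records are made up, the helper name `smallest_group` is hypothetical, and the columns (ZIP code, gender, date of birth) are borrowed from the example quoted below. It is not a description of any actual release practice.

```python
from collections import Counter

# Toy, made-up records: (zip_code, gender, date_of_birth). Not real data.
records = [
    ("94110", "F", "1975-03-02"),
    ("94110", "M", "1975-03-02"),
    ("94110", "F", "1981-11-17"),
    ("94110", "M", "1981-11-17"),
    ("10001", "F", "1990-06-30"),
    ("10001", "F", "1990-07-01"),
]


def smallest_group(rows, columns):
    """Return the size of the smallest group of rows sharing the chosen columns.

    This is the 'k' of k-anonymity for those columns: k == 1 means at least
    one row is unique on them and is a candidate for re-identification.
    """
    groups = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(groups.values())


k_zip = smallest_group(records, [0])        # ZIP code only
k_all = smallest_group(records, [0, 1, 2])  # ZIP + gender + date of birth
print(k_zip, k_all)
```

The check only covers the columns the releasing party thinks to test, which is exactly why the target moves: a recipient who joins in another data set effectively adds columns, and k can drop to 1.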

Thanks again,

On Aug 13, 2010, at 12:07 , Richard Barnes wrote:

> David,
> 
> In principle, I think you're exactly right that re-identification can
> be a big problem, especially with the rich data sets that many
> organizations are collecting nowadays.  (Our position paper [1]
> touches on this issue briefly, in a slightly different context.)
> 
> As I understand it, however, (and I'm certainly not an expert) the
> challenge for making/implementing policy with regard to
> re-identification is that the mathematics are a little subtle and very
> dependent on the types of data and the underlying population
> distributions.  There's a fairly large body of work on how to do
> anonymization in specific domains (e.g., the techniques applied at the
> Census Bureau [2]), but I'm not aware of a general enough methodology
> to cover the diversity of data collected by entities in the Web.
> (Again, not an expert!)
> 
> The additional challenge given the availability of some public data
> sets is that it's not always possible for the maintainer of a data set
> to know what additional data a recipient might combine with that data
> set.  A demographics provider such as Feeva may only provide
> information to ZIP code granularity, but if a third party analyst also
> knows a user's gender and date of birth, then you're back in the
> classical re-identification regime.
> 
> I'm not sure that all this means that it's completely impossible to
> have any policies about re-identification, but you might have to
> constrain the scope of what you try to achieve.  The fusion problem,
> in particular, seems kind of insurmountable to me.
> 
> --Richard
> 
> 
> [1] <http://www.w3.org/2010/api-privacy-ws/papers/privacy-ws-35.pdf>
> [2] <http://lehd.did.census.gov/led/datatools/onthemap3.html>
> 
> 
> 
> On Aug 11, 2010 2:50 PM, "David Singer" <singer@apple.com> wrote:
> 
> This is a 'discussion point'...I'm not even sure I can express it very
> well, but I think it worth raising.
> 
> Imagine I interact with a web service and my agreement with them is
> that any data that is collected 'about' me is anonymized, so that I am
> not personally identifiable in the database of records they build.
> They respect that agreement, but make the database available for
> analysis etc.
> 
> But now, as we know, people are getting very good at
> re-identification.  Clearly I don't like it if someone says "I'm 95%
> sure that the guy who bought these five books, is that Dave Singer who
> attends the W3C".  I'd like to say "not only must my records be
> anonymized, but re-identification should not occur either".
> 
> But this flies directly in the face of a very long-established
> principle, that the analysis and drawing of conclusions from public
> data is a legitimate, indeed even intended, usage of that public data.
>  And setting that rule would also drive re-identification
> "underground" -- people would still do it, they just wouldn't publish
> the results, which is *worse*.
> 
> The best I can think of is to make sure any policy/rule about
> disclosure/warning applies to personally identifiable data *whether or
> not the identification was original or deductive*, but it doesn't feel
> ideal.  In particular, the party doing the analysis may have no link
> (business relationship etc.) with me at all. How would they disclose
> to me that they have deduced identifiable data? Under what incentive
> would they do that, anyway?
> 
> Thoughts?
> 
> David Singer
> Multimedia and Software Standards, Apple Inc.

David Singer
Multimedia and Software Standards, Apple Inc.
Received on Friday, 13 August 2010 20:22:14 UTC