- From: Rob van Eijk <rob@blaeu.com>
- Date: Sat, 22 Sep 2012 00:12:53 +0200
- To: <public-tracking@w3.org>
Lauren, Shane,

There is a commonly understood definition of unlinkable data. In fact,
the definitions have been, and still are, the subject of lively debate
in the IETF. I recommend looking at
https://tools.ietf.org/id/draft-hansen-privacy-terminology-03.html

Rob

Shane Wiley wrote on 2012-09-21 20:05:
> Lauren,
>
> I disagree with your opening as I don't believe there is a "commonly
> understood" definition of unlinkable data. My approach is a
> COMBINATION of both technical and process controls - not only one or
> the other.
>
> - Shane
>
> FROM: Lauren Gelman [mailto:gelman@blurryedge.com]
> SENT: Friday, September 21, 2012 10:34 AM
> TO: Ed Felten
> CC: Shane Wiley; Grimmelmann, James; <public-tracking@w3.org>
> SUBJECT: Re: definition of "unlinkable data" in the Compliance spec
>
> I agree that what Shane describes does not meet a commonly understood
> definition of unlinkable data. Unlinkable data means that the data
> controller is prevented by technology from linking the data to a
> common unique identifier. Anonymous data also means the same thing. I
> think what Shane is describing I would call "Silo'd data": data that
> could be re-identified but is stored separately, and where POLICY
> prevents its re-identification, not TECHNOLOGY. (Although I am running
> into use cases where "stored separately" is not really accurate if you
> are using the cloud, e.g. AWS, but that is another issue.)
>
> I thought about commercially reasonable. I'm not sure I like it
> because what is commercially reasonable for a bigger company might be
> different for a start-up. This is practical and pro-innovation but not
> a perspective that is necessarily good for privacy, so I am torn. [In
> Ed's example, while "Hash+" may be commercially reasonable for
> established competitors, it may not be for a start-up. But we may want
> to incent "HashOnly" anyway, because it does prevent the specific
> threat of third parties who obtain the database finding it useful.]
>
> But we could go with this:
>
> _Unlinkable data is data that cannot be associated with an
> identifiable person or user agent using commercially reasonable
> means._
>
> We could have examples in the compliance doc on what is commercially
> reasonable, but my bet is that the FTC will define it sometime in the
> next year.
>
> [I am actually not sure what adding user agent does here. It was in
> the original text so I added it, but I have never used it in any
> policies I have written, so I leave that analysis open.]
>
> Lauren Gelman
> BlurryEdge Strategies
> 415-627-8512
>
> On Sep 21, 2012, at 10:00 AM, Ed Felten wrote:
>
> By the way, hashing IP addresses (with or without salting) does not
> render them unlinkable. After hashing, it's easy to recover the
> original IP address. The story is similar for other types of unique
> identifiers--there are ways to get to unlinkability, but hashing by
> itself won't be enough.
>
> On Fri, Sep 21, 2012 at 12:01 PM, Shane Wiley <wileys@yahoo-inc.com
> [1]> wrote:
> <Ed - apologies for not getting back to you sooner - I was on vacation
> for the past week.>
>
> James,
>
> I like your approach the best and it was this perspective I was
> intending when writing the text that Ed is questioning.
>
> The goal is to find the middle-ground between complete destruction of
> data and an unlinkable state that still allows for longitudinal
> consistency for analytical purposes BUT CANNOT be linked back to a
> production system such that the data could be used to modify a single
> user's experience.
>
> For example, performing a one-way secret hash (salted hash) on
> identifiers (Cookie IDs, IP Addresses) and storing the resulting
> dataset in a logically/physically separate location from production
> data with strict access controls, policies, and employee education
> would meet the definition of "unlinkable" I'm aiming for.
>
> - Shane
>
> -----Original Message-----
> From: Grimmelmann, James [mailto:James.Grimmelmann@nyls.edu [2]]
> Sent: Friday, September 21, 2012 8:14 AM
> To: Lauren Gelman
> Cc: Ed Felten; <public-tracking@w3.org [3]>
> Subject: Re: definition of "unlinkable data" in the Compliance spec
>
> I really like Lauren's suggestion. My only concern is that
> "reasonably" and "reasonable" have so many different meanings in legal
> settings that it could be ambiguous. Sometimes an action is
> "reasonable" if a person who is ethical and cautious would do it: it's
> not reasonable to leave sharp tools lying around in a children's play
> area, or to invest a trust fund in marshmallows. Sometimes it refers
> to what a rational non-expert would believe about the subject, so a
> court will uphold a jury verdict unless "no reasonable jury" could
> have reached the conclusion it did. Sometimes it's about the norms and
> expectations of an industry. An auction might need to be conducted in
> a "commercially reasonable" way, which means for example giving enough
> notice that there will be real competitive bidding, but not spending
> more than the property is worth.
>
> I think this last sense is the most appropriate one in context. So
> perhaps something like "data that cannot be associated with an
> identifiable person or user agent through commercially reasonable
> means." That is, the question would be whether a normal business with
> normal resources and motivations would consider reidentifying the data
> to be feasible.
>
> James
>
> --------------------------------------------------
> James Grimmelmann
> Professor of Law
> New York Law School
> 185 West Broadway
> New York, NY 10013
> (212) 431-2864
> james.grimmelmann@nyls.edu [4]
> http://james.grimmelmann.net [5]
>
> On Sep 20, 2012, at 7:22 PM, Lauren Gelman <gelman@blurryedge.com [6]>
> wrote:
>
> Unlinkable data is data that cannot reasonably be associated with an
> identifiable person or user agent.
>
> Lauren Gelman
> BlurryEdge Strategies
> 415-627-8512
>
> On Sep 18, 2012, at 8:05 AM, Ed Felten wrote:
>
> Sorry to repost this, but nobody has answered any of my questions
> about Option 1 for the unlinkability definition.
>
> Note to proponents of Option 1 (if any): If nobody can explain or
> clarify Option 1, that will presumably be used as an argument against
> Option 1 when decision time comes.
>
> ---------- Forwarded message ----------
> From: Ed Felten <ed@felten.com [7]>
> Date: Thu, Sep 13, 2012 at 5:03 PM
> Subject: definition of "unlinkable data" in the Compliance spec
> To: <public-tracking@w3.org [8]> <public-tracking@w3.org [9]>
>
> I have some questions about the Option 1 definition of "Unlinkable
> Data", section 3.6.1 in the Compliance spec editor's draft. The
> definition is as follows [fixing typos]:
>
> A party renders a dataset unlinkable when it:
> 1. takes commercially reasonable steps to de-identify data such that
> there is confidence that it contains information which could not be
> linked to a specific user, user agent, or device in a production
> environment [2. and 3. aren't relevant to my questions]
>
> I have several questions about what this means.
>
> (A) Why does the definition talk about a process of making data
> unlinkable, instead of directly defining what it means for data to be
> unlinkable?
> Some data needs to be processed to make it unlinkable, but some data
> is unlinkable from the start. The definition should speak to both,
> even though unlinkable-from-the-start data hasn't gone through any
> kind of process. Suppose FirstCorp collects data X; SecondCorp
> collects X+Y but then runs a process that discards Y to leave it with
> only X; and ThirdCorp collects X+Y+Z but then minimizes away Y+Z to
> end up with X. Shouldn't these three datasets be treated the
> same--because they are the same X--despite having been through
> different processes, or no process at all?
>
> (B) Why "commercially reasonable" rather than just "reasonable"? The
> term "reasonable" already takes into account all relevant factors. Can
> somebody give an example of something that would qualify as
> "commercially reasonable" but not "reasonable", or vice versa? If not,
> "commercially" only makes the definition harder to understand.
>
> (C) "there is confidence" seems to raise two questions. First, who is
> it that needs to be confident? Second, can the confidence be just an
> unsupported gut feeling of optimism, or does there need to be some
> valid reason for confidence? Presumably the intent is that the party
> holding the data has justified confidence that the data cannot be
> linked, but if so it might be better to spell that out.
>
> (D) Why "it contains information which could not be linked" rather
> than the simpler "it could not be linked"? Do the extra words add any
> meaning?
>
> (E) What does "in a production environment" add? If the goal is to
> rule out results demonstrated in a research environment, I doubt this
> language would accomplish that goal, because all of the
> re-identification research I know of required less than a production
> environment. If the goal is to rule out linking approaches that aren't
> at all practical, some other language would probably be better.
>
> (I don't have questions about the meaning of Option 2, which shouldn't
> be interpreted as a preference for or against Option 2.)

Links:
------
[1] mailto:wileys@yahoo-inc.com
[2] mailto:James.Grimmelmann@nyls.edu
[3] mailto:public-tracking@w3.org
[4] mailto:james.grimmelmann@nyls.edu
[5] http://james.grimmelmann.net
[6] mailto:gelman@blurryedge.com
[7] mailto:ed@felten.com
[8] mailto:public-tracking@w3.org
[9] mailto:public-tracking@w3.org
Received on Friday, 21 September 2012 22:13:28 UTC
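
Ed Felten's remark in the quoted thread, that hashing IP addresses (with or without salting) does not render them unlinkable, can be illustrated with a minimal sketch. The thread does not say which hash function or attack he had in mind, so everything here is an assumption made for illustration: the use of SHA-256, the helper names hash_ip and recover_ip, the documentation-range addresses, and the premise that whoever attempts the recovery also knows the salt (as the data controller would). The idea is simply that the space of IPv4 addresses is small (2^32, roughly 4.3 billion values), so every candidate can be hashed and compared against a stored digest. The sketch searches a single /24 to stay fast.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str, salt: bytes = b"") -> str:
    """Return the hex SHA-256 digest of salt + IP text (hypothetical scheme)."""
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()


def recover_ip(digest: str, network: str, salt: bytes = b"") -> Optional[str]:
    """Hash every address in `network` and return the one matching `digest`.

    Returns None when no address in the searched range matches.
    """
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hash_ip(candidate, salt) == digest:
            return candidate
    return None


if __name__ == "__main__":
    # A digest as it might sit in a "de-identified" log (illustrative value).
    stored = hash_ip("192.0.2.57")
    # Enumerating the 256 addresses of the /24 finds the original address.
    print(recover_ip(stored, "192.0.2.0/24"))  # prints 192.0.2.57
```

A salt changes the digests but not the size of the search space: anyone who holds the salt can run the same enumeration, which is one reason the separation and access controls Shane describes do work that the hash alone does not.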