RE: definition of "unlinkable data" in the Compliance spec from Shane Wiley on 2012-09-21 (public-tracking@w3.org from September 2012)

From: Shane Wiley <wileys@yahoo-inc.com>
Date: Fri, 21 Sep 2012 15:32:23 -0700
To: "rob@blaeu.com" <rob@blaeu.com>, "public-tracking@w3.org" <public-tracking@w3.org>
Message-ID: <63294A1959410048A33AEE161379C802620761D88B@SP2-EX07VS02.ds.corp.yahoo.com>
Rob,

"commonly understood" and "lively debated" are inconsistent end states.  :-)

- Shane

-----Original Message-----
From: Rob van Eijk [mailto:rob@blaeu.com] 
Sent: Friday, September 21, 2012 3:13 PM
To: public-tracking@w3.org
Subject: RE: definition of "unlinkable data" in the Compliance spec


Lauren, Shane,
There is a commonly understood definition of unlinkable data. In fact, the definitions have been, and still are, lively debated in the IETF.  I recommend looking at https://tools.ietf.org/id/draft-hansen-privacy-terminology-03.html


Rob

Shane Wiley wrote on 2012-09-21 20:05:
> Lauren,
>
> I disagree with your opening as I don't believe there is a "commonly 
> understood" definition of unlinkable data. My approach is a 
> COMBINATION of both technical and process controls - not only one or 
> the other.
>
> - Shane
>
> FROM: Lauren Gelman [mailto:gelman@blurryedge.com]
> SENT: Friday, September 21, 2012 10:34 AM
> TO: Ed Felten
> CC: Shane Wiley; Grimmelmann, James; <public-tracking@w3.org>
> SUBJECT: Re: definition of "unlinkable data" in the Compliance spec
>
> I agree that what Shane describes does not meet a commonly understood 
> definition of unlinkable data. Unlinkable data means that the data 
> controller is prevented by technology from linking the data to a 
> common unique identifier. Anonymous data also means the same thing. I 
> think what Shane is describing I would call "Silo'd data". Data that 
> could be re-identified but is stored separately, and where POLICY 
> prevents its re-identification, not TECHNOLOGY. (Although I am running 
> into use cases where "stored separately" is not really accurate if you 
> are using the cloud, e.g. AWS, but that is another issue).
>
> I thought about commercially reasonable. I'm not sure I like it 
> because what is commercially reasonable for a bigger company might be 
> different for a start-up. This is practical and pro-innovation but not 
> a perspective that is necessarily good for privacy so I am torn. [In 
> Ed's example, while "Hash+" may be commercially reasonable for 
> established competitors it may not be for a start-up. But we may want 
> to incent "HashOnly" anyway, because it does prevent the specific 
> threat of third parties who obtain the database finding it useful.]
>
> But we could go with this:
>
> _Unlinkable data is data that cannot be associated with an 
> identifiable person or user agent using commercially reasonable 
> means._
>
> We could have examples in the compliance doc on what is commercially 
> reasonable, but my bet is that the FTC will define it sometime in the 
> next year.
>
> [I am actually not sure what adding user agent does here. It was in 
> the original text so I added it but I have never used it in any 
> policies I have written so I leave that analysis open.]
>
> Lauren Gelman
> BlurryEdge Strategies
> 415-627-8512
>
> On Sep 21, 2012, at 10:00 AM, Ed Felten wrote:
>
> By the way, hashing IP addresses (with or without salting) does not 
> render them unlinkable. After hashing, it's easy to recover the 
> original IP address. The story is similar for other types of unique 
> identifiers--there are ways to get to unlinkability, but hashing by 
> itself won't be enough.
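
Ed's point about hashed IP addresses can be illustrated with a minimal sketch. It assumes the party trying to reverse the hash knows the hash function and the salt (or that no salt is used). SHA-256, the salt value, the function names, and the sample address and subnet are all hypothetical choices for illustration, not anything specified in the thread. The search here covers a single /16 so that it runs quickly; the full IPv4 space has only 2^32 candidates, so the same loop extends to it.

```python
import hashlib
import ipaddress


def hash_ip(ip: str, salt: bytes = b"") -> str:
    """Hash an IP address with an optional salt (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()


def recover_ip(target_hash: str, network: str, salt: bytes = b""):
    """Try every address in `network` until one hashes to `target_hash`."""
    for candidate in ipaddress.ip_network(network):
        if hash_ip(str(candidate), salt) == target_hash:
            return str(candidate)
    return None


# Hypothetical scenario: the salt is stored with the dataset, so whoever
# obtains the dataset also obtains the salt.
salt = b"example-salt"
stored = hash_ip("192.0.2.77", salt)

# Enumerate 65,536 candidate addresses and compare hashes.
recovered = recover_ip(stored, "192.0.0.0/16", salt)
print(recovered)  # prints 192.0.2.77
```

The salt does not help in this sketch because the input space is small enough to enumerate once the salt is known.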
>
> On Fri, Sep 21, 2012 at 12:01 PM, Shane Wiley <wileys@yahoo-inc.com 
> [1]> wrote:
> <Ed - apologies for not getting back to you sooner - I was on vacation 
> for the past week.>
>
> James,
>
> I like your approach the best and it was this perspective I was 
> intending when writing the text that Ed is questioning.
>
> The goal is to find the middle-ground between complete destruction of 
> data and an unlinkable state that still allows for longitudinal 
> consistency for analytical purposes BUT CANNOT be linked back to a 
> production system such that the data could be used to modify a single 
> user's experience.
>
> For example, performing a one-way secret hash (salted hash) on 
> identifiers (Cookie IDs, IP Addresses) and storing the resulting 
> dataset in a logically/physically separate location from production 
> data with strict access controls, policies, and employee education 
> would meet the definition of "unlinkable" I'm aiming for.
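
One possible reading of the "one-way secret hash" Shane describes is a keyed hash, sketched below under that assumption. HMAC-SHA256, the key handling, the function name, and the cookie ID values are hypothetical and are not taken from the thread.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Keyed one-way hash (HMAC-SHA256) of an identifier such as a cookie ID."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret key, assumed to be held only by the analytics environment.
analytics_key = secrets.token_bytes(32)

first_visit = pseudonymize("cookie-abc123", analytics_key)
second_visit = pseudonymize("cookie-abc123", analytics_key)
other_user = pseudonymize("cookie-xyz789", analytics_key)

# The same identifier always maps to the same pseudonym, so records can
# still be joined over time for analytics.
same_user_is_stable = first_visit == second_visit

# Different identifiers map to different pseudonyms.
users_are_distinct = first_visit != other_user

# Re-hashing a known identifier under a different key does not reproduce
# the stored pseudonym.
wrong_key_matches = pseudonymize("cookie-abc123", secrets.token_bytes(32)) == first_visit

print(same_user_is_stable, users_are_distinct, wrong_key_matches)
```

In this sketch, anyone who holds the key can recompute the mapping from identifier to pseudonym, so the result depends on the access controls and policies around the key as well as on the hash itself.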
>
> - Shane
>
> -----Original Message-----
> From: Grimmelmann, James [mailto:James.Grimmelmann@nyls.edu [2]]
> Sent: Friday, September 21, 2012 8:14 AM
> To: Lauren Gelman
> Cc: Ed Felten; <public-tracking@w3.org [3]>
> Subject: Re: definition of "unlinkable data" in the Compliance spec
>
> I really like Lauren's suggestion. My only concern is that 
> "reasonably" and "reasonable" have so many different meanings in legal 
> settings that it could be ambiguous. Sometimes an action is 
> "reasonable" if a person who is ethical and cautious would do it:
> it's not reasonable to leave sharp tools lying around in a children's play 
> area, or to invest a trust fund in marshmallows. Sometimes it refers 
> to what a rational non-expert would believe about the subject, so a 
> court will uphold a jury verdict unless "no reasonable jury" could 
> have reached the conclusion it did. Sometimes it's about the norms and 
> expectations of an industry. An auction might need to be conducted in 
> a "commercially reasonable" way, which means for example giving enough 
> notice that there will be real competitive bidding, but not spending 
> more than the property is worth.
>
> I think this last sense is the most appropriate one in context. So 
> perhaps something like "data that cannot be associated with an 
> identifiable person or user agent through commercially reasonable 
> means." That is, the question would be whether a normal business with 
> normal resources and motivations would consider reidentifying the data 
> to be feasible.
>
> James
>
> --------------------------------------------------
> James Grimmelmann, Professor of Law
> New York Law School, (212) 431-2864
> 185 West Broadway, New York, NY 10013
> james.grimmelmann@nyls.edu [4]
> http://james.grimmelmann.net [5]
>
> On Sep 20, 2012, at 7:22 PM, Lauren Gelman 
> <gelman@blurryedge.com<mailto:gelman@blurryedge.com [6]>> wrote:
>
> Unlinkable data is data that cannot reasonably be associated with an 
> identifiable person or user agent.
>
> Lauren Gelman
> BlurryEdge Strategies
> 415-627-8512
>
> On Sep 18, 2012, at 8:05 AM, Ed Felten wrote:
>
> Sorry to repost this, but nobody has answered any of my questions 
> about Option 1 for the unlinkability definition.
>
> Note to proponents of Option 1 (if any): If nobody can explain or 
> clarify Option 1, that will presumably be used as an argument against 
> Option 1 when decision time comes.
>
> ---------- Forwarded message ----------
> From: Ed Felten <ed@felten.com<mailto:ed@felten.com [7]>>
> Date: Thu, Sep 13, 2012 at 5:03 PM
> Subject: definition of "unlinkable data" in the Compliance spec
> To: "<public-tracking@w3.org<mailto:public-tracking@w3.org [8]>>"
> <public-tracking@w3.org<mailto:public-tracking@w3.org [9]>>
>
> I have some questions about the Option 1 definition of "Unlinkable 
> Data", section 3.6.1 in the Compliance spec editor's draft. The 
> definition is as follows [fixing typos]:
>
> A party renders a dataset unlinkable when it:
> 1. takes commercially reasonable steps to de-identify data such that 
> there is confidence that it contains information which could not be 
> linked to a specific user, user agent, or device in a production 
> environment [2. and 3. aren't relevant to my questions]
>
> I have several questions about what this means.
> (A) Why does the definition talk about a process of making data 
> unlinkable, instead of directly defining what it means for data to be 
> unlinkable? Some data needs to be processed to make it unlinkable, but 
> some data is unlinkable from the start. The definition should speak to 
> both, even though unlinkable-from-the-start data hasn't gone through 
> any kind of process. Suppose FirstCorp collects data X; SecondCorp 
> collects X+Y but then runs a process that discards Y to leave it with 
> only X; and ThirdCorp collects X+Y+Z but then minimizes away Y+Z to 
> end up with X. Shouldn't these three datasets be treated the 
> same--because they are the same X--despite having been through 
> different processes, or no process at all?
> (B) Why "commercially reasonable" rather than just "reasonable"? The 
> term "reasonable" already takes into account all relevant factors.
> Can somebody give an example of something that would qualify as 
> "commercially reasonable" but not "reasonable", or vice versa? If not, 
> "commercially" only makes the definition harder to understand.
> (C) "there is confidence" seems to raise two questions. First, who is 
> it that needs to be confident? Second, can the confidence be just an 
> unsupported gut feeling of optimism, or does there need to be some 
> valid reason for confidence? Presumably the intent is that the party 
> holding the data has justified confidence that the data cannot be 
> linked, but if so it might be better to spell that out.
> (D) Why "it contains information which could not be linked" rather 
> than the simpler "it could not be linked"? Do the extra words add any 
> meaning?
> (E) What does "in a production environment" add? If the goal is to 
> rule out results demonstrated in a research environment, I doubt this 
> language would accomplish that goal, because all of the 
> re-identification research I know of required less than a production 
> environment. If the goal is to rule out linking approaches that aren't 
> at all practical, some other language would probably be better.
>
> (I don't have questions about the meaning of Option 2; which shouldn't 
> be interpreted as a preference for or against Option 2.)
>
>
>
> Links:
> ------
> [1] mailto:wileys@yahoo-inc.com
> [2] mailto:James.Grimmelmann@nyls.edu
> [3] mailto:public-tracking@w3.org
> [4] mailto:james.grimmelmann@nyls.edu
> [5] http://james.grimmelmann.net
> [6] mailto:gelman@blurryedge.com
> [7] mailto:ed@felten.com
> [8] mailto:public-tracking@w3.org
> [9] mailto:public-tracking@w3.org
Received on Friday, 21 September 2012 22:33:17 UTC