- From: Rob van Eijk <rob@blaeu.com>
- Date: Fri, 15 Mar 2013 22:59:49 +0100
- To: Shane Wiley <wileys@yahoo-inc.com>, Dan Auerbach <dan@eff.org>, "public-tracking@w3.org" <public-tracking@w3.org>
- Message-ID: <4e6f0553-085e-4758-96bd-0185ebe83dde@email.android.com>
Hi Shane,

I would love to embrace the outcome that you describe: that information that has been de-identified not later become identified. So maybe it is due to my lack of grasping the true nature of the approach. Please explain how the following scenario applies to de-identified data.

Use case: booking a hotel room after being retargeted.

A user visits a site and looks for info about available hotel rooms in city A for date B. While browsing the web, the user is then confronted with re-targeted, personalized ads showing hotel offers in city A on date B. I understand that up to this point the data can be scrubbed to be de-identified. But when the user decides to act on the offer and makes a reservation, a reservation ID (plus identifiable information) will be tied together with the de-identified data.

How does this play out under DNT? Please walk me through it, for I really want to be sure whether this may fly.

Rob

Shane Wiley <wileys@yahoo-inc.com> wrote:

>Rob,
>
>“no wiggle-room” – this is my core concern with some of this direction.
>The current definition relies on terms such as “reasonable” (matches up
>well with EU concepts of “likely reasonable”). Much like HIPAA, this
>gives us a risk-based model to de-identification management. If an
>organization states it's W3C DNT compliant and articulates their
>de-identification process, I believe it’s important to provide
>“wiggle-room” for organizations to implement de-identification in a
>manner they see appropriate to their particular business model,
>technical tools, administrative and operational processes. The
>important outcome is that information that has been de-identified not
>later become identified. If an organization is willing to make that
>public claim and they later prove unable to follow-through on their
>commitment, local legal remedies will take over from there.
>
>As I stated in Berlin, I believe notions of red, yellow, and green are
>problematic as they bring a judgmental lens to these states (red =
>danger, yellow = caution). I agree with Dan that there should only be
>two states: raw and de-identified.
>
>- Shane
>
>From: Rob van Eijk [mailto:rob@blaeu.com]
>Sent: Friday, March 15, 2013 10:47 AM
>To: Dan Auerbach; public-tracking@w3.org; Shane Wiley
>Subject: Re: ACTION-371: text defining de-identified data
>
>Dan,
>
>Thanks for the thoughtful reply.
>I understand now that we are on the same page.
>
>But I doubt that Shane is on that same page as well. If I understand
>Shane's position correctly, his view on de-identified does not come
>close to the green as I would like it to be. I just want to be
>absolutely sure that there is no wiggle-room in what it means to reach
>de-identified.
>
>@Shane: what is your view, taking into account the reply from Dan?
>
>Rob
>
>Dan Auerbach <dan@eff.org> wrote:
>
>My view is that we do NOT need to define a third state of data. We have
>green and red now. If a compelling argument is made that an orange
>state is needed, we can revisit, but I think that existing permitted
>uses plus having a small time frame for processing raw event data are
>strong enough protections to not warrant this third state. Second,
>regarding nomenclature, the FTC definition actually defines
>unlinkability in terms of de-identification, so I think it would be
>very confusing to stray too far from that definitional framework.
>
>A couple further replies inline:
>
>On 03/14/2013 04:09 AM, Justin Brookman wrote:
>
>OK, but as I said before, the standard does not currently envision
>three states of data. As written, all data pertaining to a network
>communication is in scope, unless it is deidentified,* in which case
>it is out of scope. You need to propose a third consequence for a new
>class of data for this to have effect.
>
>* Noting that there is still ongoing discussion about what
>"deidentified" actually means, as evidenced by the recent emails from
>Ed, Shane, and Dan.
>
>Justin Brookman
>Director, Consumer Privacy
>Center for Democracy & Technology
>tel 202.407.8812
>justin@cdt.org
>http://www.cdt.org
>@JustinBrookman
>@CenDemTech
>
>On 3/14/2013 5:39 AM, Rob van Eijk wrote:
>
>In Boston Shane and I discussed the process of de-identification by
>applying it to my mental model (red, orange and green data). Red data
>is raw event level data (e.g. log files with unique identifiers),
>orange is still linkable but de-identified data, green is unlinkable
>and therefore anonymous data.
>
>We agreed that in order to move from red to orange, or from orange to
>green, one needs to pass the barriers by processing. As seen in the
>de-identification workshop there are multiple ways to do that. I
>illustrated 2 alternative practices:
>
>1. One example is based on concatenating a random number to the
>unique ID. This results in a lookup table of unique ID <-> random
>number.
>Getting from orange to red is breaking the link (un-linkability) by
>throwing away the unique ID. No new red data can be linked to the
>un-linkable data in the green.
>
>I think the trouble with this model is the assumption that the unique
>ID will be the only means of identifying someone. If you'll allow me to
>stick with the conceptual framework of a table for simplicity (think
>mysql table or bigtable), I think we should get away from the mentality
>that there are "identifiers" -- fields like udids, cookies, IPs, phone
>numbers etc. Instead, it is more accurate to say that *every* field of
>a data set provides some bits of identifying information.
>
>An "orange" data set as you describe might still be super identifying,
>if, for example, it is a wide table with lots of fields. As a concrete
>example, URLs can be very identifying in some cases, as can timestamps.
>Even data that you describe as "green" could still be identifying, if I
>understand you correctly. In many instances, having events linked by a
>random irreversible identifier (e.g. discarded salt) is simply not
>enough to ensure that information can't be reasonably obtained about
>users. In some cases it might be, but it depends a lot on the nature
>of the rest of the data in the table.
>
>2. The other example is based on rotating hashes. Getting from red to
>orange is applying the hash. Getting from orange to green is breaking
>the link (un-linkability) by throwing away the salt. No new red data
>can be linked to the un-linkable data in the green.
>
>So I am willing to give up the word unlinkable in the normative
>de-identification text, but in exchange non-normative examples should
>be added.
>
>I think it's a good suggestion to say that the non-normative examples
>should be fleshed out. But I agree that they should suggest a stronger
>version of "green" than I understand from your mental model above
>(which I hope I'm getting right).
>
><non-normative text)
>De-identification can be accomplished by applying a mental model
>(red, orange and green data). Red data is raw event level data (e.g.
>log files with unique identifiers), orange is still linkable but
>de-identified data, green is unlinkable and therefore anonymous data.
>
>In order to move from red to orange, or from orange to green, one
>needs to pass the barriers by processing. There are multiple ways to
>do that:
>
>1. One example is based on concatenating a random number to the
>unique ID. This results in a lookup table of unique ID <-> random
>number.
>Getting from orange to red is breaking the link (un-linkability) by
>throwing away the unique ID. No new red data can be linked to the
>un-linkable data in the green.
>
>2. Another example is based on rotating hashes. Getting from red to
>orange is applying the hash.
>Getting from orange to green is breaking
>the link (un-linkability) by throwing away the salt. No new red data
>can be linked to the un-linkable data in the green.
></non-normative text)
>
>Rob
>
>Dan Auerbach wrote on 2013-03-13 19:01:
>
>I also agree that we should just stick with de-identified, just as a
>point of nomenclature. For one, unlike what you propose below, Rob,
>the FTC text actually defines unlinkability in terms of
>de-identification, so I think it would be very confusing if we did the
>opposite here.
>
>That said, we did NOT agree at the face-to-face that unlinkability
>was a "step beyond de-identified"; we are not at all weakening the
>standard with our word choice. For unlinkability and de-identification
>both, we do NOT propose a holy grail of provably perfect anonymization
>that can't be achieved in practice (or even in theory, really!).
>However, for both we require a significantly higher standard than, for
>example, keeping a pseudonymous data set of browsing history. The
>first non-normative example is intended to make this clear, but I can
>flesh it out if it's not.
>
>Dan
>
>On 03/13/2013 10:28 AM, Shane Wiley wrote:
>
>Ed,
>
>Agreed - reasonably attempting to clear unique identifiers or
>information that could lead to unique identification in URLs should
>also be included.
>
>- Shane
>
>FROM: Edward W. Felten [mailto:felten@CS.Princeton.EDU]
>SENT: Wednesday, March 13, 2013 10:22 AM
>TO: Justin Brookman
>CC: <public-tracking@w3.org>
>SUBJECT: Re: ACTION-371: text defining de-identified data
>
>But we should be equally clear that "de-identify" means more than
>just removing the most obvious identifiers from the data.
>
>On Wed, Mar 13, 2013 at 1:07 PM, Justin Brookman <justin@cdt.org>
>wrote:
>
>Shane is right that we did choose to use "deidentified" instead of
>"unlinkable" at the Cambridge meeting.
>So I agree we probably
>should not use "unlinkable" to define "deidentified" in the
>standard. However, I don't see why we need to define "unlinkable"
>at all, as it has no operational meaning, and was rejected because
>it implied a technological impossibility of relinking, which is not
>a standard that can be reasonably achieved.
>
>Justin Brookman
>Director, Consumer Privacy
>Center for Democracy & Technology
>tel 202.407.8812 [1]
>justin@cdt.org
>http://www.cdt.org [2]
>@JustinBrookman
>@CenDemTech
>
>On 3/13/2013 11:35 AM, Shane Wiley wrote:
>
>Rob,
>
>So we're agreed unlinkability requires more processing than
>de-identified - good. I would recommend we define de-identified
>(nearly done) and unlinkability separately to clearly demonstrate
>they are different points within a continuum. We can then focus on
>the discussion of retention of data in its de-identified state
>prior to moving to the ultimate unlinkable state.
>
>- Shane
>
>-----Original Message-----
>From: Rob van Eijk [mailto:rob@blaeu.com]
>Sent: Wednesday, March 13, 2013 8:28 AM
>To: Shane Wiley
>Cc: public-tracking@w3.org
>Subject: RE: ACTION-371: text defining de-identified data
>
>Hi Shane,
>
>I hear you and understand your position. But unlinkable and
>de-identified are not mutually exclusive. Unlinkable data is a subset
>of de-identified data (they just go through another step of
>scrubbing).
>Adding it to the list is not hurting your position.
>
>The key towards the middle ground remains data retention, which has
>to be proportionate to the purpose.
>
>Rob
>
>Shane Wiley wrote on 2013-03-13 16:13:
>
>Rob,
>
>I thought we had agreed to not mix the "unlinkable" term with
>"de-identified" here. In our discussions in Boston it appeared there
>was general agreement that unlinkability is a step beyond
>de-identified.
>Once a record has been rendered de-identified, it can
>later further be made unlinkable (using your definition of unlinkable
>vs. the one I proposed). This is a significant sticking point for
>those of us attempting to find middle-ground here so hopefully we can
>document the details in non-normative text but I'd ask that we remove
>mention of unlinkable in the definition of de-identified at this time
>(or else we've not really moved forward in this discussion in my
>opinion).
>
>- Shane
>
>-----Original Message-----
>From: Rob van Eijk [mailto:rob@blaeu.com]
>Sent: Wednesday, March 13, 2013 5:57 AM
>To: public-tracking@w3.org
>Subject: RE: ACTION-371: text defining de-identified data
>
>Dan, Kevin,
>
>I would really want the unlinkability in there as well. I propose to
>add the text: made unlinkable
>
>Normative text: Data can be considered sufficiently de-identified to
>the extent that it has been deleted, made unlinkable, modified,
>aggregated, anonymized or otherwise manipulated in order to achieve a
>reasonable level of justified confidence that the data cannot
>reasonably be used to infer information about, or otherwise be linked
>to, a particular user, user agent, computer or device.
>
>In terms of privacy by design, de-identification through unlinkability
>is the strongest form of de-identification IMHO.
>
>Rob
>
>Kevin Kiley wrote on 2013-03-12 19:03:
>
>Dan,
>
>In case I wasn't being clear in my last post, I (personally) believe
>that User-agent should *NOT* be removed from the proposed text.
>
>I actually don't think it would do any harm to *ADD* the word
>'Computer' as well ( which is present in the current FTC definition )
>so it reads like this…
>
>Normative text:
>
>Data can be considered sufficiently de-identified to the extent that
>it has been deleted, modified, aggregated, anonymized or otherwise
>manipulated in order to achieve a reasonable level of justified
>confidence that the data cannot reasonably be used to infer
>information about, or otherwise be linked to, a particular user, user
>agent, computer or device.
>
>I think that covers it pretty well, and *NO* 'clarifying text' is
>necessary.
>
>Just my 2 cents.
>
>Kevin Kiley
>
>Previous message(s)…
>
>Dan,
>
>Perhaps you can add text clarifying this perspective or, much like
>the FTC, suffice with "device" which I believe more than covers what
>you're looking for here.
>
>- Shane
>
>From: Dan Auerbach [mailto:dan@eff.org]
>Sent: Tuesday, March 12, 2013 8:57 AM
>To: public-tracking@w3.org
>Subject: Re: ACTION-371: text defining de-identified data
>
>Shane and Kevin -- The phrase "user agent" in the text is intended to
>refer to a particular user agent (not "Chrome 26" but rather "the
>browser running on Dan's laptop"). I hoped that would be clear from
>context, but if it's not we can clarify. I may not be able to
>identify your device per se, but can identify that this is the same
>browser as I saw before. I think this is the case with using cookies,
>for example. It seems more accurate to me than lumping it all under
>"device", and appropriate since the text of our document is elsewhere
>focused on user agents, unlike the FTC text.
>
>Best,
>
>Dan
>
>On 03/12/2013 12:19 AM, Kevin Kiley wrote:
>
>Shane Wiley wrote...
>I had removed "user agent" in the suggested edit as this could be
>something as generic as "Chrome 26".
>
>It can also be something VERY specific... and tell you a LOT about
>the Computer/OS/Device being used.
>
>In the case of Mobile... it will pretty much tell you EXACTLY what
>'Device' is being used.
>
>The FTC likewise does not use "user agent" in their definition.
>
>That's true... but BOTH definitions (W3C and FTC) currently mention
>'Device'... and the FTC reports go to great lengths about how
>important it is to exclude any knowledge of 'the Device' from the
>de-identified data ( especially in the case of 'Mobile Devices' ).
>
>Kevin Kiley
>
>--
>Edward W. Felten
>Professor of Computer Science and Public Affairs
>Director, Center for Information Technology Policy
>Princeton University
>609-258-5906 http://www.cs.princeton.edu/~felten [3]
>
>--
>Dan Auerbach
>Staff Technologist
>Electronic Frontier Foundation
>dan@eff.org
>415 436 9333 x134
>
>Links:
>------
>[1] tel:202.407.8812
>[2] http://www.cdt.org
>[3] http://www.cs.princeton.edu/%7Efelten
>
>--
>Dan Auerbach
>Staff Technologist
>Electronic Frontier Foundation
>dan@eff.org
>415 436 9333 x134
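
The second non-normative example quoted in the thread (rotating hashes, then throwing away the salt) could be sketched roughly as follows. This is only an illustration under assumptions that the thread does not state: the class name RotatingPseudonymizer, the choice of HMAC-SHA256 as the keyed hash, and the 32-byte random salt are all hypothetical. As Dan points out in the thread, replacing the identifier this way says nothing about whether the remaining fields (URLs, timestamps) are still identifying.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class RotatingPseudonymizer:
    """Hypothetical helper: maps raw unique IDs to pseudonyms using a salt
    that is rotated per period and can be thrown away."""

    def __init__(self) -> None:
        # Assumption: a fresh random 32-byte salt per rotation period.
        self._salt: Optional[bytes] = secrets.token_bytes(32)

    def pseudonym(self, unique_id: str) -> str:
        """Red -> orange: replace the raw ID with a salted (keyed) hash."""
        if self._salt is None:
            raise RuntimeError("salt discarded; new raw data cannot be linked")
        digest = hmac.new(self._salt, unique_id.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    def rotate(self) -> None:
        """Start a new period: pseudonyms from earlier periods no longer match."""
        self._salt = secrets.token_bytes(32)

    def discard_salt(self) -> None:
        """Orange -> green: throw away the salt so the mapping cannot be redone."""
        self._salt = None
```

While the salt is held, the same raw ID always yields the same pseudonym, so new raw events can still be joined to older ones (the "orange" state). After rotate() or discard_salt(), a new raw event with the same ID no longer produces the old pseudonym, which is the property the example describes for "green".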
Received on Friday, 15 March 2013 22:00:29 UTC