- From: Lee Tien <tien@eff.org>
- Date: Fri, 15 Aug 2014 09:25:09 -0700
- To: Jeffrey Chester <jeff@democraticmedia.org>
- Cc: Rob van Eijk <rob@blaeu.com>, David Singer <singer@apple.com>, Mike O'Neill <michael.oneill@baycloud.com>, Justin Brookman <jbrookman@cdt.org>, "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
- Message-Id: <3B020081-5C04-4611-BAD0-C8BCCAD58450@eff.org>
EFF agrees: transparency in de-identification methods is very important and is far superior for users to the old-school "expert certification without showing your work" approach.

Lee

Sent from my iPhone

On Aug 15, 2014, at 3:18 AM, Jeffrey Chester <jeff@democraticmedia.org> wrote:

> CDD strongly supports this proposal. This is a key aspect of transparency essential for users.
>
> Jeff
>
>
> Jeffrey Chester
> Center for Digital Democracy
> 1621 Connecticut Ave, NW, Suite 550
> Washington, DC 20009
> www.democraticmedia.org
> www.digitalads.org
> 202-986-2220
>
> On Aug 14, 2014, at 7:04 PM, Rob van Eijk <rob@blaeu.com> wrote:
>
>>
>> If the definition gets adopted, wouldn't it be fair to the user to include text with a normative MUST for a party to provide detailed information about the details of the de-identification process(es) it applies? Transparency should do its work to prevent "de-identification by obscurity".
>>
>> Is the group willing to consider such a normative obligation?
>>
>> Rob
>>
>> David Singer wrote on 2014-08-14 22:04:
>>> On Aug 14, 2014, at 12:20 , Mike O'Neill <michael.oneill@baycloud.com> wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>> I agree, using a verb assumes that you already have data about people and you apply a de-identifying process to it. It is the process that is hard to define, without leaving loopholes.
>>> Precisely. I am not trying to define a process; I am defining by the
>>> result. The result is a set of data that will never be linked to any
>>> specific user, user-agent, or device, by a suitable combination of
>>> technical (deidentification) measures, and restrictions (e.g. you are
>>> not allowed to try, you are not allowed to distribute this to anyone
>>> unless they agree not to try, and so on). I don’t want to define the
>>> technical processes or the restrictions, just the result; the data
>>> never gets linked to a specific user, user-agent, or device, ever
>>> again.
>>>> What is in scope is tracking data, and DNT should just mean do not collect it (unless you claim a permitted use). If you have collected in error just delete it.
>>>> Maybe that is all we need to say.
>>> Maybe you need to review where we use the term again.
>>> 1. Third parties can collect data if
>>>    a) they have an exception
>>>    b) they have a permitted use
>>>    c) it’s deidentified
>>> 2. Unknowing collection. You have to render the data out of scope or
>>> delete it, and out of scope means you have permanently deidentified
>>> it.
>>> 3. Discussed but not yet in the spec.: the ‘raw data’ problem
>>> (companies cannot process raw logs in real time). Keep the raw data
>>> until you can process it, but the raw data has only 3 possible exits
>>> (like third party data):
>>>    a) it’s identifiable, but the user allowed you to collect it
>>>    b) it’s identifiable, but there is a permitted use you claim and you
>>>    adhere to the restrictions of that permitted use
>>>    c) it’s not identifiable, it’s been deidentified
>>> For all these, we need a definition of what data in a deidentified
>>> state means. To me, it means it’s got detached from any given user
>>> (user-agent, or device) and can and/or will never be reattached.
>>>> Mike
>>>>> -----Original Message-----
>>>>> From: Rob van Eijk [mailto:rob@blaeu.com]
>>>>> Sent: 14 August 2014 19:55
>>>>> To: David Singer
>>>>> Cc: Justin Brookman; public-tracking@w3.org; Mike O'Neill
>>>>> Subject: Re: Deidentification (ISSUE-188)
>>>>> The core of my issue, which may be a semantic issue, is that the current
>>>>> text is fixed on the word identification. To me it is not clear enough
>>>>> from the current definition that anything else than the 'one way street'
>>>>> is considered re-identification. The definition must be more specific on
>>>>> this point.
>>>>> Does cookie-syncing (which is commonly used in real-time bidding) fall
>>>>> under the meaning of re-identification?
>>>>> Rob
>>>>> David Singer wrote on 2014-08-14 18:37:
>>>>>> Rob, I am sorry, I don’t follow you at all.
>>>>>> We say in a number of places that data passes out of our scope, and
>>>>>> hence we say nothing at all about it, once it has been deidentified.
>>>>>> We need to define what we mean by that, and we need to define that
>>>>>> ‘exit’ from our scope.
>>>>>> On Aug 14, 2014, at 2:08 , Rob van Eijk <rob@blaeu.com> wrote:
>>>>>>> The text you propose connects the state of a permanently de-identified
>>>>>>> dataset to the possibility of identifying a user/user-agent or device.
>>>>>>> I think limiting the approach to identification is way too limited.
>>>>>>> What is not covered is for example:
>>>>>>> - the sharing (for e.g. data enrichment and data correlation).
>>>>>> if it doesn’t identify anyone, and won’t/can’t, we have nothing to say
>>>>>> about sharing it
>>>>>>> - the application of de-identified data to the individual user/user
>>>>>>> agent/device (for e.g. re-targeting).
>>>>>> That’s re-identification, and my text says (a) it ought not be
>>>>>> possible and (b) it ought not be permitted
>>>>>>> - the retention of data meaning the duration of time that would be
>>>>>>> allowed to bring data in de-identified state.
>>>>>> That’s a separate question: the ‘raw data’ question (and one of the
>>>>>> exits for raw data is that the data is deidentified)
>>>>>>> - any (unintended/unforeseen) data uses that may have an impact on
>>>>>>> (the personal space of) a user/user agent/device. For example
>>>>>>> re-targeting based on de-identified data, or re-targeting based on
>>>>>>> correlation with de-identified data.
>>>>>> I don’t understand how one can target anyone if the data is
>>>>>> deidentified, and if it’s reidentified, then it wasn’t deidentified to
>>>>>> this definition (the definition insists it is a one-way street).
>>>>>>> My proposal is to exclude text for de-identified data in order to aim
>>>>>>> for a cleaner specification.
>>>>>> Again, I don’t understand. The point of defining it is to say “how to
>>>>>> get out of the scope of this spec.”. For example, the raw data clause
>>>>>> I proposed says there are only 3 exits:
>>>>>> * you have permission from the user to retain the data
>>>>>> * you retain the data under a permitted use, in accordance with the
>>>>>> terms of that permitted use
>>>>>> * you deidentify the data so it passes out of our scope
>>>>>>> Rob
>>>>>>> David Singer wrote on 2014-08-14 01:58:
>>>>>>>> On Aug 8, 2014, at 6:54 , Mike O'Neill <michael.oneill@baycloud.com>
>>>>>>>> wrote:
>>>>>>> (...)
>>>>>>>> Trying another way of phrasing it:
>>>>>>>> Data is permanently de-identified (and hence out of the scope of this
>>>>>>>> specification) when a sufficient combination of technical measures and
>>>>>>>> restrictions ensures that the data does not, and cannot and will not
>>>>>>>> be used to, identify a particular user, user-agent, or device.
>>>>>>>> Note: Usage and/or distribution restrictions are strongly recommended
>>>>>>>> for any dataset that has records that relate to a single user or a
>>>>>>>> small number of users; experience has shown that such records can, in
>>>>>>>> fact, sometimes be used to identify the user(s) despite the technical
>>>>>>>> measures that were taken to prevent that happening.
>>>>>>>> David Singer
>>>>>>>> Manager, Software Standards, Apple Inc.
>>>>>> David Singer
>>>>>> Manager, Software Standards, Apple Inc.
>>>> -----BEGIN PGP SIGNATURE-----
>>>> Version: GnuPG v1.4.13 (MingW32)
>>>> Comment: Using gpg4o v3.3.26.5094 - http://www.gpg4o.com/
>>>> Charset: utf-8
>>>> iQEcBAEBAgAGBQJT7QwIAAoJEHMxUy4uXm2J7vkIAOUDdIGXlCpvJw9U/KYAbjCN
>>>> I/T2dcIsN3Bd095aNyj+eTiC32sQ96Tc5+q//f9zLx+/CERbIy5/lOhfEQpC6z4z
>>>> gQuJC/Ol691owAGEQFAQEN7sZ4u5nhFFuJzhPnZILBi9tzBj4wLByxskGgf3yMyT
>>>> rlYi50rZpTghA4QOKvszDxAgP/hyRnk2cjWcCCjaiMWVKQh3j7aKUtit4JgU/JKb
>>>> ME50WRt43StzEtcaFfsPGHzwVjG/3z5wqEMWSTnwuyq68OfN8U3g0hmaDhJUzwoU
>>>> P5+tPJOImfOSr0H5eCIXQkKLP6sz8HSrt+HPcNrAO/uKCmIGKlD4AAqSe5Ji0gI=
>>>> =oQfL
>>>> -----END PGP SIGNATURE-----
>>> David Singer
>>> Manager, Software Standards, Apple Inc.
>>
>>
>
Received on Friday, 15 August 2014 16:25:54 UTC