- From: Jeffrey Chester <jeff@democraticmedia.org>
- Date: Fri, 15 Aug 2014 06:18:04 -0400
- To: Rob van Eijk <rob@blaeu.com>
- Cc: David Singer <singer@apple.com>, Mike O'Neill <michael.oneill@baycloud.com>, Justin Brookman <jbrookman@cdt.org>, "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
- Message-Id: <547C74B4-8A67-4A99-907E-8E92E2D9190C@democraticmedia.org>
CDD strongly supports this proposal. This is a key aspect of transparency essential for users.

Jeff

Jeffrey Chester
Center for Digital Democracy
1621 Connecticut Ave, NW, Suite 550
Washington, DC 20009
www.democraticmedia.org
www.digitalads.org
202-986-2220

On Aug 14, 2014, at 7:04 PM, Rob van Eijk <rob@blaeu.com> wrote:

>
> If the definition gets adopted, wouldn't it be fair to the user to include text with a normative MUST for a party to provide detailed information about the details of the de-identification process(es) it applies? Transparency should do its work to prevent "de-identification by obscurity".
>
> Is the group willing to consider such a normative obligation?
>
> Rob
>
> David Singer wrote on 2014-08-14 22:04:
>> On Aug 14, 2014, at 12:20 , Mike O'Neill <michael.oneill@baycloud.com> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>> I agree, using a verb assumes that you already have data about people and you apply a de-identifying process to it. It is the process that is hard to define, without leaving loopholes.
>> Precisely. I am not trying to define a process; I am defining by the
>> result. The result is a set of data that will never be linked to any
>> specific user, user-agent, or device, by a suitable combination of
>> technical (deidentification) measures, and restrictions (e.g. you are
>> not allowed to try, you are not allowed to distribute this to anyone
>> unless they agree not to try, and so on). I don’t want to define the
>> technical processes or the restrictions, just the result; the data
>> never gets linked to a specific user, user-agent, or device, ever
>> again.
>>> What is in scope is tracking data, and DNT should just mean do not collect it (unless you claim a permitted use). If you have collected in error just delete it.
>>> Maybe that is all we need to say.
>> Maybe you need to review where we use the term again.
>> 1. Third parties can collect data if
>> a) they have an exception
>> b) they have a permitted use
>> c) it’s deidentified
>> 2. Unknowing collection. You have to render the data out of scope or
>> delete it, and out of scope means you have permanently deidentified
>> it.
>> 3. Discussed but not yet in the spec.: the ‘raw data’ problem
>> (companies cannot process raw logs in real time). Keep the raw data
>> until you can process it, but the raw data has only 3 possible exits
>> (like third party data):
>> a) it’s identifiable, but the user allowed you to collect it
>> b) it’s identifiable, but there is a permitted use you claim and you
>> adhere to the restrictions of that permitted use
>> c) it’s not identifiable, it’s been deidentified
>> For all these, we need a definition of what data in a deidentified
>> state means. To me, it means it’s got detached from any given user
>> (user-agent, or device) and can and/or will never be reattached.
>>> Mike
>>>> -----Original Message-----
>>>> From: Rob van Eijk [mailto:rob@blaeu.com]
>>>> Sent: 14 August 2014 19:55
>>>> To: David Singer
>>>> Cc: Justin Brookman; public-tracking@w3.org; Mike O'Neill
>>>> Subject: Re: Deidentification (ISSUE-188)
>>>> The core of my issue, which may be a semantic issue, is that the current
>>>> text is fixed on the word identification. To me it is not clear enough
>>>> from the current definition that anything else than the 'one way street'
>>>> is considered re-identification. The definition must be more specific on
>>>> this point.
>>>> Does cookie-syncing (which is commonly used in real-time bidding) fall
>>>> under the meaning of re-identification?
>>>> Rob
>>>> David Singer wrote on 2014-08-14 18:37:
>>>>> Rob, I am sorry, I don’t follow you at all.
>>>>> We say in a number of places that data passes out of our scope, and
>>>>> hence we say nothing at all about it, once it has been deidentified.
>>>>> We need to define what we mean by that, and we need to define that
>>>>> ‘exit’ from our scope.
>>>>> On Aug 14, 2014, at 2:08 , Rob van Eijk <rob@blaeu.com> wrote:
>>>>>> The text you propose connects the state of a permanently de-identified
>>>>>> dataset to the possibility of identifying a user/user-agent or device.
>>>>>> I think limiting the approach to identification is way too limited.
>>>>>> What is not covered is for example:
>>>>>> - the sharing (for e.g. data enrichment and data correlation).
>>>>> if it doesn’t identify anyone, and won’t/can’t, we have nothing to say
>>>>> about sharing it
>>>>>> - the application of de-identified data to the individual user/user
>>>>>> agent/device (for e.g. re-targeting).
>>>>> That’s re-identification, and my text says (a) it ought not be
>>>>> possible and (b) it ought not be permitted
>>>>>> - the retention of data meaning the duration of time that would be
>>>>>> allowed to bring data in de-identified state.
>>>>> That’s a separate question: the ‘raw data’ question (and one of the
>>>>> exits for raw data is that the data is deidentified)
>>>>>> - any (unintended/unforeseen) data uses that may have an impact on a
>>>>>> (the personal space) of a user/user agent/device. For example
>>>>>> re-targeting based on de-identified data, or re-targeting based on
>>>>>> correlation with de-identified data.
>>>>> I don’t understand how one can target anyone if the data is
>>>>> deidentified, and if it’s reidentified, then it wasn’t deidentified to
>>>>> this definition (the definition insists it is a one-way street).
>>>>>> My proposal is to exclude text for de-identified data in order to aim
>>>>>> for a cleaner specification.
>>>>> Again, I don’t understand. The point of defining it is to say “how to
>>>>> get out of the scope of this spec.”. For example, the raw data clause
>>>>> I proposed says there are only 3 exits:
>>>>> * you have permission from the user to retain the data
>>>>> * you retain the data under a permitted use, in accordance with the
>>>>> terms of that permitted use
>>>>> * you deidentify the data so it passes out of our scope
>>>>>> Rob
>>>>>> David Singer wrote on 2014-08-14 01:58:
>>>>>>> On Aug 8, 2014, at 6:54 , Mike O'Neill <michael.oneill@baycloud.com>
>>>>>>> wrote:
>>>>>> (...)
>>>>>>> Trying another way of phrasing it:
>>>>>>> Data is permanently de-identified (and hence out of the scope of this
>>>>>>> specification) when a sufficient combination of technical measures and
>>>>>>> restrictions ensures that the data does not, and cannot and will not
>>>>>>> be used to, identify a particular user, user-agent, or device.
>>>>>>> Note: Usage and/or distribution restrictions are strongly recommended
>>>>>>> for any dataset that has records that relate to a single user or a
>>>>>>> small number of users; experience has shown that such records can, in
>>>>>>> fact, sometimes be used to identify the user(s) despite the technical
>>>>>>> measures that were taken to prevent that happening.
>>>>>>> David Singer
>>>>>>> Manager, Software Standards, Apple Inc.
>>>>> David Singer
>>>>> Manager, Software Standards, Apple Inc.
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.13 (MingW32)
>>> Comment: Using gpg4o v3.3.26.5094 - http://www.gpg4o.com/
>>> Charset: utf-8
>>> iQEcBAEBAgAGBQJT7QwIAAoJEHMxUy4uXm2J7vkIAOUDdIGXlCpvJw9U/KYAbjCN
>>> I/T2dcIsN3Bd095aNyj+eTiC32sQ96Tc5+q//f9zLx+/CERbIy5/lOhfEQpC6z4z
>>> gQuJC/Ol691owAGEQFAQEN7sZ4u5nhFFuJzhPnZILBi9tzBj4wLByxskGgf3yMyT
>>> rlYi50rZpTghA4QOKvszDxAgP/hyRnk2cjWcCCjaiMWVKQh3j7aKUtit4JgU/JKb
>>> ME50WRt43StzEtcaFfsPGHzwVjG/3z5wqEMWSTnwuyq68OfN8U3g0hmaDhJUzwoU
>>> P5+tPJOImfOSr0H5eCIXQkKLP6sz8HSrt+HPcNrAO/uKCmIGKlD4AAqSe5Ji0gI=
>>> =oQfL
>>> -----END PGP SIGNATURE-----
>> David Singer
>> Manager, Software Standards, Apple Inc.
>
>
Received on Friday, 15 August 2014 10:18:29 UTC