- From: Mike O'Neill <michael.oneill@baycloud.com>
- Date: Fri, 15 Aug 2014 17:07:07 +0100
- To: "'David Singer'" <singer@apple.com>
- Cc: <rob@blaeu.com>, "'Justin Brookman'" <jbrookman@cdt.org>, <public-tracking@w3.org>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

As I said, I do not think the old definition of de-identified works for the third-party compliance section (or any statement describing data as out-of-scope of DNT). It assumes that identifying (tracking) data has been collected and some process other than deletion can be applied to it to make it safe.

I suggested we use a new definition for out-of-scope e.g. anonymous data (mathematically impossible to derive identity from it, or being linked to an individual in a subsequent network interaction), and leaving the definition of the de-identifying process for the permitted use section (data collected unknowingly in error should just be deleted).

I agree your "data does not, and cannot and will not" implies impossibility, and the dreaded "reasonable" has gone which is good. Though the non-normative bit counteracts that somewhat by calling for distribution restrictions (which are not needed if the data "cannot" be re-identified).

I agree with Rob that a new definition would probably be superfluous given our definition of tracking implying in-scope data as: ".. data regarding a particular user's activity across multiple distinct contexts". The problem I have is that with the other-contexts qualification machine discoverability becomes tricky. This could create a loophole if collected data with a UID is out-of-scope when the controller promises to wear tunnel-vision glasses.

Does anyone have ideas how to address that?

Mike

> -----Original Message-----
> From: David Singer [mailto:singer@apple.com]
> Sent: 14 August 2014 21:05
> To: Mike O'Neill
> Cc: rob@blaeu.com; Justin Brookman; public-tracking@w3.org
> Subject: Re: Deidentification (ISSUE-188)
>
>
> On Aug 14, 2014, at 12:20 , Mike O'Neill <michael.oneill@baycloud.com> wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > I agree, using a verb assumes that you already have data about people and you apply a de-identifying process to it.
> > It is the process that is hard to define, without leaving loopholes.
>
> Precisely. I am not trying to define a process; I am defining by the result. The result is a set of data that will never be linked to any specific user, user-agent, or device, by a suitable combination of technical (deidentification) measures, and restrictions (e.g. you are not allowed to try, you are not allowed to distribute this to anyone unless they agree not to try, and so on). I don’t want to define the technical processes or the restrictions, just the result; the data never gets linked to a specific user, user-agent, or device, ever again.
>
> >
> > What is in scope is tracking data, and DNT should just mean do not collect it (unless you claim a permitted use). If you have collected in error just delete it.
> >
> > Maybe that is all we need to say.
>
> Maybe you need to review where we use the term again.
>
> 1. Third parties can collect data if
>    a) they have an exception
>    b) they have a permitted use
>    c) it’s deidentified
> 2. Unknowing collection. You have to render the data out of scope or delete it, and out of scope means you have permanently deidentified it.
> 3. Discussed but not yet in the spec.: the ‘raw data’ problem (companies cannot process raw logs in real time). Keep the raw data until you can process it, but the raw data has only 3 possible exits (like third party data):
>    a) it’s identifiable, but the user allowed you to collect it
>    b) it’s identifiable, but there is a permitted use you claim and you adhere to the restrictions of that permitted use
>    c) it’s not identifiable, it’s been deidentified
>
> For all these, we need a definition of what data in a deidentified state means. To me, it means it’s got detached from any given user (user-agent, or device) and can and/or will never be reattached.
>
> >
> > Mike
> >
> >
> >> -----Original Message-----
> >> From: Rob van Eijk [mailto:rob@blaeu.com]
> >> Sent: 14 August 2014 19:55
> >> To: David Singer
> >> Cc: Justin Brookman; public-tracking@w3.org; Mike O'Neill
> >> Subject: Re: Deidentification (ISSUE-188)
> >>
> >> The core of my issue, which may be a semantic issue, is that the current text is fixed on the word identification. To me it is not clear enough from the current definition that anything other than the 'one way street' is considered re-identification. The definition must be more specific on this point.
> >>
> >> Does cookie-syncing (which is commonly used in real-time bidding) fall under the meaning of re-identification?
> >>
> >> Rob
> >>
> >> David Singer wrote on 2014-08-14 18:37:
> >>> Rob, I am sorry, I don’t follow you at all.
> >>>
> >>> We say in a number of places that data passes out of our scope, and hence we say nothing at all about it, once it has been deidentified. We need to define what we mean by that, and we need to define that ‘exit’ from our scope.
> >>>
> >>> On Aug 14, 2014, at 2:08 , Rob van Eijk <rob@blaeu.com> wrote:
> >>>
> >>>>
> >>>> The text you propose connects the state of a permanently de-identified dataset to the possibility of identifying a user/user-agent or device. I think limiting the approach to identification is way too limited. What is not covered is for example:
> >>>> - the sharing (for e.g. data enrichment and data correlation).
> >>>
> >>> if it doesn’t identify anyone, and won’t/can’t, we have nothing to say about sharing it
> >>>
> >>>> - the application of de-identified data to the individual user/user agent/device (for e.g. re-targeting).
> >>>
> >>> That’s re-identification, and my text says (a) it ought not be possible and (b) it ought not be permitted
> >>>
> >>>> - the retention of data meaning the duration of time that would be allowed to bring data into a de-identified state.
> >>>
> >>> That’s a separate question: the ‘raw data’ question (and one of the exits for raw data is that the data is deidentified)
> >>>
> >>>> - any (unintended/unforeseen) data uses that may have an impact on (the personal space of) a user/user agent/device. For example re-targeting based on de-identified data, or re-targeting based on correlation with de-identified data.
> >>>
> >>> I don’t understand how one can target anyone if the data is deidentified, and if it’s reidentified, then it wasn’t deidentified to this definition (the definition insists it is a one-way street).
> >>>
> >>>>
> >>>> My proposal is to exclude text for de-identified data in order to aim for a cleaner specification.
> >>>
> >>> Again, I don’t understand. The point of defining it is to say “how to get out of the scope of this spec.”. For example, the raw data clause I proposed says there are only 3 exits:
> >>> * you have permission from the user to retain the data
> >>> * you retain the data under a permitted use, in accordance with the terms of that permitted use
> >>> * you deidentify the data so it passes out of our scope
> >>>
> >>>
> >>>>
> >>>> Rob
> >>>>
> >>>> David Singer wrote on 2014-08-14 01:58:
> >>>>> On Aug 8, 2014, at 6:54 , Mike O'Neill <michael.oneill@baycloud.com> wrote:
> >>>> (...)
> >>>>> Trying another way of phrasing it:
> >>>>> Data is permanently de-identified (and hence out of the scope of this specification) when a sufficient combination of technical measures and restrictions ensures that the data does not, and cannot and will not be used to, identify a particular user, user-agent, or device.
> >>>>> Note: Usage and/or distribution restrictions are strongly recommended for any dataset that has records that relate to a single user or a small number of users; experience has shown that such records can, in fact, sometimes be used to identify the user(s) despite the technical measures that were taken to prevent that happening.
> >>>>> David Singer
> >>>>> Manager, Software Standards, Apple Inc.
> >>>
> >>> David Singer
> >>> Manager, Software Standards, Apple Inc.
> >
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.13 (MingW32)
> > Comment: Using gpg4o v3.3.26.5094 - http://www.gpg4o.com/
> > Charset: utf-8
> >
> > iQEcBAEBAgAGBQJT7QwIAAoJEHMxUy4uXm2J7vkIAOUDdIGXlCpvJw9U/KYAbjCN
> > I/T2dcIsN3Bd095aNyj+eTiC32sQ96Tc5+q//f9zLx+/CERbIy5/lOhfEQpC6z4z
> > gQuJC/Ol691owAGEQFAQEN7sZ4u5nhFFuJzhPnZILBi9tzBj4wLByxskGgf3yMyT
> > rlYi50rZpTghA4QOKvszDxAgP/hyRnk2cjWcCCjaiMWVKQh3j7aKUtit4JgU/JKb
> > ME50WRt43StzEtcaFfsPGHzwVjG/3z5wqEMWSTnwuyq68OfN8U3g0hmaDhJUzwoU
> > P5+tPJOImfOSr0H5eCIXQkKLP6sz8HSrt+HPcNrAO/uKCmIGKlD4AAqSe5Ji0gI=
> > =oQfL
> > -----END PGP SIGNATURE-----
> >
>
> David Singer
> Manager, Software Standards, Apple Inc.
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (MingW32)
Comment: Using gpg4o v3.3.26.5094 - http://www.gpg4o.com/
Charset: utf-8

iQEcBAEBAgAGBQJT7jAqAAoJEHMxUy4uXm2JQ5YIAMdK4v8iwBq0X1eMeWpFumT9
F3Ny8JvTJNke4xGnNGA6nsLjkokbEh+7xJeiqkJtOKwWC1xZcQS1zLw6tb97REx4
L0oUu8hQ5BYH7F3kZVPJ05u7tRhQfhy4k0fe9jG/glK/+ymUb5i1naVkO7NZG30j
PmvO01/u8tQdpUW1q7fkfwLojcka5XdZb/4QFd8Fb5rez22SpqERQiEMv+tw4na/
S2GVdUX9E5ByngiYuMLr7psn2T50FC/QR+KkfKPaKjXz/PQBR+YP3zHk+MhGGi/u
4FViOatRJhXAd5rydtvAbju9+LNad2fJet/iMiV15wLLYvIyyYcIa7rX/XOnUJM=
=vxfq
-----END PGP SIGNATURE-----
Received on Friday, 15 August 2014 16:07:40 UTC