- From: Rob van Eijk <rob@blaeu.com>
- Date: Fri, 15 Aug 2014 01:04:51 +0200
- To: David Singer <singer@apple.com>
- Cc: "Mike O'Neill" <michael.oneill@baycloud.com>, Justin Brookman <jbrookman@cdt.org>, public-tracking@w3.org
If the definition gets adopted, wouldn't it be fair to the user to include text with a normative MUST for a party to provide detailed information about the de-identification process(es) it applies? Transparency should do its work to prevent "de-identification by obscurity". Is the group willing to consider such a normative obligation?

Rob

David Singer wrote on 2014-08-14 22:04:
> On Aug 14, 2014, at 12:20 , Mike O'Neill <michael.oneill@baycloud.com>
> wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> I agree, using a verb assumes that you already have data about people
>> and you apply a de-identifying process to it. It is the process that
>> is hard to define, without leaving loopholes.
>
> Precisely. I am not trying to define a process; I am defining by the
> result. The result is a set of data that will never be linked to any
> specific user, user-agent, or device, by a suitable combination of
> technical (deidentification) measures, and restrictions (e.g. you are
> not allowed to try, you are not allowed to distribute this to anyone
> unless they agree not to try, and so on). I don’t want to define the
> technical processes or the restrictions, just the result; the data
> never gets linked to a specific user, user-agent, or device, ever
> again.
>
>>
>> What is in scope is tracking data, and DNT should just mean do not
>> collect it (unless you claim a permitted use). If you have collected
>> in error just delete it.
>>
>> Maybe that is all we need to say.
>
> Maybe you need to review where we use the term again.
>
> 1. Third parties can collect data if
>    a) they have an exception
>    b) they have a permitted use
>    c) it’s deidentified
> 2. Unknowing collection. You have to render the data out of scope or
>    delete it, and out of scope means you have permanently deidentified
>    it.
> 3. Discussed but not yet in the spec.: the ‘raw data’ problem
>    (companies cannot process raw logs in real time).
>    Keep the raw data until you can process it, but the raw data has
>    only 3 possible exits (like third party data):
>    a) it’s identifiable, but the user allowed you to collect it
>    b) it’s identifiable, but there is a permitted use you claim and you
>       adhere to the restrictions of that permitted use
>    c) it’s not identifiable, it’s been deidentified
>
> For all these, we need a definition of what data in a deidentified
> state means. To me, it means it’s got detached from any given user
> (user-agent, or device) and can and/or will never be reattached.
>
>>
>> Mike
>>
>>
>>> -----Original Message-----
>>> From: Rob van Eijk [mailto:rob@blaeu.com]
>>> Sent: 14 August 2014 19:55
>>> To: David Singer
>>> Cc: Justin Brookman; public-tracking@w3.org; Mike O'Neill
>>> Subject: Re: Deidentification (ISSUE-188)
>>>
>>> The core of my issue, which may be a semantic issue, is that the
>>> current text is fixed on the word identification. To me it is not
>>> clear enough from the current definition that anything else than
>>> the 'one way street' is considered re-identification. The
>>> definition must be more specific on this point.
>>>
>>> Does cookie-syncing (which is commonly used in real-time bidding)
>>> fall under the meaning of re-identification?
>>>
>>> Rob
>>>
>>> David Singer wrote on 2014-08-14 18:37:
>>>> Rob, I am sorry, I don’t follow you at all.
>>>>
>>>> We say in a number of places that data passes out of our scope, and
>>>> hence we say nothing at all about it, once it has been deidentified.
>>>> We need to define what we mean by that, and we need to define that
>>>> ‘exit’ from our scope.
>>>>
>>>> On Aug 14, 2014, at 2:08 , Rob van Eijk <rob@blaeu.com> wrote:
>>>>
>>>>>
>>>>> The text you propose connects the state of a permanently
>>>>> de-identified dataset to the possibility of identifying a
>>>>> user/user-agent or device.
>>>>> I think limiting the approach to identification is way too limited.
>>>>> What is not covered is for example:
>>>>> - the sharing (e.g. for data enrichment and data correlation).
>>>>
>>>> if it doesn’t identify anyone, and won’t/can’t, we have nothing to
>>>> say about sharing it
>>>>
>>>>> - the application of de-identified data to the individual user/user
>>>>> agent/device (e.g. for re-targeting).
>>>>
>>>> That’s re-identification, and my text says (a) it ought not be
>>>> possible and (b) it ought not be permitted
>>>>
>>>>> - the retention of data, meaning the duration of time that would be
>>>>> allowed to bring data into a de-identified state.
>>>>
>>>> That’s a separate question: the ‘raw data’ question (and one of the
>>>> exits for raw data is that the data is deidentified)
>>>>
>>>>> - any (unintended/unforeseen) data uses that may have an impact on
>>>>> (the personal space of) a user/user agent/device. For example
>>>>> re-targeting based on de-identified data, or re-targeting based on
>>>>> correlation with de-identified data.
>>>>
>>>> I don’t understand how one can target anyone if the data is
>>>> deidentified, and if it’s reidentified, then it wasn’t deidentified
>>>> to this definition (the definition insists it is a one-way street).
>>>>
>>>>>
>>>>> My proposal is to exclude text for de-identified data in order to
>>>>> aim for a cleaner specification.
>>>>
>>>> Again, I don’t understand. The point of defining it is to say “how
>>>> to get out of the scope of this spec.”. For example, the raw data
>>>> clause I proposed says there are only 3 exits:
>>>> * you have permission from the user to retain the data
>>>> * you retain the data under a permitted use, in accordance with the
>>>> terms of that permitted use
>>>> * you deidentify the data so it passes out of our scope
>>>>
>>>>
>>>>>
>>>>> Rob
>>>>>
>>>>> David Singer wrote on 2014-08-14 01:58:
>>>>>> On Aug 8, 2014, at 6:54 , Mike O'Neill
>>>>>> <michael.oneill@baycloud.com>
>>>>>> wrote:
>>>>> (...)
>>>>>> Trying another way of phrasing it:
>>>>>> Data is permanently de-identified (and hence out of the scope of
>>>>>> this specification) when a sufficient combination of technical
>>>>>> measures and restrictions ensures that the data does not, and
>>>>>> cannot and will not be used to, identify a particular user,
>>>>>> user-agent, or device.
>>>>>> Note: Usage and/or distribution restrictions are strongly
>>>>>> recommended for any dataset that has records that relate to a
>>>>>> single user or a small number of users; experience has shown that
>>>>>> such records can, in fact, sometimes be used to identify the
>>>>>> user(s) despite the technical measures that were taken to prevent
>>>>>> that happening.
>>>>>> David Singer
>>>>>> Manager, Software Standards, Apple Inc.
>>>>
>>>> David Singer
>>>> Manager, Software Standards, Apple Inc.
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.13 (MingW32)
>> Comment: Using gpg4o v3.3.26.5094 - http://www.gpg4o.com/
>> Charset: utf-8
>>
>> iQEcBAEBAgAGBQJT7QwIAAoJEHMxUy4uXm2J7vkIAOUDdIGXlCpvJw9U/KYAbjCN
>> I/T2dcIsN3Bd095aNyj+eTiC32sQ96Tc5+q//f9zLx+/CERbIy5/lOhfEQpC6z4z
>> gQuJC/Ol691owAGEQFAQEN7sZ4u5nhFFuJzhPnZILBi9tzBj4wLByxskGgf3yMyT
>> rlYi50rZpTghA4QOKvszDxAgP/hyRnk2cjWcCCjaiMWVKQh3j7aKUtit4JgU/JKb
>> ME50WRt43StzEtcaFfsPGHzwVjG/3z5wqEMWSTnwuyq68OfN8U3g0hmaDhJUzwoU
>> P5+tPJOImfOSr0H5eCIXQkKLP6sz8HSrt+HPcNrAO/uKCmIGKlD4AAqSe5Ji0gI=
>> =oQfL
>> -----END PGP SIGNATURE-----
>>
>
> David Singer
> Manager, Software Standards, Apple Inc.
Received on Thursday, 14 August 2014 23:05:45 UTC