Re: Deidentification (ISSUE-188)

CDD strongly supports this proposal.  This is a key aspect of transparency essential for users.


Jeffrey Chester
Center for Digital Democracy
1621 Connecticut Ave, NW, Suite 550
Washington, DC 20009

On Aug 14, 2014, at 7:04 PM, Rob van Eijk wrote:

> If the definition gets adopted, wouldn't it be fair to the user to include text with a normative MUST for a party to provide detailed information about the de-identification process(es) it applies? Transparency should do its work to prevent "de-identification by obscurity".
> Is the group willing to consider such a normative obligation?
> Rob
> David Singer schreef op 2014-08-14 22:04:
>> On Aug 14, 2014, at 12:20, Mike O'Neill wrote:
>>> I agree, using a verb assumes that you already have data about people and you apply a de-identifying process to it. It is the process that is hard to define, without leaving loopholes.
>> Precisely.  I am not trying to define a process; I am defining by the
>> result.  The result is a set of data that will never be linked to any
>> specific user, user-agent, or device, by a suitable combination of
>> technical (deidentification) measures, and restrictions (e.g. you are
>> not allowed to try, you are not allowed to distribute this to anyone
>> unless they agree not to try, and so on).  I don't want to define the
>> technical processes or the restrictions, just the result;  the data
>> never gets linked to a specific user, user-agent, or device, ever
>> again.
>>> What is in scope is tracking data, and DNT should just mean do not collect it (unless you claim a permitted use). If you have collected in error just delete it.
>>> Maybe that is all we need to say.
>> Maybe you need to review where we use the term again.
>> 1.  Third parties can collect data if
>>  a) they have an exception
>>  b) they have a permitted use
>>  c) it's deidentified
>> 2.  Unknowing collection.  You have to render the data out of scope or
>> delete it, and out of scope means you have permanently deidentified
>> it.
>> 3. Discussed but not yet in the spec: the 'raw data' problem
>> (companies cannot process raw logs in real time). Keep the raw data
>> until you can process it, but the raw data has only 3 possible exits
>> (like third party data):
>>  a) it's identifiable, but the user allowed you to collect it
>>  b) it's identifiable, but there is a permitted use you claim and you
>> adhere to the restrictions of that permitted use
>>  c) it's not identifiable, it's been deidentified
>> For all these, we need a definition of what data in a deidentified
>> state means.  To me, it means it's got detached from any given user
>> (user-agent, or device) and can and/or will never be reattached.
>>> Mike
>>>> -----Original Message-----
>>>> From: Rob van Eijk
>>>> Sent: 14 August 2014 19:55
>>>> To: David Singer
>>>> Cc: Justin Brookman;; Mike O'Neill
>>>> Subject: Re: Deidentification (ISSUE-188)
>>>> The core of my issue, which may be a semantic issue, is that the current
>>>> text is fixed on the word identification. To me it is not clear enough
>>>> from the current definition that anything else than the 'one way street'
>>>> is considered re-identification. The definition must be more specific on
>>>> this point.
>>>> Does cookie-syncing (which is commonly used in real-time bidding) fall
>>>> under the meaning of re-identification?
>>>> Rob
>>>> David Singer schreef op 2014-08-14 18:37:
>>>>> Rob, I am sorry, I don't follow you at all.
>>>>> We say in a number of places that data passes out of our scope, and
>>>>> hence we say nothing at all about it, once it has been deidentified.
>>>>> We need to define what we mean by that, and we need to define that
>>>>> 'exit' from our scope.
>>>>> On Aug 14, 2014, at 2:08, Rob van Eijk wrote:
>>>>>> The text you propose connects the state of a permanently de-identified
>>>>>> dataset to the possibility of identifying a user/user-agent or device.
>>>>>> I think limiting the approach to identification is way too limited.
>>>>>> What is not covered is for example:
>>>>>> - the sharing (e.g. for data enrichment and data correlation).
>>>>> if it doesn't identify anyone, and won't/can't, we have nothing to say
>>>>> about sharing it
>>>>>> - the application of de-identified data to the individual user/user
>>>>>> agent/device (e.g. for re-targeting).
>>>>> That's re-identification, and my text says (a) it ought not be
>>>>> possible and (b) it ought not be permitted
>>>>>> - the retention of data meaning the duration of time that would be
>>>>>> allowed to bring data in de-identified state.
>>>>> That's a separate question: the 'raw data' question (and one of the
>>>>> exits for raw data is that the data is deidentified)
>>>>>> - any (unintended/unforeseen) data uses that may have an impact on
>>>>>> (the personal space of) a user/user agent/device. For example
>>>>>> re-targeting based on de-identified data, or re-targeting based on
>>>>>> correlation with de-identified data.
>>>>> I don't understand how one can target anyone if the data is
>>>>> deidentified, and if it's reidentified, then it wasn't deidentified to
>>>>> this definition (the definition insists it is a one-way street).
>>>>>> My proposal is to exclude text for de-identified data in order to aim
>>>>>> for a cleaner specification.
>>>>> Again, I don't understand.  The point of defining it is to say "how to
>>>>> get out of the scope of this spec".  For example, the raw data clause
>>>>> I proposed says there are only 3 exits:
>>>>> * you have permission from the user to retain the data
>>>>> * you retain the data under a permitted use, in accordance with the
>>>>> terms of that permitted use
>>>>> * you deidentify the data so it passes out of our scope
>>>>>> Rob
>>>>>> David Singer schreef op 2014-08-14 01:58:
>>>>>>> On Aug 8, 2014, at 6:54, Mike O'Neill wrote:
>>>>>> (...)
>>>>>>> Trying another way of phrasing it:
>>>>>>> Data is permanently de-identified (and hence out of the scope of this
>>>>>>> specification) when a sufficient combination of technical measures and
>>>>>>> restrictions ensures that the data does not, and cannot and will not
>>>>>>> be used to, identify a particular user, user-agent, or device.
>>>>>>> Note: Usage and/or distribution restrictions are strongly recommended
>>>>>>> for any dataset that has records that relate to a single user or a
>>>>>>> small number of users; experience has shown that such records can, in
>>>>>>> fact, sometimes be used to identify the user(s) despite the technical
>>>>>>> measures that were taken to prevent that happening.
>>>>>>> David Singer
>>>>>>> Manager, Software Standards, Apple Inc.

Received on Friday, 15 August 2014 10:18:29 UTC