Re: Deidentification (ISSUE-188) from Jeffrey Chester on 2014-08-15 (public-tracking@w3.org from August 2014)

From: Jeffrey Chester <jeff@democraticmedia.org>
Date: Fri, 15 Aug 2014 06:18:04 -0400
To: Rob van Eijk <rob@blaeu.com>
Cc: David Singer <singer@apple.com>, Mike O'Neill <michael.oneill@baycloud.com>, Justin Brookman <jbrookman@cdt.org>, "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-Id: <547C74B4-8A67-4A99-907E-8E92E2D9190C@democraticmedia.org>
CDD strongly supports this proposal.  This is a key aspect of transparency essential for users.

Jeff


Jeffrey Chester
Center for Digital Democracy
1621 Connecticut Ave, NW, Suite 550
Washington, DC 20009
www.democraticmedia.org
www.digitalads.org
202-986-2220

On Aug 14, 2014, at 7:04 PM, Rob van Eijk <rob@blaeu.com> wrote:

> 
> If the definition is adopted, wouldn't it be fair to the user to include text with a normative MUST requiring a party to provide detailed information about the de-identification process(es) it applies? Transparency should do its work to prevent "de-identification by obscurity".
> 
> Is the group willing to consider such a normative obligation?
> 
> Rob
> 
> David Singer wrote on 2014-08-14 22:04:
>> On Aug 14, 2014, at 12:20 , Mike O'Neill <michael.oneill@baycloud.com> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>> I agree: using a verb assumes that you already have data about people and apply a de-identifying process to it. It is the process that is hard to define without leaving loopholes.
>> Precisely.  I am not trying to define a process; I am defining it by
>> the result.  The result is a set of data that will never be linked to
>> any specific user, user-agent, or device, through a suitable
>> combination of technical (deidentification) measures and restrictions
>> (e.g. you are not allowed to try, you are not allowed to distribute
>> this to anyone unless they agree not to try, and so on).  I don’t want
>> to define the technical processes or the restrictions, just the
>> result: the data never gets linked to a specific user, user-agent, or
>> device, ever again.
>>> What is in scope is tracking data, and DNT should just mean do not collect it (unless you claim a permitted use). If you have collected it in error, just delete it.
>>> Maybe that is all we need to say.
>> Maybe you need to review where we use the term again.
>> 1.  Third parties can collect data if
>>  a) they have an exception
>>  b) they have a permitted use
>>  c) it’s deidentified
>> 2.  Unknowing collection.  You have to render the data out of scope or
>> delete it, and out of scope means you have permanently deidentified
>> it.
>> 3. Discussed but not yet in the spec: the ‘raw data’ problem
>> (companies cannot process raw logs in real time). Keep the raw data
>> until you can process it, but the raw data has only 3 possible exits
>> (like third party data):
>>  a) it’s identifiable, but the user allowed you to collect it
>>  b) it’s identifiable, but there is a permitted use you claim and you
>> adhere to the restrictions of that permitted use
>>  c) it’s not identifiable, it’s been deidentified
>> For all these, we need a definition of what data in a deidentified
>> state means.  To me, it means it has been detached from any given user
>> (user-agent, or device) and cannot and/or will not ever be reattached.
>>> Mike
>>>> -----Original Message-----
>>>> From: Rob van Eijk [mailto:rob@blaeu.com]
>>>> Sent: 14 August 2014 19:55
>>>> To: David Singer
>>>> Cc: Justin Brookman; public-tracking@w3.org; Mike O'Neill
>>>> Subject: Re: Deidentification (ISSUE-188)
>>>> The core of my issue, which may be a semantic one, is that the current
>>>> text is focused on the word identification. To me it is not clear enough
>>>> from the current definition that anything other than the 'one-way street'
>>>> is considered re-identification. The definition must be more specific on
>>>> this point.
>>>> Does cookie-syncing (which is commonly used in real-time bidding) fall
>>>> under the meaning of re-identification?
>>>> Rob
>>>> David Singer wrote on 2014-08-14 18:37:
>>>>> Rob, I am sorry, I don’t follow you at all.
>>>>> We say in a number of places that data passes out of our scope, and
>>>>> hence we say nothing at all about it, once it has been deidentified.
>>>>> We need to define what we mean by that, and we need to define that
>>>>> ‘exit’ from our scope.
>>>>> On Aug 14, 2014, at 2:08 , Rob van Eijk <rob@blaeu.com> wrote:
>>>>>> The text you propose connects the state of a permanently de-identified
>>>>>> dataset to the possibility of identifying a user/user-agent or device.
>>>>>> I think limiting the approach to identification is far too narrow.
>>>>>> What is not covered includes, for example:
>>>>>> - sharing (e.g. for data enrichment and data correlation).
>>>>> if it doesn’t identify anyone, and won’t/can’t, we have nothing to say
>>>>> about sharing it
>>>>>> - the application of de-identified data to the individual user/user
>>>>>> agent/device (e.g. for re-targeting).
>>>>> That’s re-identification, and my text says (a) it ought not be
>>>>> possible and (b) it ought not be permitted
>>>>>> - data retention, meaning the length of time that would be allowed
>>>>>> before data must be brought into a de-identified state.
>>>>> That’s a separate question: the ‘raw data’ question (and one of the
>>>>> exits for raw data is that the data is deidentified)
>>>>>> - any (unintended/unforeseen) data uses that may have an impact on
>>>>>> (the personal space of) a user/user agent/device. For example,
>>>>>> re-targeting based on de-identified data, or re-targeting based on
>>>>>> correlation with de-identified data.
>>>>> I don’t understand how one can target anyone if the data is
>>>>> deidentified, and if it’s reidentified, then it wasn’t deidentified to
>>>>> this definition (the definition insists it is a one-way street).
>>>>>> My proposal is to leave out the text on de-identified data in order
>>>>>> to aim for a cleaner specification.
>>>>> Again, I don’t understand.  The point of defining it is to say “how to
>>>>> get out of the scope of this spec”.  For example, the raw data clause
>>>>> I proposed says there are only 3 exits:
>>>>> * you have permission from the user to retain the data
>>>>> * you retain the data under a permitted use, in accordance with the
>>>>> terms of that permitted use
>>>>> * you deidentify the data so it passes out of our scope
>>>>>> Rob
>>>>>> David Singer wrote on 2014-08-14 01:58:
>>>>>>> On Aug 8, 2014, at 6:54 , Mike O'Neill <michael.oneill@baycloud.com>
>>>>>>> wrote:
>>>>>> (...)
>>>>>>> Trying another way of phrasing it:
>>>>>>> Data is permanently de-identified (and hence out of the scope of this
>>>>>>> specification) when a sufficient combination of technical measures
>>>>>>> and restrictions ensures that the data does not, and cannot and will
>>>>>>> not be used to, identify a particular user, user-agent, or device.
>>>>>>> Note: Usage and/or distribution restrictions are strongly recommended
>>>>>>> for any dataset that has records that relate to a single user or a
>>>>>>> small number of users; experience has shown that such records can, in
>>>>>>> fact, sometimes be used to identify the user(s) despite the technical
>>>>>>> measures that were taken to prevent that happening.
>>>>>>> David Singer
>>>>>>> Manager, Software Standards, Apple Inc.
>>>>> David Singer
>>>>> Manager, Software Standards, Apple Inc.
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.13 (MingW32)
>>> Comment: Using gpg4o v3.3.26.5094 - http://www.gpg4o.com/
>>> Charset: utf-8
>>> iQEcBAEBAgAGBQJT7QwIAAoJEHMxUy4uXm2J7vkIAOUDdIGXlCpvJw9U/KYAbjCN
>>> I/T2dcIsN3Bd095aNyj+eTiC32sQ96Tc5+q//f9zLx+/CERbIy5/lOhfEQpC6z4z
>>> gQuJC/Ol691owAGEQFAQEN7sZ4u5nhFFuJzhPnZILBi9tzBj4wLByxskGgf3yMyT
>>> rlYi50rZpTghA4QOKvszDxAgP/hyRnk2cjWcCCjaiMWVKQh3j7aKUtit4JgU/JKb
>>> ME50WRt43StzEtcaFfsPGHzwVjG/3z5wqEMWSTnwuyq68OfN8U3g0hmaDhJUzwoU
>>> P5+tPJOImfOSr0H5eCIXQkKLP6sz8HSrt+HPcNrAO/uKCmIGKlD4AAqSe5Ji0gI=
>>> =oQfL
>>> -----END PGP SIGNATURE-----
>> David Singer
>> Manager, Software Standards, Apple Inc.
> 
>
Received on Friday, 15 August 2014 10:18:29 UTC