Re: Deidentification (ISSUE-188) from Lee Tien on 2014-08-15 (public-tracking@w3.org from August 2014)

From: Lee Tien <tien@eff.org>
Date: Fri, 15 Aug 2014 09:25:09 -0700
To: Jeffrey Chester <jeff@democraticmedia.org>
Cc: Rob van Eijk <rob@blaeu.com>, David Singer <singer@apple.com>, Mike O'Neill <michael.oneill@baycloud.com>, Justin Brookman <jbrookman@cdt.org>, "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-Id: <3B020081-5C04-4611-BAD0-C8BCCAD58450@eff.org>
EFF agrees: transparency about de-identification methods is very important, and it is far better for users than the old-school "expert certification without showing your work" approach.

Lee

On Aug 15, 2014, at 3:18 AM, Jeffrey Chester <jeff@democraticmedia.org> wrote:

> CDD strongly supports this proposal. This is a key aspect of transparency that is essential for users.
> 
> Jeff
> 
> 
> Jeffrey Chester
> Center for Digital Democracy
> 1621 Connecticut Ave, NW, Suite 550
> Washington, DC 20009
> www.democraticmedia.org
> www.digitalads.org
> 202-986-2220
> 
> On Aug 14, 2014, at 7:04 PM, Rob van Eijk <rob@blaeu.com> wrote:
> 
>> 
>> If the definition gets adopted, wouldn't it be fair to the user to include text with a normative MUST for a party to provide detailed information about the de-identification process(es) it applies? Transparency should do its work to prevent "de-identification by obscurity".
>> 
>> Is the group willing to consider such a normative obligation?
>> 
>> Rob
>> 
>> David Singer wrote on 2014-08-14 22:04:
>>> On Aug 14, 2014, at 12:20 , Mike O'Neill <michael.oneill@baycloud.com> wrote:
>>>> I agree: using a verb assumes that you already have data about people and you apply a de-identifying process to it. It is the process that is hard to define without leaving loopholes.
>>> Precisely.  I am not trying to define a process; I am defining by the
>>> result.  The result is a set of data that will never be linked to any
>>> specific user, user-agent, or device, by a suitable combination of
>>> technical (deidentification) measures, and restrictions (e.g. you are
>>> not allowed to try, you are not allowed to distribute this to anyone
>>> unless they agree not to try, and so on).  I don’t want to define the
>>> technical processes or the restrictions, just the result;  the data
>>> never gets linked to a specific user, user-agent, or device, ever
>>> again.
>>>> What is in scope is tracking data, and DNT should just mean do not collect it (unless you claim a permitted use). If you have collected it in error, just delete it.
>>>> Maybe that is all we need to say.
>>> Maybe you need to review again where we use the term.
>>> 1.  Third parties can collect data if
>>>  a) they have an exception
>>>  b) they have a permitted use
>>>  c) it’s deidentified
>>> 2.  Unknowing collection.  You have to render the data out of scope or
>>> delete it, and out of scope means you have permanently deidentified
>>> it.
>>> 3. Discussed but not yet in the spec: the ‘raw data’ problem
>>> (companies cannot process raw logs in real time). Keep the raw data
>>> until you can process it, but the raw data has only 3 possible exits
>>> (like third party data):
>>>  a) it’s identifiable, but the user allowed you to collect it
>>>  b) it’s identifiable, but there is a permitted use you claim and you
>>> adhere to the restrictions of that permitted use
>>>  c) it’s not identifiable, it’s been deidentified
>>> For all these, we need a definition of what data in a deidentified
>>> state means.  To me, it means it’s got detached from any given user
>>> (user-agent, or device) and can and/or will never be reattached.
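The three exits for raw data listed in the message above amount to a small decision rule. As a rough illustration only, here is a minimal Python sketch of that rule. The `Exit` enum, the `RawDataState` fields, and the "security" permitted-use name are hypothetical and do not come from the spec text. The sketch also assumes, following point 2 above, that raw data with no applicable exit has to be deleted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Exit(Enum):
    """Hypothetical labels for the three exits for raw data."""
    USER_ALLOWED = "a"   # identifiable, but the user allowed collection
    PERMITTED_USE = "b"  # identifiable, kept under a claimed permitted use
    DEIDENTIFIED = "c"   # not identifiable; passes out of scope


@dataclass
class RawDataState:
    # Illustrative fields only; none of these names come from the spec.
    user_allowed: bool
    claimed_permitted_use: Optional[str]  # e.g. "security" (illustrative)
    meets_use_restrictions: bool
    deidentified: bool


def raw_data_exit(state: RawDataState) -> Optional[Exit]:
    """Return the exit the raw data can take, or None if none applies.

    Checks the exits in the order (a), (b), (c) used in the message.
    None means no exit applies, so (by assumption) the data is deleted.
    """
    if state.user_allowed:
        return Exit.USER_ALLOWED
    # Claiming a permitted use only counts if its restrictions are met.
    if state.claimed_permitted_use is not None and state.meets_use_restrictions:
        return Exit.PERMITTED_USE
    if state.deidentified:
        return Exit.DEIDENTIFIED
    return None


pending = RawDataState(user_allowed=False, claimed_permitted_use="security",
                       meets_use_restrictions=False, deidentified=False)
print(raw_data_exit(pending))  # None: no exit applies, so delete
```

The design point the sketch tries to capture is that exit (b) is conditional: naming a permitted use without adhering to its restrictions leaves the data with no way out of scope.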
>>>> Mike
>>>>> -----Original Message-----
>>>>> From: Rob van Eijk [mailto:rob@blaeu.com]
>>>>> Sent: 14 August 2014 19:55
>>>>> To: David Singer
>>>>> Cc: Justin Brookman; public-tracking@w3.org; Mike O'Neill
>>>>> Subject: Re: Deidentification (ISSUE-188)
>>>>> The core of my issue, which may be a semantic issue, is that the current
>>>>> text is fixed on the word identification. To me it is not clear enough
>>>>> from the current definition that anything other than the 'one way street'
>>>>> is considered re-identification. The definition must be more specific on
>>>>> this point.
>>>>> Does cookie-syncing (which is commonly used in real-time bidding) fall
>>>>> under the meaning of re-identification?
>>>>> Rob
>>>>> David Singer wrote on 2014-08-14 18:37:
>>>>>> Rob, I am sorry, I don’t follow you at all.
>>>>>> We say in a number of places that data passes out of our scope, and
>>>>>> hence we say nothing at all about it, once it has been deidentified.
>>>>>> We need to define what we mean by that, and we need to define that
>>>>>> ‘exit’ from our scope.
>>>>>> On Aug 14, 2014, at 2:08 , Rob van Eijk <rob@blaeu.com> wrote:
>>>>>>> The text you propose connects the state of a permanently de-identified
>>>>>>> dataset to the possibility of identifying a user/user-agent or device.
>>>>>>> I think limiting the approach to identification is far too narrow.
>>>>>>> What is not covered is, for example:
>>>>>>> - sharing (e.g. for data enrichment and data correlation).
>>>>>> if it doesn’t identify anyone, and won’t/can’t, we have nothing to say
>>>>>> about sharing it
>>>>>>> - the application of de-identified data to the individual user/user
>>>>>>> agent/device (e.g. for re-targeting).
>>>>>> That’s re-identification, and my text says (a) it ought not be
>>>>>> possible and (b) it ought not be permitted
>>>>>>> - data retention, i.e. how much time would be allowed to bring data
>>>>>>> into a de-identified state.
>>>>>> That’s a separate question: the ‘raw data’ question (and one of the
>>>>>> exits for raw data is that the data is deidentified)
>>>>>>> - any (unintended/unforeseen) data uses that may have an impact on
>>>>>>> the personal space of a user/user agent/device. For example
>>>>>>> re-targeting based on de-identified data, or re-targeting based on
>>>>>>> correlation with de-identified data.
>>>>>> I don’t understand how one can target anyone if the data is
>>>>>> deidentified, and if it’s reidentified, then it wasn’t deidentified to
>>>>>> this definition (the definition insists it is a one-way street).
>>>>>>> My proposal is to leave out the text on de-identified data in order
>>>>>>> to aim for a cleaner specification.
>>>>>> Again, I don’t understand.  The point of defining it is to say “how to
>>>>>> get out of the scope of this spec.”.  For example, the raw data clause
>>>>>> I proposed says there are only 3 exits:
>>>>>> * you have permission from the user to retain the data
>>>>>> * you retain the data under a permitted use, in accordance with the
>>>>>> terms of that permitted use
>>>>>> * you deidentify the data so it passes out of our scope
>>>>>>> Rob
>>>>>>> David Singer wrote on 2014-08-14 01:58:
>>>>>>>> On Aug 8, 2014, at 6:54 , Mike O'Neill <michael.oneill@baycloud.com>
>>>>>>>> wrote:
>>>>>>> (...)
>>>>>>>> Trying another way of phrasing it:
>>>>>>>> Data is permanently de-identified (and hence out of the scope of this
>>>>>>>> specification) when a sufficient combination of technical measures and
>>>>>>>> restrictions ensures that the data does not, and cannot and will not
>>>>>>>> be used to, identify a particular user, user-agent, or device.
>>>>>>>> Note: Usage and/or distribution restrictions are strongly recommended
>>>>>>>> for any dataset that has records that relate to a single user or a
>>>>>>>> small number of users; experience has shown that such records can, in
>>>>>>>> fact, sometimes be used to identify the user(s) despite the technical
>>>>>>>> measures that were taken to prevent that happening.
>>>>>>>> David Singer
>>>>>>>> Manager, Software Standards, Apple Inc.
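The Note's advice about records that relate to one or a few users can be illustrated with a simple check. The sketch below is an illustration under stated assumptions, not anything proposed in the thread: it models a dataset as hypothetical `AggregateRecord` rows, each carrying a count of distinct contributing users, and flags the dataset when any row relates to fewer users than an illustrative `min_users` threshold. The thread gives no number for "a small number of users"; the default of 10 is arbitrary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AggregateRecord:
    # Hypothetical row type: a key plus the number of distinct users
    # whose data contributed to this row.
    key: str
    contributing_users: int


def small_group_keys(records: List[AggregateRecord], min_users: int = 10) -> List[str]:
    """Keys of rows that relate to fewer than `min_users` distinct users."""
    return [r.key for r in records if r.contributing_users < min_users]


def restrictions_recommended(records: List[AggregateRecord], min_users: int = 10) -> bool:
    """True if any row relates to a single user or a small number of users."""
    return len(small_group_keys(records, min_users)) > 0


rows = [AggregateRecord("page-a", 5400), AggregateRecord("page-b", 2)]
print(small_group_keys(rows))          # ['page-b']
print(restrictions_recommended(rows))  # True
```

Such a check only signals that usage or distribution restrictions are advisable. As the Note says, technical measures alone can fail, so a dataset that passes this check is not thereby shown to be de-identified.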
>> 
>> 
>
Received on Friday, 15 August 2014 16:25:54 UTC