Re: Deidentification (ISSUE-188)

On Aug 14, 2014, at 12:20 , Mike O'Neill <michael.oneill@baycloud.com> wrote:

> 
> I agree, using a verb assumes that you already have data about people and you apply a de-identifying process to it. It is the process that is hard to define, without leaving loopholes. 

Precisely.  I am not trying to define a process; I am defining by the result.  The result is a set of data that will never be linked to any specific user, user-agent, or device, thanks to a suitable combination of technical (deidentification) measures and restrictions (e.g. you are not allowed to try, you are not allowed to distribute this to anyone unless they agree not to try, and so on).  I don’t want to define the technical processes or the restrictions, just the result: the data never gets linked to a specific user, user-agent, or device, ever again.

> 
> What is in scope is tracking data, and DNT should just mean do not collect it (unless you claim a permitted use). If you have collected in error just delete it. 
> 
> Maybe that is all we need to say.

Maybe you need to review where we use the term again.

1.  Third parties can collect data if
  a) they have an exception
  b) they have a permitted use
  c) it’s deidentified
2.  Unknowing collection.  You have to render the data out of scope or delete it, and out of scope means you have permanently deidentified it.
3.  Discussed but not yet in the spec: the ‘raw data’ problem (companies cannot process raw logs in real time). Keep the raw data until you can process it, but the raw data has only 3 possible exits (like third party data):
  a) it’s identifiable, but the user allowed you to collect it
  b) it’s identifiable, but there is a permitted use you claim and you adhere to the restrictions of that permitted use
  c) it’s not identifiable, it’s been deidentified
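
As a rough illustration only (not proposed spec text), the three exits for raw data in item 3 could be sketched as below. The names `RawRecord`, `Exit`, and `allowed_exit`, and all of the boolean fields, are hypothetical and introduced purely for this sketch; the order of the checks is arbitrary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Exit(Enum):
    USER_PERMISSION = "identifiable, but the user allowed collection"
    PERMITTED_USE = "identifiable, but held under a permitted use"
    DEIDENTIFIED = "not identifiable; out of scope"


@dataclass
class RawRecord:
    # Hypothetical flags; a real system would derive these from its own state.
    user_granted_exception: bool = False
    permitted_use_claimed: bool = False
    permitted_use_restrictions_met: bool = False
    permanently_deidentified: bool = False


def allowed_exit(record: RawRecord) -> Optional[Exit]:
    """Return the exit a processed record qualifies for, or None if it
    fits none of the three exits."""
    if record.permanently_deidentified:
        return Exit.DEIDENTIFIED
    if record.user_granted_exception:
        return Exit.USER_PERMISSION
    # Claiming a permitted use is not enough; its restrictions must be met too.
    if record.permitted_use_claimed and record.permitted_use_restrictions_met:
        return Exit.PERMITTED_USE
    return None
```

Note that a record which merely claims a permitted use, without adhering to its restrictions, gets no exit in this sketch.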

For all these, we need a definition of what data in a deidentified state means.  To me, it means it has been detached from any given user, user-agent, or device, and can never, and/or will never, be reattached.

> 
> Mike
> 
> 
>> -----Original Message-----
>> From: Rob van Eijk [mailto:rob@blaeu.com]
>> Sent: 14 August 2014 19:55
>> To: David Singer
>> Cc: Justin Brookman; public-tracking@w3.org; Mike O'Neill
>> Subject: Re: Deidentification (ISSUE-188)
>> 
>> The core of my issue, which may be a semantic issue, is that the current
>> text is fixed on the word identification. To me it is not clear enough
>> from the current definition that anything other than the 'one way street'
>> is considered re-identification. The definition must be more specific on
>> this point.
>> 
>> Does cookie-syncing (which is commonly used in real-time bidding) fall
>> under the meaning of re-identification?
>> 
>> Rob
>> 
>> David Singer schreef op 2014-08-14 18:37:
>>> Rob, I am sorry, I don’t follow you at all.
>>> 
>>> We say in a number of places that data passes out of our scope, and
>>> hence we say nothing at all about it, once it has been deidentified.
>>> We need to define what we mean by that, and we need to define that
>>> ‘exit’ from our scope.
>>> 
>>> On Aug 14, 2014, at 2:08 , Rob van Eijk <rob@blaeu.com> wrote:
>>> 
>>>> 
>>>> The text you propose connects the state of a permanently de-identified
>>>> dataset to the possibility of identifying a user/user-agent or device.
>>>> I think limiting the approach to identification is way too limited.
>>>> What is not covered is for example:
>>>> - the sharing (e.g. for data enrichment and data correlation).
>>> 
>>> if it doesn’t identify anyone, and won’t/can’t, we have nothing to say
>>> about sharing it
>>> 
>>>> - the application of de-identified data to the individual user/user
>>>> agent/device (e.g. for re-targeting).
>>> 
>>> That’s re-identification, and my text says (a) it ought not be
>>> possible and (b) it ought not be permitted
>>> 
>>>> - the retention of data, meaning the duration of time that would be
>>>> allowed to bring data into a de-identified state.
>>> 
>>> That’s a separate question: the ‘raw data’ question (and one of the
>>> exits for raw data is that the data is deidentified)
>>> 
>>>> - any (unintended/unforeseen) data uses that may have an impact on
>>>> (the personal space of) a user/user agent/device. For example
>>>> re-targeting based on de-identified data, or re-targeting based on
>>>> correlation with de-identified data.
>>> 
>>> I don’t understand how one can target anyone if the data is
>>> deidentified, and if it’s reidentified, then it wasn’t deidentified to
>>> this definition (the definition insists it is a one-way street).
>>> 
>>>> 
>>>> My proposal is to exclude text for de-identified data in order to aim
>>>> for a cleaner specification.
>>> 
>>> Again, I don’t understand.  The point of defining it is to say “how to
>>> get out of the scope of this spec”.  For example, the raw data clause
>>> I proposed says there are only 3 exits:
>>> * you have permission from the user to retain the data
>>> * you retain the data under a permitted use, in accordance with the
>>> terms of that permitted use
>>> * you deidentify the data so it passes out of our scope
>>> 
>>> 
>>>> 
>>>> Rob
>>>> 
>>>> David Singer schreef op 2014-08-14 01:58:
>>>>> On Aug 8, 2014, at 6:54 , Mike O'Neill <michael.oneill@baycloud.com>
>>>>> wrote:
>>>> (...)
>>>>> Trying another way of phrasing it:
>>>>> Data is permanently de-identified (and hence out of the scope of this
>>>>> specification) when a sufficient combination of technical measures and
>>>>> restrictions ensures that the data does not, and cannot and will not
>>>>> be used to, identify a particular user, user-agent, or device.
>>>>> Note: Usage and/or distribution restrictions are strongly recommended
>>>>> for any dataset that has records that relate to a single user or a
>>>>> small number of users; experience has shown that such records can, in
>>>>> fact, sometimes be used to identify the user(s) despite the technical
>>>>> measures that were taken to prevent that happening.
>>>>> David Singer
>>>>> Manager, Software Standards, Apple Inc.
>>> 
>>> David Singer
>>> Manager, Software Standards, Apple Inc.
> 
> 

David Singer
Manager, Software Standards, Apple Inc.

Received on Thursday, 14 August 2014 20:05:24 UTC