Re: Deidentification (ISSUE-188)

From: David Singer <singer@apple.com>
Date: Thu, 14 Aug 2014 12:57:03 -0700
Cc: Justin Brookman <jbrookman@cdt.org>, public-tracking@w3.org, Mike O'Neill <michael.oneill@baycloud.com>
Message-id: <D927E326-34BF-4F59-9646-838D3356E0B2@apple.com>
To: rob@blaeu.com

On Aug 14, 2014, at 11:54 , Rob van Eijk <rob@blaeu.com> wrote:

> The core of my issue, which may be a semantic issue, is that the current text is fixated on the word identification. To me it is not clear enough from the current definition that anything other than the 'one way street' is considered re-identification. The definition must be more specific on this point.

The data has to remain forever in a state in which it does not link to a specific user, user-agent, or device.  How much more specific can we be?

> Does cookie-syncing (which is commonly used in real-time bidding) fall under the meaning of re-identification?

If you sync two deidentified data sets, they remain deidentified.  But you had better be damn sure no-one can identify the user given the user-unique cookie, which seems…unlikely.

> Rob
> David Singer wrote on 2014-08-14 18:37:
>> Rob, I am sorry, I don’t follow you at all.
>> We say in a number of places that data passes out of our scope, and
>> hence we say nothing at all about it, once it has been deidentified.
>> We need to define what we mean by that, and we need to define that
>> ‘exit’ from our scope.
>> On Aug 14, 2014, at 2:08 , Rob van Eijk <rob@blaeu.com> wrote:
>>> The text you propose connects the state of a permanently de-identified dataset to the possibility of identifying a user/user-agent or device. I think limiting the approach to identification is far too narrow.
>>> What is not covered is, for example:
>>> - the sharing (e.g. for data enrichment and data correlation).
>> if it doesn’t identify anyone, and won’t/can’t, we have nothing to say
>> about sharing it
>>> - the application of de-identified data to the individual user/user agent/device (e.g. for re-targeting).
>> That’s re-identification, and my text says (a) it ought not be
>> possible and (b) it ought not be permitted
>>> - the retention of data, meaning the length of time that would be allowed for bringing data into a de-identified state.
>> That’s a separate question: the ‘raw data’ question (and one of the
>> exits for raw data is that the data is deidentified)
>>> - any (unintended/unforeseen) data uses that may have an impact on (the personal space of) a user/user agent/device. For example, re-targeting based on de-identified data, or re-targeting based on correlation with de-identified data.
>> I don’t understand how one can target anyone if the data is
>> deidentified, and if it’s reidentified, then it wasn’t deidentified to
>> this definition (the definition insists it is a one-way street).
>>> My proposal is to exclude text for de-identified data in order to aim for a cleaner specification.
>> Again, I don’t understand.  The point of defining it is to say “how to
>> get out of the scope of this spec”.  For example, the raw data clause
>> I proposed says there are only 3 exits:
>> * you have permission from the user to retain the data
>> * you retain the data under a permitted use, in accordance with the
>> terms of that permitted use
>> * you deidentify the data so it passes out of our scope
>>> Rob
>>> David Singer wrote on 2014-08-14 01:58:
>>>> On Aug 8, 2014, at 6:54 , Mike O'Neill <michael.oneill@baycloud.com> wrote:
>>> (...)
>>>> Trying another way of phrasing it:
>>>> Data is permanently de-identified (and hence out of the scope of this
>>>> specification) when a sufficient combination of technical measures and
>>>> restrictions ensures that the data does not, and cannot and will not
>>>> be used to, identify a particular user, user-agent, or device.
>>>> Note: Usage and/or distribution restrictions are strongly recommended
>>>> for any dataset that has records that relate to a single user or a
>>>> small number of users; experience has shown that such records can, in
>>>> fact, sometimes be used to identify the user(s) despite the technical
>>>> measures that were taken to prevent that happening.
>>>> David Singer
>>>> Manager, Software Standards, Apple Inc.
>> David Singer
>> Manager, Software Standards, Apple Inc.

David Singer
Manager, Software Standards, Apple Inc.
Received on Thursday, 14 August 2014 19:57:34 UTC
