Re: Deidentification (ISSUE-188) from Rob van Eijk on 2014-08-14 (public-tracking@w3.org from August 2014)

From: Rob van Eijk <rob@blaeu.com>
Date: Thu, 14 Aug 2014 20:54:59 +0200
To: David Singer <singer@apple.com>
Cc: Justin Brookman <jbrookman@cdt.org>, <public-tracking@w3.org>, "Mike O'Neill" <michael.oneill@baycloud.com>
Message-ID: <75be8ef9229848b441622d34dff59cef@xs4all.nl>

The core of my issue, which may be a semantic one, is that the current 
text is fixated on the word identification. To me it is not clear enough 
from the current definition that anything other than the 'one way street' 
is considered re-identification. The definition must be more specific on 
this point.

Does cookie-syncing (which is commonly used in real-time bidding) fall 
under the meaning of re-identification?

Rob

David Singer schreef op 2014-08-14 18:37:
> Rob, I am sorry, I don’t follow you at all.
> 
> We say in a number of places that data passes out of our scope, and
> hence we say nothing at all about it, once it has been deidentified.
> We need to define what we mean by that, and we need to define that
> ‘exit’ from our scope.
> 
> On Aug 14, 2014, at 2:08 , Rob van Eijk <rob@blaeu.com> wrote:
> 
>> 
>> The text you propose connects the state of a permanently de-identified 
>> dataset to the possibility of identifying a user/user-agent or device. 
>> I think restricting the approach to identification is far too narrow.
>> What is not covered is for example:
>> - the sharing (e.g. for data enrichment and data correlation).
> 
> if it doesn’t identify anyone, and won’t/can’t, we have nothing to say
> about sharing it
> 
>> - the application of de-identified data to the individual user/user 
>> agent/device (e.g. for re-targeting).
> 
> That’s re-identification, and my text says (a) it ought not be
> possible and (b) it ought not be permitted
> 
>> - the retention of data, meaning the length of time that would be 
>> allowed for bringing data into a de-identified state.
> 
> That’s a separate question: the ‘raw data’ question (and one of the
> exits for raw data is that the data is deidentified)
> 
>> - any (unintended/unforeseen) data uses that may have an impact on 
>> (the personal space of) a user/user agent/device. For example 
>> re-targeting based on de-identified data, or re-targeting based on 
>> correlation with de-identified data.
> 
> I don’t understand how one can target anyone if the data is
> deidentified, and if it’s reidentified, then it wasn’t deidentified to
> this definition (the definition insists it is a one-way street).
> 
>> 
>> My proposal is to exclude text for de-identified data in order to aim 
>> for a cleaner specification.
> 
> Again, I don’t understand.  The point of defining it is to say “how to
> get out of the scope of this spec”.  For example, the raw data clause
> I proposed says there are only 3 exits:
> * you have permission from the user to retain the data
> * you retain the data under a permitted use, in accordance with the
> terms of that permitted use
> * you deidentify the data so it passes out of our scope
> 
> 
>> 
>> Rob
>> 
>> David Singer schreef op 2014-08-14 01:58:
>>> On Aug 8, 2014, at 6:54 , Mike O'Neill <michael.oneill@baycloud.com> wrote:
>> (...)
>>> Trying another way of phrasing it:
>>> Data is permanently de-identified (and hence out of the scope of this
>>> specification) when a sufficient combination of technical measures and
>>> restrictions ensures that the data does not, and cannot and will not
>>> be used to, identify a particular user, user-agent, or device.
>>> Note: Usage and/or distribution restrictions are strongly recommended
>>> for any dataset that has records that relate to a single user or a
>>> small number of users; experience has shown that such records can, in
>>> fact, sometimes be used to identify the user(s) despite the technical
>>> measures that were taken to prevent that happening.
>>> David Singer
>>> Manager, Software Standards, Apple Inc.
> 
> David Singer
> Manager, Software Standards, Apple Inc.

Received on Thursday, 14 August 2014 18:56:41 UTC