Re: Deidentification (ISSUE-188) from David Singer on 2014-08-14 (public-tracking@w3.org from August 2014)

From: David Singer <singer@apple.com>
Date: Thu, 14 Aug 2014 09:37:07 -0700
To: rob@blaeu.com
Cc: Justin Brookman <jbrookman@cdt.org>, "<public-tracking@w3.org>" <public-tracking@w3.org>, Mike O'Neill <michael.oneill@baycloud.com>
Message-id: <F5194C00-9FC7-45EF-808B-DB750B8B3187@apple.com>

Rob, I am sorry, I don’t follow you at all.

We say in a number of places that data passes out of our scope, and hence we say nothing at all about it, once it has been deidentified. We need to define what we mean by that, and we need to define that ‘exit’ from our scope.

On Aug 14, 2014, at 2:08 , Rob van Eijk <rob@blaeu.com> wrote:

> 
> The text you propose connects the state of a permanently de-identified dataset to the possibility of identifying a user/user-agent or device. I think restricting the approach to identification is far too narrow.
> What is not covered is for example:
> - the sharing (for e.g. data enrichment and data correlation).

If it doesn’t identify anyone, and won’t/can’t, we have nothing to say about sharing it.

> - the application of de-identified data to the individual user/user agent/device (for e.g. re-targeting).

That’s re-identification, and my text says (a) it ought not be possible and (b) it ought not be permitted.

> - the retention of data, meaning the length of time that would be allowed for bringing data into a de-identified state.

That’s a separate question: the ‘raw data’ question (and one of the exits for raw data is that the data is deidentified).

> - any (unintended/unforeseen) data uses that may have an impact on (the personal space of) a user/user agent/device. For example re-targeting based on de-identified data, or re-targeting based on correlation with de-identified data.

I don’t understand how one can target anyone if the data is deidentified, and if it’s reidentified, then it wasn’t deidentified according to this definition (the definition insists it is a one-way street).

> 
> My proposal is to exclude text for de-identified data in order to aim for a cleaner specification.

Again, I don’t understand. The point of defining it is to say “how to get out of the scope of this spec”. For example, the raw data clause I proposed says there are only 3 exits:
* you have permission from the user to retain the data
* you retain the data under a permitted use, in accordance with the terms of that permitted use
* you deidentify the data so it passes out of our scope
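
As a rough illustration only (not spec text), the three exits above could be sketched as a simple check. All names here (RawData, retention_exit, and the three flags) are hypothetical, and the order in which the exits are tested is an arbitrary assumption:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawData:
    """Hypothetical record of the state of a piece of collected raw data."""
    user_permission: bool = False  # the user has given permission to retain it
    permitted_use: bool = False    # retained under a permitted use, per its terms
    deidentified: bool = False     # permanently deidentified (one-way)


def retention_exit(data: RawData) -> Optional[str]:
    """Return which of the three exits applies, or None if none does."""
    if data.user_permission:
        return "user permission"
    if data.permitted_use:
        return "permitted use"
    if data.deidentified:
        return "deidentified (out of scope)"
    # No exit applies: the data is still raw data within the scope of the spec.
    return None
```

In this sketch, data for which the function returns None is still raw data subject to the raw data clause; only the third exit takes the data out of scope entirely.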


> 
> Rob
> 
> David Singer schreef op 2014-08-14 01:58:
>> On Aug 8, 2014, at 6:54 , Mike O'Neill <michael.oneill@baycloud.com> wrote:
> (...)
>> Trying another way of phrasing it:
>> Data is permanently de-identified (and hence out of the scope of this
>> specification) when a sufficient combination of technical measures and
>> restrictions ensures that the data does not, and cannot and will not
>> be used to, identify a particular user, user-agent, or device.
>> Note: Usage and/or distribution restrictions are strongly recommended
>> for any dataset that has records that relate to a single user or a
>> small number of users; experience has shown that such records can, in
>> fact, sometimes be used to identify the user(s) despite the technical
>> measures that were taken to prevent that happening.
>> David Singer
>> Manager, Software Standards, Apple Inc.

David Singer
Manager, Software Standards, Apple Inc.

Received on Thursday, 14 August 2014 16:37:38 UTC