For collected data to be retained under 3.3 (third-party compliance – “3. or, the data is de-identified as defined in this recommendation”), there should be a stricter requirement, i.e., it must be impossible to link the data back to an individual.

How about we have a different definition for this out-of-scope data, i.e., call it “anonymised”, with something like “has been deleted, aggregated or otherwise processed to make it impossible to link to a person”?

We would then keep the definition of de-identified (the existing set of CfO proposals) for the permitted-use and unknowingly-collected cases.

Mike

From: Justin Brookman [mailto:jbrookman@cdt.org]
Sent: 30 July 2014 03:12
To: David Singer
Cc: public-tracking@w3.org List
Subject: Re: Deidentification (ISSUE-188)

David Singer <singer@apple.com>, 7/25/2014 4:33 PM:

On Jul 25, 2014, at 13:24, Justin Brookman <jbrookman@cdt.org> wrote:

>
> On Jul 23, 2014, at 3:01 PM, David Singer <singer@apple.com> wrote:
>
>>
>> On Jul 23, 2014, at 11:49, Roy T. Fielding <fielding@gbiv.com> wrote:
>>
>>> On Jul 23, 2014, at 10:22 AM, David Singer wrote:
>>>> I understand your hesitation and share some of it. However, I feel that
>>>> * de-identification has been defeated often enough that we cannot be sure people will always succeed
>>>> * a user who is harmed should be able to work out who has responsibility: someone who defied a restriction on the data, or someone who made it available without that restriction.
>>>>
>>>> There are, alas, enough people out there who would try to engineer a situation in which it appears no-one is responsible (“we did our best to make it de-id’d”, “no-one said we couldn’t try to re-id”) that I think we need to close that chink somehow, formally.
>>>
>>> The right way to do that is with an accurate definition and a separate
>>> formal requirement on any party (or third party). Mixing the two results
>>> in an incorrect definition due to the false negatives.
>>
>> I think I am fine with that; where we talk of de-identifying the data, we say that the party doing so commits either to taking responsibility for ensuring that it is not re-identified, or to passing that responsibility on.
>
> So, David, are you OK with Roy’s definition:
>
> A data set is considered de-identified when there exists a reasonable
> level of justified confidence that none of the data within it can be
> linked to a particular user, user agent, or device.
>
> Does either of you want to suggest language for the spec to bind parties
> not to try to re-identify?

The concept appears three times in the TCS, and in each place a requirement to keep the data de-identified would seem tricky to write. (Someone is welcome to try.)

Perhaps it would be cleaner to have two definitions:

* de-identified

* persistently de-identified

with the first defining the state (as above), and the second meaning that the data carries a requirement: the originator must not attempt to re-identify it, and anyone who shares it with another party (the originator, or anyone who received the data under this restriction) must either pass on the restriction or accept responsibility if re-identification in fact occurs.

Then we can use one or the other in the document, as appropriate.

So this sounds like a stricter version of the red-yellow-green discussion from before. What do you envision requiring regular de-identification, and what would require persistent de-identification (really de-identification plus promises/liability)? Would it be just for sharing? That is, there wouldn't need to be an internal promise not to re-identify, but if you release the data, you either get a promise or take responsibility?

What would "responsibility" look like? We can't really create a cause of action with a technical standard.

David Singer
Manager, Software Standards, Apple Inc.