Re: Deidentification (ISSUE-188) from Roy T. Fielding on 2014-07-17 (public-tracking@w3.org from July 2014)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Thu, 17 Jul 2014 14:09:22 -0700
To: TOUBIANA Vincent <vtoubiana@cnil.fr>
Cc: "Justin Brookman" <jbrookman@cdt.org>, <public-tracking@w3.org>
Message-Id: <55AB5613-EAA0-40DD-8747-31E29BD5C792@gbiv.com>

On Jul 16, 2014, at 7:44 AM, TOUBIANA Vincent wrote:

> Hi Justin,
>  
> I’d like to propose a definition of de-identification which is closer to the concept of anonymization defined in the Article 29 Opinion (http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf).
>  
> A data-set is de-identified when it is no longer possible to:
> - isolate some or all records which correspond to a device in the dataset,
> - link at least two records concerning the same device,
> - deduce, with significant probability, the value of an attribute from the values of a set of other attributes.
>  
> The third criterion may -- in some cases -- go beyond de-identification but the first two are, in my opinion, required to limit re-identification risks.

No.  A set of log entries for a single request might consist of ten
to twenty records with the same request-id, which is no more an indication
of tracking than having a single very large record with the same information.
The mechanism of records has no relevance to the actual privacy concern,
which is that the data can be linked to a particular user.  How many
records that involves, or how many deductions are needed, is superfluous.

In any case, I have no idea what the third bullet means, and I am pretty
sure that I would not consider it de-identified if the data set included
a single record with my name and home address.

....Roy

Received on Thursday, 17 July 2014 21:09:46 UTC