RE: Deidentification (ISSUE-188)

Hi Roy,

 

I should have written "transaction" instead of "record", which would have made things clearer. In the example you give, all the records would be considered one transaction, which solves the problem.

 

With the linkability property, in an anonymized dataset you should not be able to link two transactions from the same device, and you should not be able to link a transaction to another dataset. In the Article 29 Opinion, linkability is defined as "the ability to link, at least, two records concerning the same data subject or a group of data subjects (either in the same database or in two different databases)". I did not adapt this definition to the DNT context correctly; here is a more suitable definition:

 

A data-set is de-identified when it is no longer possible to:

- isolate some or all transactions which correspond to a device or user,

- link two transactions concerning the same device or user (either in the same database or in two different databases),

- deduce, with significant probability, information about a user or device.
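
To make the second criterion concrete, here is a minimal Python sketch. It is only an illustration under assumptions of my own: the field names (cookie_id, url) and the helper linkable_pairs are hypothetical, and sharing a field value is just one crude heuristic for linkability. An empty result for one field does not show that a dataset is de-identified.

```python
from collections import defaultdict
from itertools import combinations


def linkable_pairs(transactions, fields):
    """Return index pairs of transactions sharing the same values on `fields`.

    Transactions missing any of the fields are ignored, since they cannot
    be linked through those fields.
    """
    groups = defaultdict(list)
    for index, transaction in enumerate(transactions):
        key = tuple(transaction.get(field) for field in fields)
        if all(value is not None for value in key):
            groups[key].append(index)

    pairs = []
    for indices in groups.values():
        # Every pair inside a group can be linked through the shared values.
        pairs.extend(combinations(indices, 2))
    return sorted(pairs)


# Hypothetical transactions: two of them carry the same persistent identifier.
raw = [
    {"cookie_id": "abc", "url": "/a"},
    {"cookie_id": "abc", "url": "/b"},
    {"cookie_id": "xyz", "url": "/a"},
]

# The same transactions with the identifier removed.
stripped = [{"url": t["url"]} for t in raw]

print(linkable_pairs(raw, ["cookie_id"]))       # [(0, 1)]
print(linkable_pairs(stripped, ["cookie_id"]))  # []
```

In practice the same check would have to be repeated over every combination of fields that could act as an identifier, and across other datasets, which is what makes the criterion demanding.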

 

Thank you for your feedback.

 

Best regards,

 

Vincent

 

 

From: Roy T. Fielding [mailto:fielding@gbiv.com]
Sent: Thursday, 17 July 2014 23:09
To: TOUBIANA Vincent
Cc: Justin Brookman; public-tracking@w3.org
Subject: Re: Deidentification (ISSUE-188)

 

On Jul 16, 2014, at 7:44 AM, TOUBIANA Vincent wrote:

Hi Justin,

 

I'd like to propose a definition of de-identification which is closer to the concept of anonymization defined in the Article 29 Opinion (http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf).

 

A data-set is de-identified when it is no longer possible to:

- isolate some or all records which correspond to a device in the dataset,

- link, at least, two records concerning the same device,

- deduce, with significant probability, the value of an attribute from the values of a set of other attributes.

 

The third criterion may -- in some cases -- go beyond de-identification but the first two are, in my opinion, required to limit re-identification risks.

 

No.  A set of log entries for a single request might consist of ten
to twenty records with the same request-id, which is no more an indication
of tracking than having a single very large record with the same information.
The mechanism of records has no relevance to the actual privacy concern,
which is that the data can be linked to a particular user.  How many
records that involves, or how many deductions are needed, is superfluous.

In any case, I have no idea what the third bullet means, and I am pretty
sure that I would not consider it de-identified if the data set included
a single record with my name and home address.

 

....Roy

 

Received on Friday, 18 July 2014 09:39:40 UTC