RE: Deidentification (ISSUE-188)

>> Under the linkability property, in an anonymized dataset you should not be able to link two transactions from the same device, and you should not be able to link a transaction to another dataset.
>No.  That is simply wrong.  All session-based interactions with users depend on the linking of multiple interactions over time, each of which must remain linked in the dataset if the site is going to make any meaningful use of them.  Linking data records doesn't have anything to do with privacy or EU data protection.


If data records can be linked, then there is a significant chance we’re not talking about anonymized data.

>That does not, in any way, imply that the data set remains linked to the user, which is what linkability means to data protection. (Linking to the user's device is just an indirect link to the user.)

With respect to anonymization techniques, this is the most suitable definition. We had a similar discussion about the definition of unlinkability a while ago; I suggested using the terminology of ISO/IEC 15408-2 (http://lists.w3.org/Archives/Public/public-tracking/2012Nov/0255.html).

>The Article 29 definition of linkability is simply wrong: it seems to be entirely misinformed about what that term means with regard to data protection. Maybe that's why we are calling it de-identification instead.

Again, with respect to anonymization, this is the definition provided by ISO/IEC 15408-2. If you have a different definition that is more widely used, please provide a link to it.

> You are trying to prevent identifying a user via data correlation and have construed the definition as if that is all that matters. As a side-effect, you are preventing normal operation of a site in terms of evaluating which user agent software doesn't work well, or where to place UI elements in a window, or what sets of content lead to a conversion (as opposed to boredom or leaving the site).

I don’t see how this would happen in a *third-party* context. In a first-party context this would be allowed, so I’m not sure I would be preventing any normal operation.


>My proposed text says that the dataset is de-identified when it cannot be used to identify a particular user.  How it might be used to do so is irrelevant -- any mechanism applies, including data correlation.

How do you evaluate that in practice? Under this definition, you cannot provide a method to de-identify data with a high level of confidence, so it is not enforceable. The definition does not give the user any guarantee that, in practice, he cannot be identified; he has to rely on the statement. From a data controller's point of view, you would have to constantly re-evaluate whether the dataset is still de-identified.

Vincent

Received on Saturday, 19 July 2014 14:32:00 UTC