- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Fri, 18 Jul 2014 11:13:57 -0700
- To: TOUBIANA Vincent <vtoubiana@cnil.fr>
- Cc: "Justin Brookman" <jbrookman@cdt.org>, <public-tracking@w3.org>
- Message-Id: <653C5A58-581D-45AC-BC9F-89CEBB56E8FF@gbiv.com>
On Jul 18, 2014, at 2:38 AM, TOUBIANA Vincent wrote:

> Hi Roy,
>
> I should have replaced “record” by “transaction” which would have made things more clear. In the example you give, all the records would be considered as one transaction thus solving the problem.

That doesn't make it more clear. You are just replacing one commonly used database term by another term that is used in both databases and commerce. Depending on who you talk to, a transaction could be a single operation, a set of related operations, or any large number of operations that eventually result in an exchange of goods. None of which has anything to do with linkability.

> With the property on linkability, in an anonymized dataset you should not be able to link two transactions from the same device and you should not be able to link a transaction to another dataset.

No. That is simply wrong. All session-based interactions with users depend on the linking of multiple interactions over time, each of which must remain linked in the dataset if the site is going to make any meaningful use of them.

Linking data records doesn't have anything to do with privacy or EU data protection. That does not, in any way, imply that the dataset remains linked to the user, which is what linkability means to data protection. (Linking to the user's device is just an indirect linking to the user.) The de-identified data can remain linked together as related interactions after the identifying data has been removed from all records, which includes removal of information in the dataset that might be unique to a small set of users (queries, real times, etc.).

> In the Article 29 Opinion, linkability is defined as “the ability to link, at least, two records concerning the same data subject or a group of data subjects (either in the same database or in two different databases)”.

The Article 29 definition of linkability is simply wrong: it seems to be entirely misinformed about what that term means with regard to data protection. Maybe that's why we are calling it de-identification instead.

> I did not adapt this definition to the DNT context correctly; here is a more suitable definition:
>
> A data-set is de-identified when it is no longer possible to:
> - isolate some or all transactions which correspond to a device or user,
> - link two transactions concerning the same device or user (either in the same database or in two different databases),
> - deduce, with significant probability, information about a user or device.
>
> Thank you for your feedback.

You are trying to prevent identifying a user via data correlation and have construed the definition as if that is all that matters. As a side-effect, you are preventing normal operation of a site in terms of evaluating which user agent software doesn't work well, or where to place UI elements in a window, or what sets of content lead to a conversion (as opposed to boredom or leaving the site).

My proposed text says that the dataset is de-identified when it cannot be used to identify a particular user. How it might be used to do so is irrelevant -- any mechanism applies, including data correlation. How the data is constructed doesn't matter. How it is combined with other datasets doesn't matter. The only thing that matters is whether the dataset is capable of revealing anything about a particular user.

....Roy
Received on Friday, 18 July 2014 18:14:20 UTC