- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Wed, 15 May 2013 18:12:31 -0700
- To: Shane Wiley <wileys@yahoo-inc.com>
- Cc: "rob@blaeu.com" <rob@blaeu.com>, John Simpson <john@consumerwatchdog.org>, "Tracking Protection Working Group" <public-tracking@w3.org>
I will agree and disagree with both of you ...

Shane is right in that de-identified is a more natural term for the yellow state, but only in a world where the FTC (and this group) doesn't oddly define de-identified as being a replacement term for the more stringent (and confusing for hypertext) unlinkable. Hence, Shane's reuse of de-identified in this context is wrong.

Likewise, whether the raw data is personally identifiable, pseudonymous, or entirely anonymous is completely unknown, since it depends both on what the client decides to send and what the server decides to keep/log. Thus, I agree with Rob that the red state should not be labelled pseudonymous, but I disagree that it would make any sense to label the yellow state as pseudonymous.

The privacy state of the yellow data is actually stronger than pseudonymous: an inspector with access to the browser's client-side storage can easily map stored pseudonyms to the data in the red state, but would not be able to do the same for data in the yellow state unless they also have the ability to reproduce the pseudonym hash. [Note that I am assuming here that such a browser's history has been cleared, since an inspector doesn't need yellow data if it already has access to the browser's history.] A "hashed pseudonymous" label would be more accurate for the yellow state, assuming that anonymous data goes directly from red to green.

However, IMO, Shane's diagrams are specific to data collection for advertising and advertising support services. If DNT were limited to advertising, I'd be a happy camper, but I somehow doubt that is the intended goal.

Finally, I want to express my discomfort with the way this discussion is progressing. As I've said numerous times, I am willing to turn off (cross-site) tracking for users that have indicated such as their legitimate preference. That is the entire scope of what I am willing to do under the auspices of a standard produced by this group.
There are a lot of other data sharing and retention requirements that are related to privacy and subject to regulatory requirements, but those concerns are outside the scope of this work. They should apply whether or not the user signals DNT:1, and thus do not need to be addressed by our specifications.

Blanket requirements on DNT:1 retained data being in a particular category of "anonymous", "pseudonymous", or "unlinkable" would presume that implementations actively eliminate data sent by the client that might correlate to a single user, user agent, or device. I am not able to implement such a requirement in general, and will not pretend to do so. Marking the data record as DNT:1, for the sake of restricting later processing, is easy; deciding that the contents of a data record will not be identifying when combined with past and future data sets is impossible in general. Even if we restricted such processing to a specific data set, it only makes economic sense to perform that processing when someone with a budget decides they want to share it in a non-aggregate form. At that point, we can and should de-identify the data to be shared regardless of the user's DNT setting.

....Roy
Received on Thursday, 16 May 2013 01:12:55 UTC