Re: what base text to use (was re: data hygiene approach / tracking of URL data) [for auerbach]

[Text below came from Dan Auerbach and Lee Tien earlier; sending to the list in order to avoid questionnaire submission problems with length of amended text. —npdoty]

Objections to Option A:

We object to the DAA proposed text. The text lacks basic clarity, and takes positions which would seem to allow the status quo of data collection and retention to continue almost completely unabated for users who have set DNT:1. The grossly insufficient privacy protections provided by the DAA proposal are totally unacceptable.

De-identification

The de-identification proposal by the DAA is extremely confused. De-identification is a long-standing issue in the group, with a long history of discussion. The lack of agreement on this issue in the group is evidenced by major disagreements on non-normative text, on the mailing list and at face-to-faces. We simply do not agree about what properties a data set must have before it is considered de-identified and hence out of scope for DNT.

Rather than provide further clarity to help crystallize the definition of de-identification, the DAA draft further obfuscates this already complex issue. There is a large disconnect between the normative language of the DAA draft and what it purports to achieve according to comments on the mailing list and on phone calls (the latter concept will hereafter be referred to as the "Yahoo! Framework"). Both the DAA normative text and the Yahoo! Framework are problematic in their own right, but even more mind-boggling are the mental gymnastics of textual interpretation required to map the DAA's proposed de-identification language onto the Yahoo! Framework. Indeed, whereas the Yahoo! Framework outlines a (not well-defined but more concrete) notion of red/yellow/green states of data, the DAA draft's normative text on de-identification makes no mention of key concepts of the Yahoo! Framework, such as "operational controls" or storing only geolocations instead of full IP addresses.

To illustrate the disconnect between the DAA normative language and the Yahoo! Framework, the former requires that URL information "cannot reasonably be re-associated or connected to a [...] device". On its face, this means that pseudonymized browsing histories cannot be kept, provided the pseudonym is used to connect a device to a database. A hash of a cookie string, for instance, clearly connects a browser with the URL. However, according to the Yahoo! Framework, this would be allowed and the resulting data set keyed by the hashed cookie would be considered de-identified. This is just one example demonstrating the fundamental lack of clarity here.
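
The hashed-cookie point can be made concrete with a minimal sketch. The cookie values, URLs, function names, and the choice of SHA-256 below are all hypothetical and chosen only for illustration; they are not taken from the DAA text or from any company's actual practice.

```python
import hashlib
from collections import defaultdict


def pseudonymize(cookie_id: str) -> str:
    """Replace a raw cookie string with its SHA-256 hex digest."""
    return hashlib.sha256(cookie_id.encode("utf-8")).hexdigest()


def build_histories(log):
    """Group visited URLs by hashed cookie; the raw cookie is never stored."""
    histories = defaultdict(list)
    for cookie_id, url in log:
        histories[pseudonymize(cookie_id)].append(url)
    return dict(histories)


# Hypothetical request log: (cookie value, URL visited).
log = [
    ("cookie-abc", "https://example.com/news"),
    ("cookie-xyz", "https://example.org/shop"),
    ("cookie-abc", "https://example.net/health"),
]

histories = build_histories(log)

# A later request from the same browser re-derives the same key, so its
# stored browsing history can be looked up without the raw cookie on file.
returning_key = pseudonymize("cookie-abc")
linked_urls = histories[returning_key]
```

Because the hash is deterministic, the data set keyed by it still ties each URL to one browser, which is the kind of connection the quoted normative language appears to rule out.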

Furthermore, the DAA normative language introduces the new concept of "delinking", which is not at all well-defined. What is "within a reasonable time frame"? What are "operational or administrative controls"? What is the linking that is occurring in the first place? Some clarity about the intended interpretation was provided on the mailing list, but the text as it stands now is not nearly fleshed out enough to reflect that informal understanding. While it is beyond the scope of this objection to fully argue against the intended interpretation, it is worth noting the fundamental flaw in the underlying conceptual framework: the assumption that there is a unique identifier that sits apart from the rest of the data. In fact, all fields of data may have some amount of identifying information, and de-identification must take this fact into account, however it is defined and implemented in practice.
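
A small sketch of why removing a dedicated identifier field is not enough: the records below carry no ID at all, yet combinations of the remaining fields can still single out individual rows. The field names, values, and the `unique_fraction` helper are hypothetical and exist only for this illustration.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Fraction of records whose values for `fields` appear exactly once."""
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    singled_out = sum(
        1 for r in records if combos[tuple(r[f] for f in fields)] == 1
    )
    return singled_out / len(records)


# Hypothetical log rows with the unique identifier already stripped.
records = [
    {"region": "CA", "user_agent": "UA-1", "url": "https://example.com/a"},
    {"region": "CA", "user_agent": "UA-2", "url": "https://example.com/a"},
    {"region": "NY", "user_agent": "UA-1", "url": "https://example.com/b"},
    {"region": "NY", "user_agent": "UA-1", "url": "https://example.com/b"},
]

by_region = unique_fraction(records, ["region"])
by_region_and_agent = unique_fraction(records, ["region", "user_agent"])
```

In this toy data no row is unique by region alone, but half of the rows become unique once region and user agent are combined. Real logs have many more fields, so any assessment of de-identification has to look at the fields jointly and not only at the one labeled as an identifier.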

Finally, we believe that any W3C de-identification standard must require that the de-identified "status" of a data set be reassessed over time; even if a data set is reasonably deemed de-identified today, that may not be the case after other data or other technology becomes available.  Similarly, any W3C DNT standard must facilitate the technological development and market deployment of better privacy-protecting technologies, such as by requiring that companies adopt emerging technologies, whether for de-identification or eliminating unique IDs.  The W3C DNT standard should not function as a "safe harbor" against better practices.

Unique identifiers

Removing the text on unique identifiers means that the DAA text does not require that DNT mean "Do Not Collect". But for sufficient privacy protections to exist for consumers, DNT must ensure that tracking information about individuals is not collected. In an ideal world, the privacy policies of companies would be detailed, and short, hard limits on data retention could be relied upon. Practically speaking, however, there are many reasons why we must insist on Do Not Collect. First, companies do not have policies with short and hard limits on data retention. Second, even if strong retention limits were in place, we have seen that other actors like governments regularly have access to this data via legal process. Third, de-identification is an evolving field, and until users have more confidence that their data is being aggregated responsibly, these users have the right to opt out of collection altogether. Fourth, the policies of companies change, and there is no substitute for knowing in a reliable and auditable way that information is simply not being collected.

It has been a central tenet of EFF that DNT must curb collection. In particular, a user who indicates that she does not want to be tracked by third parties should not have a unique id assigned to her, her user agent or device, that would allow her records to be linked together. Industry claims of the impossibility of this are far overblown. Large industry players regularly forgo the use of unique id tracking cookies, and cookies are regularly blocked in the ordinary course of Internet usage for reasons related to privacy and security. The web doesn't break as a result for users who block cookies, or for companies who do not set them in every circumstance.

EFF's view on "adoption"

An apparent virtue of the DAA proposal is that it would be widely adopted in the near term.  While adoption has been part of the Working Group framework for some time now, EFF questions its importance.  Our top-priority requirement is that, under a W3C DNT standard, a user who sends DNT:1 is not tracked and thus receives privacy protections significantly greater than a user who does not send DNT:1. Whether a standard would be widely adopted in the near term must be a secondary consideration.  We fear that a widely adopted standard will be so close to the status quo that users will see no value to DNT.

Issues of process

When amendments to the June draft were solicited, the DAA submitted an entirely revised proposal, which is now being considered alongside the June draft. No other proposed textual amendments are being considered as standalone proposals requiring comments from the entire group. Had it been known that full proposals would be considered alongside the June draft, we would likely have made efforts to submit another such proposal. As it stands, spending so much time having to re-raise longstanding objections represents a burden on EFF's very limited resources, and favors Working Group members with more resources.



Objections to Option B:

While the June Editor's draft represents a far more legitimate starting point for discussion than the DAA draft in terms of both substance and process, the June Editor's draft supplanted our long-standing editor's draft that had carefully tracked issues that have arisen over several years. If we adopt the June Editor's draft, we must be careful not to bulldoze over longstanding disagreements but to maintain the commitment to rough consensus among the various interests present in the room.

As the W3C notes about managing dissent in a process document:

"Groups should favor proposals that create the weakest objections. This is preferred over proposals that are supported by a large majority but that cause strong objections from a few people. As part of making a decision where there is dissent, the Chair is expected to be aware of which participants work for the same (or related) Member organizations and weigh their input accordingly."

Received on Saturday, 13 July 2013 03:51:30 UTC