Re: Updated Proposal - Outline in preparation for presentation in Seattle

Some corrections and responses below.

In short: Do Not Target + Transparency.  As best I can tell, this is the very same proposal we heard in DC with an added transparency requirement for retention periods.

Jonathan  


On Monday, June 11, 2012 at 7:25 PM, Shane Wiley wrote:

>  
> Hello TPWG,
> 
> Due to “recent activities” I’m a bit behind on providing the final presentation for our updated proposal in preparation for Seattle.  We’ll be reviewing this in more detail in Seattle but I wanted to share some of the initial elements up-front so we have time as a working group to begin discussion and consider perspectives leading up to the meeting.
> 
> ------
> 
> Goal:  Evolve DC proposal to bridge the divide with the advocate proposal and set a final recommendation for these elements
> 
> ·         Definition of First Party
> 
> o   Advocate Position:  Common Branding
> 

The preferred position I've heard is user expectations, not branding.  Many view branding as a compromise.

I also don't believe it's accurate to characterize the opposing viewpoint as simply "advocate."  Representatives from a number of companies (e.g. Mozilla and Apple), policymaking organizations (e.g. the Federal Trade Commission and Article 29 Working Party), and research institutions have signaled support for some of these views.  For lack of a better phrase, I've been calling it the pro-privacy position.  (Trite, I know.  Alternative suggestions welcome!)
> 
> o   Industry Position:  Affiliate
> 
> o   Concession Proposal:  Affiliate with “easy discoverability” (“Affiliate List” within one click from each page or owner clearly identified within one click from each page.  For example, a link in the privacy policy would meet this requirement.)
> 

It looks like the proposals are closely aligned on this issue.  See http://jonathanmayer.github.com/dnt-compromise/compromise-proposal.html#parties.

The new industry position, to be sure, reflects a marginal transparency concession.  But it's beyond peradventure that the pro-privacy participants are giving up far more by allowing affiliate information sharing.  Affiliate information flows deviate significantly from user expectations and have been frequently abused in the context of other privacy regulation.

Rough ballpark figure: On party size, industry participants get 99% of what they want, pro-privacy participants get 1% of what they want.
>  
> ·         Permitted Uses
> 
> o   Advocate Position:  Unlinkable Data w/ arbitrary “grace period”
> 

As I explained in the DC meeting and again in an email yesterday, the pro-privacy position would facilitate "operational uses" in a variety of ways.  See http://lists.w3.org/Archives/Public/public-tracking/2012Jun/0220.html.
> 
> o   Industry Position:  Enumerated uses, broadly scoped, general data minimization
> 
> o   Concession Proposal:  Tightened up permitted uses, narrowly and strictly scoped, data minimization focus with required transparency, reasonable safeguards, defined unlinkable (highlighting this moves resulting data outside of scope)
> 

Discussion below.  
>  
> ·         For All Permitted Uses
> 
> o   What won’t occur:  Outside of Security, all other permitted uses will not allow for altering a specific user’s online experience (no profiling, no further alteration to the user experience based on profiled information)
> 

Advertising industry self-regulation has required this since mid-2000.  See http://www.ftc.gov/os/2000/07/NAI%207-10%20Final.pdf.
> 
> o   Data Minimization:  Each organization engaging in Permitted Uses and claiming W3C DNT compliance must provide public transparency of their data retention period (may enumerate each individually if they vary across Permitted Uses)
> 

As with corporate affiliation, there is only a marginal transparency concession here.  I can't quite tell—there may also be a nearly-unenforceable substantive concession (sometimes termed "reasonable minimization").
> 
> o   Reasonable Safeguards:  Reasonable technical and organizational safeguards to prevent further processing:  collection limitations, data siloing, authorization restrictions, k-anonymity, unlinkability, retention time, anonymization, pseudonymization, and/or data encryption.
> 

Once again, advertising industry self-regulation has imposed this requirement for over a decade.
>  
> ·         Permitted Uses:  Security/Fraud, Financial Logging/Auditing, Frequency Capping, Debugging, Aggregate Reporting*
> 

These permitted uses are unchanged from current self-regulatory commitments.
> 
> o   For each Permitted Use:
> 
> §  (Normative) Detailed, singular business purpose description
> 
> §  (Non-normative) Will explain why the processing with identifiers is proportionate
> 

More marginal transparency concessions.  
> 
> *NOTE – Aggregate Reporting covers general analytics needs, product improvement, and market research uses
> 

Rough ballpark figure: On "operational uses," industry participants get 99% of what they want, pro-privacy participants get 1% of what they want.
>  
> ·         Explicit and Separate User Choice
> 
> o   User must expressly activate DNT signal (TPWG already agreed on this point)
> 

No compromise here, of course.  This is the current industry position.  See https://www.aboutads.info/resource/download/DAA_Commitment.pdf.
> 
> o   Servers may respond to users that their UA is “invalid” if they believe this to be the case (on the hook to defend this position)
> 

Another issue with no deviation from the industry position: websites get to ignore DNT if a browser might have set it by default.
> 
> o   Efforts to mislead users into activating DNT will be seen as “invalid”
> 

I assume this is intended to be a user interface requirement.  Again no compromise.
> 
> ·         With this Proposal
> 
> o   Users gain a consistent, local tool to communicate their opt-out preference (avoids property-specific opt-out pages)
> 
> o   The user’s choice is persistent for each device/UA (avoids accidental deletion)
> 

Yes, the Do Not Track technology is obviously superior to the old opt-out cookie technology.  I'm not quite certain how that's relevant here, though.  
> 
> o   Outside of Security purposes, the user will no longer experience alterations to their online experiences derived from multi-site activity
> 

In other words, Do Not Target.  
> 
> o   Only minimal data is retained for necessary business operations and retention periods are transparent to users
> 

See the discussion of transparency and minimization requirements above.  
> 
> o   All “harms” are removed (outside of government intrusion risk where there are no documented cases of this occurring with 3rd party anonymous log file data)
> 

The group exercise in DC emphasized how privacy risks go far beyond behavioral personalization (if that's even a privacy risk...).
>  
> ·         Unlinkability
> 
> <Normative>
>   
> Un-linkable Data is outside of the scope of the Tracking Preference standard as information is no longer reasonably linked to a particular user, user agent, or device.  
> 

I presume the meaning is "linkable," not "linked," given the following definition.  
> Definition:  A dataset is un-linkable when reasonable steps have been taken to modify data such that there is confidence that it contains only information which could not be linked to a particular user, user agent, or device.
>   
> <Non-Normative>
>   
> There are many valid and technically appropriate methods to de-identify or render a data set "un-linkable".  In all cases, there should be confidence the information cannot be reverse engineered back to a "linkable" state.  Many tests could be applied to help determine the confidence level of the un-linking process.  For example, a k-anonymous test could be leveraged to determine if the mean population resulting from a de-linking exercise meets an appropriate threshold (a high-bar k-anonymous threshold would be 1024).
> 

To be clear: k-anonymity is not at all the same as unlinkability, as defined above.
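
To make the distinction concrete, here's a quick sketch (mine, not from the proposal; the column names, data, and k=1024 threshold are hypothetical):  a dataset can pass a k-anonymity test on its quasi-identifiers while a retained identifier column still links every record to a particular user, user agent, or device.

    # Sketch only: a k-anonymity check over quasi-identifier columns.
    # Passing it does not make the data un-linkable as defined above.
    from collections import Counter

    def is_k_anonymous(records, quasi_identifiers, k=1024):
        """True if every quasi-identifier combination appears >= k times."""
        counts = Counter(tuple(r[col] for col in quasi_identifiers)
                         for r in records)
        return all(n >= k for n in counts.values())

    # Hypothetical log records: the (country, browser) bucket is large,
    # so the test passes, yet "user_id" keeps every row linkable.
    records = [{"user_id": i, "country": "IT", "browser": "Firefox"}
               for i in range(2048)]
    print(is_k_anonymous(records, ["country", "browser"]))  # True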
>  
> As there are many possible tests, it is recommended that companies publicly stating W3C Tracking Preference compliance provide transparency into their delinking process so external experts and auditors can assess whether these steps are reasonable given the risk of a particular dataset.
> 

This is certainly a good practice—though it's not a requirement here.

I don't understand the role of the remaining provisions.  They don't appear to be normative requirements.  But they also don't quite make sense as non-normative guidelines.
>  
> ·         Information That Is Un-linkable When Collected:  A third party may collect non-protocol information if it is, independent of protocol information, un-linkable data. The data may be retained and used subject to the same limitations as protocol information.
> 
> Example: Example Advertising sets a language preference cookie that takes on few values and is shared by many users.
> 
> ·         Information That Is Un-linkable After Aggregation:  During the period in which a third party may use protocol information for any purpose, it may aggregate protocol information and un-linkable data into an un-linkable dataset. Such a dataset may be retained indefinitely and used for any purpose.
> 
> Example: Example Advertising maintains a dataset of how many times per week Italy-based users load an ad on Example News.
> 
> ·         Information That Is Un-linkable After Anonymization:  At some point after collection, a unique ID from a production cookie has a one-way salted hash applied to the identifier to break any connection between the resulting dataset and production identifiers.  To further mitigate dictionary attacks on this method, it's recommended that "keys" are rotated on a regular basis.
> 

This view of unlinkability is directly at odds with the definition above.  The very purpose of an ID (or hashed ID) is to attribute activities to a particular user, user agent, or device.
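
A quick sketch of the problem (again mine, with hypothetical names and a placeholder salt):  a salted hash is deterministic for a given salt, so every event from the same cookie maps to the same pseudonym.  Until the key is rotated and the old values discarded, that's pseudonymization, not un-linkability.

    # Sketch only: a one-way salted hash of a cookie ID still yields a
    # stable pseudonym, so a user's events remain linkable to each other.
    import hashlib

    SALT = b"hypothetical-rotating-key"  # placeholder value

    def pseudonymize(cookie_id):
        return hashlib.sha256(SALT + cookie_id.encode()).hexdigest()

    events = [("abc123", "view"), ("abc123", "click"), ("xyz789", "view")]
    hashed = [(pseudonymize(cid), action) for cid, action in events]
    # Both "abc123" events map to the same hash, so they can still be
    # attributed to the same user agent or device.
    assert hashed[0][0] == hashed[1][0]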
