W3C home > Mailing lists > Public > public-tracking@w3.org > June 2012

RE: Updated Proposal - Outline in preparation for presentation in Seattle

From: Shane Wiley <wileys@yahoo-inc.com>
Date: Tue, 12 Jun 2012 03:21:49 -0700
To: Jonathan Mayer <jmayer@stanford.edu>
CC: "public-tracking@w3.org" <public-tracking@w3.org>
Message-ID: <63294A1959410048A33AEE161379C8023D1878628A@SP2-EX07VS02.ds.corp.yahoo.com>
Jonathan,

Thank you for the single suggested correction in the “Unlinkability” section, although I believe the text can be read with either form of the term and still be correctly interpreted.

In short:  I believe the proposal would better be characterized as “Do Not Profile or Target + Data Minimization + Transparency” – or better yet “The Proposal That Can Be Implemented at Scale and en Masse”.  Whereas your proposal could be characterized as “Unlinkability + Limited Security” – or better yet “The Proposal That Only a Few Would Ever Implement”.

I’ve responded to your comments below in [ ]:

- Shane

From: Jonathan Mayer [mailto:jmayer@stanford.edu]
Sent: Tuesday, June 12, 2012 1:09 AM
To: Shane Wiley
Cc: public-tracking@w3.org
Subject: Re: Updated Proposal - Outline in preparation for presentation in Seattle

Some corrections and responses below.

In short: Do Not Target + Transparency.  As best I can tell, this is the very same proposal we heard in DC with an added transparency requirement for retention periods.

Jonathan

On Monday, June 11, 2012 at 7:25 PM, Shane Wiley wrote:

Hello TPWG,



Due to “recent activities” I’m a bit behind on providing the final presentation for our updated proposal in preparation for Seattle.  We’ll be reviewing this in more detail in Seattle but I wanted to share some of the initial elements up-front so we have time as a working group to begin discussion and consider perspectives leading up to the meeting.



------



Goal:  Evolve DC proposal to bridge the divide with the advocate proposal and set a final recommendation for these elements



Definition of First Party

Advocate Position:  Common Branding
The preferred position I've heard is user expectations, not branding.  Many view branding as a compromise.

[Disagree – this is where your recommendation ended at the DC meeting.]

I also don't believe it's accurate to characterize the opposing viewpoint as simply "advocate."  Representatives from a number of companies (e.g. Mozilla and Apple), policymaking organizations (e.g. the Federal Trade Commission and Article 29 Working Party), and research institutions have signaled support for some of these views.  For lack of a better phrase, I've been calling it the pro-privacy position.  (Trite, I know.  Alternative suggestions welcome!)

[Our proposal embodies elements from regulators as well.  I believe it’s already overly generous to call your work “advocacy” and that the true “pro-privacy” proposal is the one offered by those who actually work with 100s of millions of users every day.  I’ll stick with Advocate for now.]


Industry Position:  Affiliate

Concession Proposal:  Affiliate with “easy discoverability” (“Affiliate List” within one click from each page or owner clearly identified within one click from each page.  For example, a link in the privacy policy would meet this requirement.)
It looks like the proposals are closely aligned on this issue.  See http://jonathanmayer.github.com/dnt-compromise/compromise-proposal.html#parties.


The new industry position, to be sure, reflects a marginal transparency concession.  But it's beyond peradventure that the pro-privacy participants are giving up far more by allowing affiliate information sharing.  Affiliate information flows deviate significantly from user expectations and have been frequently abused in the context of other privacy regulation.

Rough ballpark figure: On party size, industry participants get 99% of what they want, pro-privacy participants get 1% of what they want.

[Strongly disagree with the percentages but happy we’ve landed in nearly the same place on this one.]


Permitted Uses

Advocate Position:  Unlinkable Data w/ arbitrary “grace period”
As I explained in the DC meeting and again in an email yesterday, the pro-privacy position would facilitate "operational uses" in a variety of ways.  See http://lists.w3.org/Archives/Public/public-tracking/2012Jun/0220.html.


[The Advocate position doesn’t really allow for operational uses and, as you’ve noted, breaks many current business operations, requires significant reengineering, and offers thought exploration that is unproven and untested at scale (it may work inside a lab with very small companies, though).]


Industry Position:  Enumerated uses, broadly scoped, general data minimization

Concession Proposal:  Tightened up permitted uses, narrowly and strictly scoped, data minimization focus with required transparency, reasonable safeguards, defined unlinkable (highlighting this moves resulting data outside of scope)
Discussion below.

For All Permitted Uses

What won’t occur:  Outside of Security, no other permitted use will allow for altering a specific user’s online experience (no profiling, no further alteration to the user experience based on profiled information)
Advertising industry self-regulation has required this since mid-2000.  See http://www.ftc.gov/os/2000/07/NAI%207-10%20Final.pdf.


[Wrong – self-regulation halts targeting based on a profile – this proposal goes one step further to halt the profiling itself.]


Data Minimization:  Each organization engaging in Permitted Uses and claiming W3C DNT compliance must provide public transparency of its data retention period (it may enumerate each period individually if they vary across Permitted Uses)
As with corporate affiliation, there is only a marginal transparency concession here.  I can't quite tell—there may also be a nearly-unenforceable substantive concession (sometimes termed "reasonable minimization").

[Disagree on scale and believe this provides users with the information necessary to decide if they want to engage with a particular company.]


Reasonable Safeguards:  Reasonable technical and organizational safeguards to prevent further processing:  collection limitations, data siloing, authorization restrictions, k-anonymity, unlinkability, retention time, anonymization, pseudonymization, and/or data encryption.
Once again, advertising industry self-regulation has imposed this requirement for over a decade.

[Agree to some degree but it’s important to highlight that this is a vital element of our proposal to minimize risk to non-compliant use of user data that carries valid DNT signals.]


Permitted Uses:  Security/Fraud, Financial Logging/Auditing, Frequency Capping, Debugging, Aggregate Reporting*
These permitted uses are unchanged from current self-regulatory commitments.

[Disagree – please read again.]


For each Permitted Use:

(Normative) Detailed, singular business purpose description

(Non-normative) Will explain why the processing with identifiers is proportionate
More marginal transparency concessions.

[Transparency is key to user understanding.]


*NOTE – Aggregate Reporting covers general analytics needs, product improvement, and market research uses
Rough ballpark figure: On "operational uses," industry participants get 99% of what they want, pro-privacy participants get 1% of what they want.

[This goes to the heart of “Unlinkability” and captures the essence of your own proposal.  Interesting that you’d attack this one at all.  Also, could you please break down your mathematical analysis to arrive at Advocates only getting 1%?  If you’re going to be using percentages, please be ready to stand by them and walk us through your detailed thinking.]


Explicit and Separate User Choice

User must expressly activate DNT signal (TPWG already agreed on this point)
No compromise here, of course.  This is the current industry position.  See https://www.aboutads.info/resource/download/DAA_Commitment.pdf.


[No compromise needed, as the entire TPWG save one person, you, agreed on this position.  Please go back to last week’s meeting notes when they’re available and you’ll see the same outcome was reached a second time.]


Servers may respond to users that their UA is “invalid” if they believe this to be the case (on the hook to defend this position)
Another issue with no deviation from the industry position: websites get to ignore DNT if a browser might have set it by default.

[Correct – this is the industry position.]


Efforts to mislead users into activating DNT will be seen as “invalid”
I assume this is intended to be a user interface requirement.  Again no compromise.

[Not a specific UI requirement as in the placement of a button but more to the general approach a UI may take.]


With this Proposal

Users gain a consistent, local tool to communicate their opt-out preference (avoids property specific opt-out pages)

The user’s choice is persistent for each device/UA (avoids accidental deletion)
Yes, the Do Not Track technology is obviously superior to the old opt-out cookie technology.  I'm not quite certain how that's relevant here, though.

[How could you argue this isn’t relevant?  This is what started the debate that created this working group.  It’s one of the key reasons DNT is being discussed today and is a significant improvement for user privacy.]


Outside of Security purposes, the user will no longer experience alterations to their online experiences derived from multi-site activity
In other words, Do Not Target.

[Or rather, Do Not Profile or Target.  Business uses are reduced to only those required for minimal business operations.]


Only minimal data is retained for necessary business operations and retention periods are transparent to users
See the discussion of transparency and minimization requirements above.


All “harms” are removed (outside of government intrusion risk where there are no documented cases of this occurring with 3rd party anonymous log file data)
The group exercise in DC emphasized how privacy risks go far beyond behavioral personalization (if that's even a privacy risk...).

[Disagree – go back to those lists and look at what “REAL” harms remain for users after this proposal has been implemented.]


Unlinkability



<Normative>



Un-linkable Data is outside of the scope of the Tracking Preference standard as information is no longer reasonably linked to a particular user, user agent, or device.
I presume the meaning is "linkable," not "linked," given the following definition.

[I believe “linked” is more appropriate here as it’s an active phrasing of the requirement – whereas “-ability” or “-able” captures the capability to meet the active outcome.]

Definition:  A dataset is un-linkable when reasonable steps have been taken to modify data such that there is confidence that it contains only information which could not be linked to a particular user, user agent, or device.



<Non-Normative>



There are many valid and technically appropriate methods to de-identify or render a data set "un-linkable".  In all cases, there should be confidence that the information cannot be reverse engineered back to a "linkable" state.  Many tests could be applied to help determine the confidence level of the un-linking process.  For example, a k-anonymity test could be leveraged to determine if the mean population resulting from a de-linking exercise meets an appropriate threshold (a high-bar k-anonymity threshold would be 1024).
To be clear: k-anonymity is not at all the same as unlinkability, as defined above.

[To be clearer, k-anonymity renders a dataset “unlinkable”.]
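As a rough sketch of the kind of k-anonymity test referenced above (the record structure, field names, and data are purely illustrative, not part of the proposal):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k=1024):
    """Return True if every combination of quasi-identifier values in
    the dataset is shared by at least k records (k=1024 being the
    "high-bar" threshold mentioned in the proposal)."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())

# Illustrative: 2048 records sharing one quasi-identifier combination
# clear the 1024 threshold.
records = [{"country": "IT", "browser": "X"}] * 2048
print(is_k_anonymous(records, ["country", "browser"]))  # True
```

Note this checks group sizes over chosen quasi-identifiers only; as Jonathan's comment points out, passing such a test is not by itself the same property as the unlinkability definition above.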

As there are many possible tests, it is recommended that companies publicly stating W3C Tracking Preference compliance provide transparency into their delinking process so external experts and auditors can assess whether these steps are reasonable given the risk of a particular dataset.
This is certainly a good practice—though it's not a requirement here.

[Yes – the entire section is marked “Non-Normative” hence the use of terms like “recommended”.]

I don't understand the role of the remaining provisions.  They don't appear to be normative requirements.  But they also don't quite make sense as non-normative guidelines.

[They are examples of the non-normative recommendation – not sure how you would capture these otherwise.]

1. Information That Is Un-linkable When Collected:  A third party may collect non-protocol information if it is, independent of protocol information, un-linkable data. The data may be retained and used subject to the same limitations as protocol information.



Example: Example Advertising sets a language preference cookie that takes on few values and is shared by many users.



2. Information That Is Un-linkable After Aggregation:  During the period in which a third party may use protocol information for any purpose, it may aggregate protocol information and un-linkable data into an un-linkable dataset. Such a dataset may be retained indefinitely and used for any purpose.



Example: Example Advertising maintains a dataset of how many times per week Italy-based users load an ad on Example News.
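A minimal sketch of that kind of aggregation (field names are hypothetical): per-request logs are collapsed into counts keyed only by week and country, with all user identifiers dropped.

```python
from collections import Counter

def weekly_ad_loads(log_records):
    """Collapse per-request protocol logs into an unlinkable aggregate:
    ad-load counts per (week, country).  No user, UA, or device
    identifier survives into the output."""
    return Counter((rec["week"], rec["country"]) for rec in log_records)

logs = [
    {"week": "2012-W24", "country": "IT", "user_id": "u1"},
    {"week": "2012-W24", "country": "IT", "user_id": "u2"},
]
print(weekly_ad_loads(logs))  # Counter({('2012-W24', 'IT'): 2})
```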



3. Information That Is Un-linkable After Anonymization:  At some point after collection, a one-way salted hash is applied to the unique ID from a production cookie to break any connection between the resulting dataset and production identifiers.  To further mitigate dictionary attacks on this method, it is recommended that "keys" be rotated on a regular basis.
This view of unlinkability is directly at odds with the definition above.  The very purpose of an ID (or hashed ID) is to attribute activities to a particular user, user agent, or device.

[Disagree – an ID that has been appropriately “disconnected” from a production environment allows for limited use but doesn’t allow “linkage” to a particular user or device in the real-world.]
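One way to sketch the salted-hash-with-rotation approach described in item 3 (the identifier and period names are illustrative; the proposal does not specify an algorithm):

```python
import hashlib
import hmac
import secrets

def anonymize_id(production_id: str, salt: bytes) -> str:
    """Apply a one-way keyed (salted) hash to a production identifier.
    Without the salt, the output cannot be mapped back to the original
    ID by hashing a dictionary of known identifiers."""
    return hmac.new(salt, production_id.encode(), hashlib.sha256).hexdigest()

# Rotating the salt (the "key") on a schedule, as the text recommends,
# also breaks linkage across rotation periods: the same production ID
# hashes to different values under different salts.
salt_this_period = secrets.token_bytes(32)
salt_next_period = secrets.token_bytes(32)
```

Note that, as Jonathan's comment observes, the hashed ID still groups one user's records together within a rotation period; whether that satisfies the normative definition of un-linkable is exactly the point in dispute.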

Received on Tuesday, 12 June 2012 10:26:23 UTC
