Re: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25, ISSUE-31, ISSUE-34, ISSUE-49) from Jonathan Mayer on 2012-02-05 (public-tracking@w3.org from February 2012)

From: Jonathan Mayer <jmayer@stanford.edu>
Date: Sun, 5 Feb 2012 15:10:05 -0800
To: Sean Harvey <sharvey@google.com>
Cc: Matthias Schunter <mts@zurich.ibm.com>, Jeffrey Chester <jeff@democraticmedia.org>, "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-Id: <88AA7143-8685-4C20-AF37-1F7954B30D87@stanford.edu>
Let me add a note on this argument I've now heard from several Google representatives about how client-side privacy technologies might put a user at risk.  I believe that's wrong.  A local attacker is already trivially able to get the user's browsing history (from the browser itself).  A number of offensive software utilities already do just this.  A remote attacker would have to exploit a cross-site scripting vulnerability, which there are good techniques for preventing.

Jonathan


On Feb 5, 2012, at 2:54 PM, Jonathan Mayer wrote:

> My notions of "minimization" and "balancing" encompass consideration of alternatives to a blanket use-based exception.  There are infinite possible exceptions for any particular business purpose, with countless permutations of collection and retention limits.  Those limits could be as straightforward as a retention period; they could be as complex as a privacy-preserving alternative technology.
> 
> As for client-side frequency capping and other privacy-preserving web technologies: my lab is far from alone in developing these alternatives.  See the annotated bibliography on donottrack.us for some of the other work in the field.  These approaches are not mere lab studies; much of the finest work has been done by Microsoft Research, using data and technology from deployed systems.  I can't speak to what DoubleClick was capable of in 2007 and earlier, but I am very skeptical that these technologies are out of reach in 2012.
> 
> All of that said, let's take the position you (and others) have articulated at face value: client-side privacy-preserving technologies won't work.  Seeing as client-side storage is a fundamental component of just about *any* privacy-preserving system, then all we're left with are unique ID cookies.  The balance is, then, between frequency capping (where there is undoubtedly some economic value) and collection of a user's browsing activity across websites (the *central* concern in the Do Not Track debate for me and many others).  As you rightly noted in Brussels, taking the balance seriously, that means no frequency capping for DNT users.
> 
> And so, to circle back to privacy-preserving technologies: I am trying to extend an olive branch to the advertising industry representatives in the group.  I am trying to find ways for you to accomplish your business aims while giving user privacy the deference it deserves.  As between no frequency capping and an admittedly more challenging privacy-preserving frequency capping technology, I should imagine the latter is preferable.
> 
> Jonathan
> 
> 
> On Feb 5, 2012, at 1:57 PM, Sean Harvey wrote:
> 
>> I want to comment on Jonathan's original email on this chain, in the context of his later response below. Jonathan's thoughts are in general well thought out. To my mind the main stumbling block is his elaboration of #5, which was titled "Minimization" but focused on the use of "privacy enhancing alternatives". 
>> 
>> In light of both our meeting in Brussels and Jonathan's later post to this email chain, it's clear that Jonathan is speaking of his own personal version of client-side frequency capping, and so I feel forced to address this issue, though it seems tangential to our goals.
>> 
>> To put it simply, client-side frequency capping does not work at scale. 
>> 
>> There were two separate initiatives at DoubleClick prior to its acquisition by Google that attempted to move functionality like frequency capping onto the client-side. Both looked nice when you did a little demo of them. But neither of them worked at scale across a system -- like the ones that will be most directly impacted by these discussions -- that transacts tens of billions of events per day. Discussing in further detail would be inappropriate in the context of this list because of proprietary technology concerns, but suffice it to say that client-side frequency capping and other such ad serving capabilities crap out at scale.
>> 
>> This is not to say that Jonathan is not extremely intelligent or that his idea isn't a good one, but he does not have the hard experience of many years spent building & maintaining massively scalable software systems that must never go down, at risk of the financial viability of tens of thousands of businesses across the web. And we do have many other women & men who are every bit as intelligent running our ad serving & other systems.
>> 
>> I am also unconvinced that retaining such data on the client side is a data privacy & security improvement for physical security reasons, because clients (e.g. browsers on laptops) are far more easily stolen than servers on data farms. While it's true that there would be no human readable values on the client side that an individual could leverage, the same remains true of the frequency cap ticks that are currently stored on the server-side. 
>> 
>> I think it is entirely valid & useful for us to discuss openly the merits of a frequency cap exception, but do not think it is legitimate for us to make potentially disastrous technical implementation requirements in the context of this W3C compliance process. 
>> 
>> sean
>> 
>> On Sat, Feb 4, 2012 at 1:17 AM, Jonathan Mayer <jmayer@stanford.edu> wrote:
>> Here are a few exceptions that I believe could clear the hurdle.
>> 
>> -Content serving
>> -Contextual personalization
>> -Outsourcing
>> -Protocol logs for debugging
>> -Unidentifiable data (including aggregated data and client-side frequency capping)
>> -View fraud prevention through a stepped response
>> 
>> On Feb 2, 2012, at 7:06 AM, Matthias Schunter wrote:
>> 
>> > Hi Jonathan/Jeff,
>> >
>> > what exceptions do you see at this point that are likely to satisfy this
>> > catalogue?
>> > what are viable candidates where only more data/input/answers is needed?
>> >
>> > Regards,
>> > matthias
>> >
>> > From: Jeffrey Chester <jeff@democraticmedia.org>
>> > To: Jonathan Mayer <jmayer@stanford.edu>
>> > Cc: "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
>> > Date: 02/02/2012 03:34 PM
>> > Subject: Re: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25, ISSUE-31, ISSUE-34, ISSUE-49)
>> >
>> > I agree with Jonathan's thoughtful discussion of the exemption issue.  I
>> > recognize this is a delicate matter, and it will require continued dialogue
>> > to properly balance the goals of DNT with traditional digital marketing
>> > (and advertising generally) business practices.  I believe that if we
>> > follow Jonathan's outline, we can achieve our collective goals.
>> >
>> > Jeff
>> >
>> > On Feb 1, 2012, at 9:45 PM, Jonathan Mayer wrote:
>> >
>> >      The working group has made great progress on the broad contours of
>> >      the definition document, and the conversation is shifting to specific
>> >      exceptions.  With that in mind, now seems an appropriate time to
>> >      articulate my views on when and how exceptions should be granted.
>> >
>> >      At a high level, we all agree that exceptions reflect a delicate
>> >      balance between consumer privacy interests and commercial value.
>> >      There are, no doubt, substantial differences in opinion about where
>> >      that balance should be struck.  I hope here to clarify my approach
>> >      and help others understand why I find recent proposals for blanket
>> >      exceptions to be non-starters.
>> >
>> >      In my view, any exception must satisfy this rigorous six-part test.
>> >
>> >      1) Specifically defined.  An exception must clearly delineate what
>> >      data may be collected, retained, and used.  If a proposed exception
>> >      is purely use-based, that needs to be extraordinarily explicit.
>> >
>> >      2) No special treatment.  We should grant or deny an exception on the
>> >      merits of how it balances privacy and commerce, not a specific
>> >      business model.
>> >
>> >      3) Compelling business need.  A bald assertion that without a
>> >      specific exception Do Not Track will "break the Internet" is not
>> >      nearly enough.  I expect industry stakeholders to explain, with
>> >      specificity, what business purposes they need data for and why those
>> >      business purposes are extraordinarily valuable.
>> >
>> >      4) Significantly furthers the business need.  I expect industry
>> >      participants to explain exactly how and to what extent a proposed
>> >      exception will further the compelling business needs they have
>> >      identified.  In some cases, such as security and fraud
>> >      exceptions, this may call for a technical briefing.
>> >
>> >      5) Strict minimization.  If there is a privacy-preserving technology
>> >      that has equivalent or nearly equivalent functionality, it must be
>> >      used, and the exception must be no broader than that technology.  The
>> >      burden is on industry to show that a privacy-preserving alternative
>> >      involves tradeoffs that fundamentally undermine its business needs.
>> >      In the context of frequency capping, for example, I need to hear why
>> >      - specifically - client-side storage approaches will not work.  In
>> >      the context of market research, to take another example, I would need
>> >      to hear why statistical inference from non-DNT users would be
>> >      insufficient.
>> >
>> >      6) Balancing.  There is a spectrum of possible exceptions for any
>> >      business need.  At one end is a pure use-based exception that allows
>> >      for all collection and retention.  At the other end is no exception
>> >      at all.  In between there are infinite combinations of collection,
>> >      retention, and use limits, including exceptions scoped to
>> >      privacy-preserving but inferior technologies.  In choosing among
>> >      these alternatives, I am guided by the magnitude of commercial need
>> >      and consumer privacy risk.  I am only willing to accept an exception
>> >      where the commercial need substantially outweighs consumer privacy
>> >      interests.
>> >
>> >      I understand example exceptions may be helpful in understanding my
>> >      thinking, so here are a few from the IETF Internet-Draft.
>> >
>> >         3. Data that is, with high confidence, not linkable to a
>> >            specific user or user agent.  This exception includes
>> >            statistical aggregates of protocol logs, such as pageview
>> >            statistics, so long as the aggregator takes reasonable
>> >            steps to ensure the data does not reveal information about
>> >            individual users, user agents, devices, or log records.
>> >            It also includes highly non-unique data stored in the user
>> >            agent, such as cookies used for advertising frequency
>> >            capping or sequencing.  This exception does not include
>> >            anonymized data, which recent work has shown to be often
>> >            re-identifiable (see [Narayanan09] and [Narayanan08]).
>> >         4. Protocol logs, not aggregated across first parties, and
>> >            subject to a two week retention period.
>> >         5. Protocol logs used solely for advertising fraud detection,
>> >            and subject to a one month retention period.
>> >         6. Protocol logs used solely for security purposes such as
>> >            intrusion detection and forensics, and subject to a six
>> >            month retention period.
>> >         7. Protocol logs used solely for financial fraud detection,
>> >            and subject to a six month retention period.
>> >
>> >
>> >      I would add, in closing, that in difficult cases I would err on the
>> >      side of not granting an exception.  The exemption API is a policy
>> >      safety valve: If we are too stringent, a third party can ask for a
>> >      user's consent.  If we are too lax, users are left with no recourse.
>> >
>> >      Best,
>> >      Jonathan
>> >
>> 
>> -- 
>> Sean Harvey
>> Business Product Manager
>> Google, Inc. 
>> 212-381-5330
>> sharvey@google.com
>
Received on Sunday, 5 February 2012 23:10:43 UTC