Re: Industry Amendment Clarifications as Promised on July 10 W3C WG Weekly Call

Let me start by expressing my disappointment: The co-chairs are entertaining trade group "clarifications" that arrived just two hours before our deadline, long after many (if not most) commenters had submitted.  If these were mere notes on ambiguities, that would be irksome enough.  But these are much more: substantive revisions of the text.  Amendment #5, for example, would add an *entirely new passage* exempting audience measurement practices.

According to the co-chairs, these proposals "will be treated as a clarifying email to the list, but not as the text that the process will use as the DAA Proposal."  What are we to make of this mealy-mouthed rule of decision?  The revisions will not be considered… except to the extent that they are considered?

Thankfully, we need not dwell on this latest instance of flagrant disregard for procedural legitimacy.  I agree with Justin: these amendments do not remedy the substantive deficiencies of the DAA proposal.

In the interest of sparing the group's time, I will not rehash Justin's critique, and will add only a few notes on the new "deidentification" and "tracking" texts.

Beginning with "deidentification":
How does the "or" work in the new definition of "deidentification"?  Is it attached to the immediate "cannot reasonably be re-associated," or to the earlier "reasonable steps"?  Put differently, does the text identify two reidentification attacks that both must be defended against, or two alternative methods for deidentifying data?
What is a "disproportionate amount of time, expense, and effort" for reidentification?  How does that align with "reasonable steps to ensure that [the specified data] cannot be re-associated"?  How does it differ from the "delinked" provision of "reasonable steps to ensure that data cannot be reverse engineered back to identifiable data"?
How does the double-negative "without" clause work?  Does it mean that data is adequately scrubbed if a website has "controls" on *some* internal data (i.e., data that could be controlled) that would be sufficient for identification?  Controls on *all* such internal data?  What about when public information is sufficient for identification?  Suppose, for example, that a dataset includes a fanciful Twitter username.  Trivially checking the Twitter website would give a full name, location, homepage, and more.  Is that data covered under the clause, since it does not include any internal data?
What is "the use of additional data"?  Take the Twitter website visit above—is that a use of additional data?  Put differently, what is the end state at which data is fully identified or identifiable?
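To make the public-data concern concrete, here is a minimal sketch of the Twitter example above.  All names, handles, and profile data are invented for illustration; the point is only that a "deidentified" record can be joined against purely public information, with no internal "additional data" involved:

```python
# Hypothetical sketch: a record scrubbed of internal identifiers,
# but still carrying a distinctive public handle.
scrubbed_record = {"twitter_handle": "@quirky_example_user", "interest_score": 4}

# Public profile information, as one might read it straight off the
# Twitter website (invented data).
public_profiles = {
    "@quirky_example_user": {"name": "Jane Doe", "location": "Washington, DC"},
}

def reidentify(record, public_directory):
    """Join a scrubbed record against public information only.

    Returns the merged, reidentified record, or None if the handle
    does not appear in the public directory.
    """
    profile = public_directory.get(record["twitter_handle"])
    if profile is None:
        return None
    return {**record, **profile}

print(reidentify(scrubbed_record, public_profiles))
```

No control on the website's internal data would have prevented this join, which is why the scope of the "without" clause matters.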
What constitutes "separate and distinct technical and organizational controls"?  Suppose, for example, a website has a provision in its employee handbook about not identifying data—is that a sufficient control?  What about a periodic click-through reminder?

As for "tracking," as Justin notes, the new text is entirely non-normative.  It does nothing to address criticisms about continued profiling of users.  It also does nothing to clarify when aggregation is sufficient.  In fact, it explicitly punts: "[i]t is contemplated there will be significant discussion about what activities constitute Tracking."
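The open question about "enough" aggregation is easier to see against a concrete version of the scheme.  Here is a minimal sketch, with invented URLs, categories, and thresholds, of the Aggregate Scoring approach floated in the amendment's non-normative text: the Unique ID is retained, raw URLs are discarded, and only a per-category score above some assumed minimum survives:

```python
from collections import defaultdict

# Assumed minimum-aggregation threshold (the non-normative text names
# no number; this value is invented for illustration).
MIN_URLS_PER_SCORE = 3

# Hypothetical URL-to-interest-category mapping.
URL_CATEGORY = {
    "http://www.ford.com/2013/trucks/F-150": "Offline Vehicles",
    "http://www.chevrolet.com/silverado": "Offline Vehicles",
    "http://www.toyota.com/tundra": "Offline Vehicles",
}

def aggregate_scores(visits):
    """visits: list of (cookie_id, url) pairs.

    Returns {cookie_id: {category: score}} with the raw URLs discarded;
    categories with fewer than MIN_URLS_PER_SCORE visits are dropped.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for cookie_id, url in visits:
        category = URL_CATEGORY.get(url)
        if category:
            counts[cookie_id][category] += 1
    return {
        cid: {cat: n for cat, n in cats.items() if n >= MIN_URLS_PER_SCORE}
        for cid, cats in counts.items()
    }
```

Note what the sketch leaves open, which is exactly the unresolved question: the threshold, the category granularity, and whether a score may be "exercised in production" are all unspecified by the amendment, and the Unique ID itself is retained throughout.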

In sum, the new text raises more questions than it answers on "deidentification" and "tracking."

Jonathan


On Saturday, July 13, 2013 at 7:06 AM, Justin Brookman wrote:  
> As the Chair has indicated that the latest trade association amendments will be considered as part of the decision process, I will briefly summarize why these amendments do not address my core concerns.
>  
> Setting aside the fundamental problem with allowing cross-site personalization when DNT is turned on, the DAA standard, as re-amended, still will not meaningfully limit collection.  The proposed language on scoring is non-normative and suggestive, and the drafters are unwilling to alter the operative text: "Tracking is the collection and retention, or use of a user's browsing activity --- the domains or URLs visited across non-affiliated websites --- linked to a specific user, computer, or device."  While this language is difficult to parse, I believe the drafters have explained it to mean that the precise *website* you visited cannot be tracked, but any information about the *content* of that site is fair game.  In practice, I do believe that most ad networks will engage in scoring, in addition to maintaining URL logs for security and other permitted uses.  In fact, I think this is what most ad networks *already do today* for non-DNT/non-opt-out users.  Indeed, Shane acknowledged "[t]his [scoring] would be similar to today's current interest-based advertising practices."  As specific product retargeting is explicitly envisioned as permissible under the DAA proposal, I do not believe that much "bucket-scrubbing" is envisioned as necessary.  I welcome further work on tracking hygiene, but I think those efforts should be delinked from Do Not Track.
>  
> I do appreciate the improvements to the language on deidentification, as I think the new language more closely tracks what is intended.  However, my previously stated policy objections remain.  I also appreciate the drafters adding the language "when such attribution would require a disproportionate amount of time, expense, and effort."  Unfortunately, this condition is disjunctive in the text, and so does not provide any marginal value, as data may still reach the deidentified state through the operational controls that I previously objected to.  Indeed, the amendment arguably highlights that reidentifying data deidentified by operational controls could be accomplished *without* a disproportionate amount of time, expense, and effort.
>  
> My position on a market research permitted use is well-documented, and I will not belabor the point on a lovely Saturday morning.  And with that, I am off to coach T-ball.
>  
> Jack Hobaugh , 7/12/2013 5:58 PM:
> > Dear Colleagues:
> >  
> > As promised on the July 10 W3C WG call, Industry presents the following modifications and new amendments to provide the requested clarifications regarding these areas.
> >  
> >  
> > Amendment # 1:
> >  
> >  
> > For the purposes of this specification, data is deidentified when a party:
> >  
> >  
> > 1.  has taken reasonable steps to ensure that the URL data across websites or Unique ID cannot reasonably be re-associated or connected to a specific user, computer, or device without the use of additional data that is subject to separate and distinct technical and organizational controls to ensure such non-attribution, or when such attribution would require a disproportionate amount of time, expense, and effort;
> >  
> >  
> > 2.  has taken reasonable steps to protect the non-identifiable nature of data if it is distributed to non-affiliates third parties and obtain satisfactory written assurance that such entities third parties will not attempt to reconstruct the data in a way such that an individual may be re-identified and will use or disclose the de-identified data only for uses as specified by the entity original party.
> >  
> >  
> > 3.  has taken reasonable steps to ensure that any non-affiliate third party that receives de-identified data will itself ensure that any further non-affiliate third parties entities to which such data is disclosed agree to the same restrictions and conditions.
> >  
> >  
> > 4.  will commit to not purposely sharing this deidentified data publicly.
> >  
> >  
> > Non-normative text: The commitment to not purposely share deidentified data does not include reports on deidentified data.
> >  
> >  
> > Data is delinked when a party:
> >  
> >  
> > 1. has achieved a reasonable level of justified confidence that data has been de-identified and cannot be internally linked to a specific user, computer, or other device within a reasonable timeframe;
> >  
> >  
> > 2. has taken reasonable steps to ensure that data cannot be reverse engineered back to identifiable data without the need for operational or administrative controls.
> >  
> >  
> > Amendment # 4 (new):
> >  
> >  
> > [Section 5 paragraph 3]
> >  
> >  
> > Outside the permitted uses, or de-identification, or uses not included within the definition of “Tracking,” the third party MUST NOT collect, retain, or share network interaction identifiers data that identify the specific user, computer, or device.  
> >  
> >  
> > Amendment # 5 (new):
> >  
> >  
> > The industry supports adding the audience measurement language that has been discussed and revised with several participants and submitted by Esomar to the permitted uses section, 5.2.
> >  
> >  
> > Amendment # 6 (new):
> >  
> >  
> > Non-normative language for the definition of “Tracking”:
> >  
> >  
> > <non-normative>
> >  
> >  
> > It is contemplated there will be significant discussion about what activities constitute Tracking and its inverse, Not Tracking.  This text explores some of the possible areas that could be considered the latter, Not Tracking.  
> >  
> >  
> > The operative section of the Tracking definition is the linkage between unique identifiers (users/devices) and activity across non-affiliated web sites (for discussion sake, this will be referred to as URLs but it should be clear more than URLs may qualify as activity).  To achieve “Not Tracking” one could conceive of methods that appropriately separate these two dimensions of Tracking: Unique IDs and URLs.  
> >  
> >  
> > One possible method could be called Aggregate Scoring.  In Aggregate Scoring, the goal is to retain the Unique ID and aggregate away associated URLs, replacing them with an aggregate interest score - something that cannot be reverse engineered back to the original URL.  For example, Cookie ID 123456789ABCD views http://www.ford.com/2013/trucks/F-150?uid=123 could be aggregated to an interest score associated with the Cookie ID becoming “Cookie ID 123456789ABCD has an interest score of 4 in Offline Vehicles”.
> >  
> >  
> > It is difficult to provide prescriptive measures of what would constitute “enough” aggregation or other processing to ensure the user’s browsing history cannot be reverse engineered from the retained data. However, some examples could include, but are not limited to, using minimum numbers of URLs that would constitute an aggregate and/or look at establishing a minimum number of users qualifying for a particular aggregate score before that score is exercised in production.  Other approaches are possible, however, to reach the desired end result, which is non-retention of users’ browsing history.
> >  
> >  
> > Further, it is strongly suggested organizations provide users transparency into these activities and provide control options for users to disallow activities such as Aggregate Scoring if the user so desires.
> >  
> >  
> > </non-normative>
> >  
> >  
> > Best regards,
> >  
> > Jack
> >  
> > Jack L. Hobaugh Jr
> > Network Advertising Initiative | Counsel & Senior Director of Technology  
> > 1634 Eye St. NW, Suite 750 Washington, DC 20006
> > P: 202-347-5341 | jack@networkadvertising.org (mailto:jack@networkadvertising.org)

Received on Saturday, 13 July 2013 19:22:59 UTC