- From: Lou Mastria <lou@aboutads.info>
- Date: Fri, 12 Jul 2013 17:10:32 -0400
- To: Nicholas Doty <npdoty@w3.org>, public-tracking@w3.org
Objections to Editors' Draft

The Digital Advertising Alliance (DAA) is gratified to be part of a consensus solution regarding what constitutes effective consumer privacy protection, in a transparent, forthright manner, as a participant in the Tracking Protection Working Group. DAA believes that any next steps in the process must incorporate technological specifications for "do not track" signals in a manner that allows the Internet to remain innovative, open and competitive to all, and facilitates programmatic support for privacy where the consumer, himself or herself, is in control.

We thank the World Wide Web Consortium (W3C) for the opportunity to comment on the current draft, and we will direct our comments to where we believe real consensus can be achieved, and where it cannot. We embrace W3C's mission to facilitate the functionality of the Internet; our bias is toward enabling responsible information use of the Internet for marketing purposes in a manner that affords consumers both transparency and choice in how data regarding their own Internet experiences are collected and used.

The reality is that DNT has become conflated with a misperception that the Internet can function without data being shared. We believe the entire Group agrees that this is not the case; it is simply hype. That being so, the more prudent approach is to focus not on data collection, but rather on responsible, transparent and enforceable data use. Transparency, user choice and use limitations are hallmarks of good privacy design. The industry consensus proposal appropriately focuses on these core elements. The Editors' Draft does not.
DAA Representation Reflects a Broad Spectrum of the Digital Display Advertising Ecosystem

DAA represents a broad coalition of companies and organizations that serve interest-based advertising (IBA), also known as online behavioral advertising, a thriving component of the Internet economy that enables relevant advertising served to Internet users based, in part, on recent page visits. IBA delivers twice the CPM to publishers (big and small) of Internet content and services, and consumers click on it twice as much as more generic forms of online ads: a true win-win. Such advertising helps to finance a diverse Internet experience based on a device's online activity, and in a largely de-identified manner.

Consumers can opt out of such activity by visiting DAA's Consumer Choice page, accessible in the United States through www.aboutads.info and www.youradchoices.com, and replicated in similar DAA-partner sites in 30 nations worldwide. Consumers are currently presented this choice at a rate of 1 trillion times per month with just-in-time notice. Further, the DAA Principles which guide this program specify restrictions on acceptable use of online data, security, and, importantly, independent enforcement. The Principles also apply to the mobile Web, and will soon include guidance for mobile applications as well. Overall, the program has been commended by the US White House and has achieved acknowledged support from regulators in the US and the EU.

(One ministerial clarification: while Option A has come to be called the DAA Proposal, it is important to note that this consensus approach was developed and submitted by an entire cross-section of responsible industry entities, all of whom seek to provide a pragmatic way forward that achieves real privacy protections while continuing to support the ad-funded Internet we've all come to love.)
The Editors' Draft Needs Further Refinement to Protect Internet Diversity in Content, Programmatic Support for Self-Regulation Worldwide, and Real Privacy Protection for Consumers

DAA directs its remarks to three specific areas of concern in the Editors' Draft. We understand that other organizations represented among DAA's leadership (the 4A's, American Advertising Federation, Association of National Advertisers, Direct Marketing Association, Interactive Advertising Bureau, and Network Advertising Initiative) that are also TPWG participants may have additional comments of their own, to which DAA also subscribes.

1. The Editors' Draft appears to overreach by attempting to eliminate a business model (interest-based ads) as a way to achieve a privacy goal. The reality is that these two things are NOT mutually exclusive; with the implementation of the DAA Principles, they represent a quantum leap in privacy protection for consumers from the old days of notice living exclusively in privacy policies. Furthermore, we believe that proposals such as browsing history aggregation scoring and de-identification demonstrate that ad customization can be achieved in a privacy-friendly way, and that it may be possible to have a well-balanced and tailored approach that advances privacy while preserving competition, continuing to support ad-funded content, and gaining a high rate of adoption.

Consumers are pragmatic toward relevant advertising, and understand the value exchange that comes from having diverse content on the Internet that is ad-funded. A recent (April 2013) Zogby poll, commissioned by DAA, shows that:

- 92 percent of Americans think free content like news, weather and blogs is important to the overall value of the Internet
- 75 percent prefer ad-supported content to paying for ad-free content
- 68 percent prefer to get at least some Internet ads directed at their interests
- 75 percent prefer to make their own decisions about relevant advertisements, not rely on the decisions of governments or browser makers
- 40 percent prefer to get all their ads directed to their interests

Such expectations of consumers need to be reflected and respected in the TPWG process. Notice, choice and default do-not-track settings and mechanisms need to remain in consumers' hands, while enabling the Internet to function with ad-supported content, with a bias toward the widest diversity of content, the widest diversity of sites, and the widest diversity of advertisers to enable such experiences for the consumer.

2. A balanced and narrowly tailored approach that solves specific privacy concerns while maintaining competition and a diverse Internet economy is much more likely to gain widespread adoption, and ultimately benefit consumers with a net privacy gain through better hygiene, de-linking and de-identification of data. The Editors' Draft, even though conceived after the May face-to-face meetings, did not include this type of balanced provision. That is unfortunate, and yet another reason why the industry consensus proposal remains the best approach moving forward. Furthermore, we already have a successful model in place for enforcing data use compliance through the DAA. DAA would call on its program participants to comply with a specification that met the twin goals of advancing privacy and advancing the ad-funded Internet. To date, 100 percent of enforcement actions by DAA, among them 19 by the Council of Better Business Bureaus, have been resolved successfully. That is effective enforcement with teeth.

3. The Editors' Draft recommendation does not consider the "tri-state approach" regarding data collection and permissible uses reflected in the May 6-8, 2013, "consensus action summary." This proposal can provide heightened privacy protections while still allowing an IBA-funded Internet.
A study by Professor Catherine Tucker of MIT Sloan School of Management, regarding the EU e-Privacy Directive, shows that ad performance suffers when interest-based ads are disallowed: it drops by 65 percent. Barring interest-based ads would institutionalize such advertising underperformance while affording no real consumer privacy protection. Balance must be achieved between consumer privacy and a thriving, diverse and free Internet, and the "tri-state approach" enables this mutually beneficial outcome.

Again, we thank W3C and the TPWG for their efforts to balance all interests, but to do so in a manner that protects overall Internet functionality, productivity and diversity, supports consumer privacy protection in its specifications, and provides the ability for legitimate, responsible and transparent business models that continue to support ad-funded content.

-----Original Message-----
From: Nicholas Doty [mailto:npdoty@w3.org]
Sent: Friday, July 12, 2013 3:59 PM
To: public-tracking@w3.org (public-tracking@w3.org)
Cc: Jonathan Mayer
Subject: Re: what base text to use (was re: data hygiene approach / tracking of URL data)

[for jmayer]
[Text is from Jonathan Mayer; sending to the list and submitting web form in order to make public and avoid questionnaire submission problems. -npdoty]

Objections to Option A:

1) Exclusions from "tracking" are textually limitless and allow for user profiling.

In the amended DAA proposal, "tracking" is scoped to "the domains or URLs visited across non-affiliated websites." Data that is not considered "tracking" would be exempt from use limitations, collection minimization, retention transparency, and even reasonable security. Records of the following sort would be covered as tracking.
Cookie ID | URL                                                | Time
----------|----------------------------------------------------|---------------------
123       | http://www.webmd.com/hiv-aids/default.htm          | 7/11/13 (4:10pm PST)
123       | http://taxes.about.com/od/backtaxes/Back_Taxes.htm | 7/11/13 (4:13pm PST)
123       | http://sanfrancisco.gaycities.com/bars/            | 7/11/13 (4:15pm PST)
123       | http://www.wikihow.com/Quit-a-Job                  | 7/11/13 (4:19pm PST)

Cookie ID | Name           | Email               | Address        | ZIP
----------|----------------|---------------------|----------------|------
123       | Jonathan Mayer | jmayer@stanford.edu | 353 Serra Mall | 94305

But what about records like these, where the URLs have been modified by ROT13 and can be trivially recovered?

Cookie ID | URL                                                | Time
----------|----------------------------------------------------|---------------------
123       | uggc://jjj.jrozq.pbz/uvi-nvqf/qrsnhyg.ugz          | 7/11/13 (4:10pm PST)
123       | uggc://gnkrf.nobhg.pbz/bq/onpxgnkrf/Onpx_Gnkrf.ugz | 7/11/13 (4:13pm PST)
123       | uggc://fnasenapvfpb.tnlpvgvrf.pbz/onef/            | 7/11/13 (4:15pm PST)
123       | uggc://jjj.jvxvubj.pbz/Dhvg-n-Wbo                  | 7/11/13 (4:19pm PST)

Cookie ID | Name           | Email               | Address        | ZIP
----------|----------------|---------------------|----------------|------
123       | Jonathan Mayer | jmayer@stanford.edu | 353 Serra Mall | 94305

Or records like these, where the URLs have been grouped, such that the user went to one of the first pair of URLs and one of the second pair of URLs?*

Cookie ID | URL                                                                 | Group
----------|---------------------------------------------------------------------|------
123       | http://www.webmd.com/hiv-aids/default.htm                           | 1
123       | http://www.nytimes.com/                                             | 1
123       | http://www.mayoclinic.com/health/hiv-aids/DS00005/DSECTION=symptoms | 2
123       | http://www.washingtonpost.com/                                      | 2

Cookie ID | Name           | Email               | Address        | ZIP
----------|----------------|---------------------|----------------|------
123       | Jonathan Mayer | jmayer@stanford.edu | 353 Serra Mall | 94305

Or records like these, where the URL has been reduced to a set of features?

Cookie ID | Webpage Features                             | Time
----------|----------------------------------------------|---------------------
123       | Health, Self-Help, HIV/AIDS                  | 7/11/13 (4:10pm PST)
123       | Finance, Self-Help, Taxes, Back Taxes        | 7/11/13 (4:13pm PST)
123       | San Francisco, Gay, Drinking, Gay Bars       | 7/11/13 (4:15pm PST)
123       | Employment, Self-Help, Quitting, Job Hunting | 7/11/13 (4:19pm PST)

Cookie ID | Name           | Email               | Address        | ZIP
----------|----------------|---------------------|----------------|------
123       | Jonathan Mayer | jmayer@stanford.edu | 353 Serra Mall | 94305

The plain text of the DAA proposal would allow for all three of these practices.** It does not define when URL data has been sufficiently altered to no longer constitute tracking.

Moreover, even supposing the DAA proposal were amended to require rigorous aggregation of website features, it would remain problematic for privacy. The DAA design misses the forest for the trees: there is nothing *inherently* problematic about URL data. Rather, privacy risks flow from *what can be learned from* URL data. Consider the following records, which include only highly aggregated interest segments. Assume there is no reasonable way of mapping the data to URLs.

Cookie ID | Interest Segment
----------|---------------------
123       | HIV/AIDS
123       | Back Taxes
123       | Gay Bars
123       | Quitting Employment

Cookie ID | Name           | Email               | Address        | ZIP
----------|----------------|---------------------|----------------|------
123       | Jonathan Mayer | jmayer@stanford.edu | 353 Serra Mall | 94305

Under the DAA proposal, Do Not Track would allow a website to compile this sort of detailed dossier on a consumer, and keep it indefinitely, use it for any purpose, without transparency, and without security.
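To underline how trivially the ROT13-obfuscated records above reverse, here is a minimal Python sketch using only the standard library; the obfuscated URL is taken from the example table:

```python
import codecs

# One obfuscated URL from the example table above. ROT13 is its own
# inverse, so "non-tracking" records of this sort are recovered in a
# single call; non-alphabetic characters pass through unchanged.
obfuscated = "uggc://jjj.jrozq.pbz/uvi-nvqf/qrsnhyg.ugz"
recovered = codecs.decode(obfuscated, "rot13")
print(recovered)  # http://www.webmd.com/hiv-aids/default.htm
```

The same one-line recovery applies to any fixed, keyless substitution; only the name of the transform changes.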
We would be greatly deviating from both consumer expectations*** and policymaker preferences.

* For yet another related example, consider an implementation where each URL is assigned an independent probability of < 0.5 that the user visited it.

** Oddly, one provision of the proposal would seem to prohibit any use of unique identifiers save for the "deidentified" and "permitted uses" exceptions.

> Outside the permitted uses or de-identification, the third party MUST
> NOT collect, retain, or share network interaction identifiers that
> identify the specific user, computer, or device.

My understanding is that this passage is to be interpreted as a drafting error.

*** See, for example:
http://www.consumer-action.org/downloads/english/DNT_survey_PR.pdf
http://ssrn.com/abstract=2152135
http://pewinternet.org/Reports/2012/Search-Engine-Use-2012.aspx
https://www.eff.org/sites/default/files/TRUSTe-2011-Consumer-Behavioral-Advertising-Survey-Results.pdf
http://gallup.com/poll/File/145334/Internet_Ads_Dec_21_2010.pdf
http://aleecia.com/authors-drafts/tprc-behav-AV.pdf
http://ssrn.com/abstract=1478214

2) The deidentification scheme is textually undefined, and Yahoo!'s proposal fails to rigorously protect consumer privacy.

Like non-tracking data, deidentified data is *entirely* exempt from use limitations, collection minimization, retention transparency, and reasonable security. In exchange for this extraordinary reduction in information practice constraints, one would expect deidentified data to be rigorously privacy-protective. By that yardstick, the DAA proposal falls far short.

The textual "deidentified" and "delinked" definitions are unworkably vague and self-contradictory. If data "cannot reasonably be re-associated or connected to a specific user," then how can it still be "internally linked to a specific user"? How is this data capable of being "reverse engineered back to identifiable data"? Why are "satisfactory written assurance[s]" required when this data is shared?
Why can't this data be "purposely shar[ed] . . . publicly"? The DAA proposal provides no non-normative guidance to cut through this definitional fog.

What's more, the one purportedly compliant implementation that we have heard of, Yahoo!'s red-yellow-green proposal, provides little privacy protection. In a mid-2011 blog post,* Arvind Narayanan provided a taxonomy of various ways in which pseudonymous tracking data might be identified, including information leakage and deanonymization. Replacing one unique identifier with another does *nothing* to mitigate these privacy risks: a website would still retain an identifiable browsing history.**

In addition, it will often be trivial to reconnect a pair of "red" and "yellow" unique identifiers. For example:

i) Guess the mapping algorithm (e.g. a hashing algorithm with no salt or a predictable salt).
ii) Know the mapping algorithm (e.g. a known hashing algorithm and salt).
iii) Have access to a black-box implementation of the mapping algorithm (e.g. be able to input one unique identifier and get the other).
iv) Use deanonymization techniques to link the identifiers based on associated data.

Any privacy gain would necessarily depend on controlled access to both the deidentification system and various datasets. Put differently, the Yahoo! proposal reduces to mere "operational or administrative controls." If the NSA can't get those right, how are consumers supposed to trust, say, an analytics startup?

* If it would assist the co-chairs in their decision making, I would be glad to produce an example reidentification on data that has been deidentified under the Yahoo! proposal.

** https://cyberlaw.stanford.edu/blog/2011/07/there-no-such-thing-anonymous-online-tracking

3) Websites have no obligation to adopt privacy-preserving technologies for permitted uses.

The DAA proposal omits any reference to privacy-preserving technologies.
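(An aside illustrating reconnection route (i) above: an unsalted hash mapping can be undone in a few lines. This is a hypothetical Python sketch; the cookie identifiers are invented, and unsalted SHA-256 is assumed purely for illustration, not claimed to be Yahoo!'s actual algorithm.)

```python
import hashlib

# Hypothetical "deidentification" that replaces a raw cookie ID with
# its unsalted SHA-256 digest. Because the mapping is guessable, it
# can be recomputed by anyone holding the original identifiers.
def pseudonymize(cookie_id: str) -> str:
    return hashlib.sha256(cookie_id.encode()).hexdigest()

# The attacker recomputes the mapping over the known identifier space
# and joins the "deidentified" dataset back to the raw one.
known_ids = ["123", "456", "789"]  # invented example identifiers
reverse_map = {pseudonymize(cid): cid for cid in known_ids}

deidentified_record = (pseudonymize("123"), "HIV/AIDS")
original_id = reverse_map[deidentified_record[0]]  # recovers "123"
```

The same dictionary-style recomputation works for routes (ii) and (iii) as well, since any queryable or known mapping can be tabulated in advance.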
Where an alternative to present practices is available and accommodates consumer privacy concerns, why would we not encourage this win-win?

4) Websites have unfettered discretion to disregard a syntactically valid Do Not Track signal.

The text does not constrain when a website can ignore a "DNT: 1" header. Would a website that disregards all signals be compliant? What about most signals? What about a random subset of signals? There is neither normative line drawing nor non-normative guidance. Consumers cannot have trust in a Do Not Track system if a website can claim compliance, but then pick and choose among headers.

Objections to Option B:

In its current form, I would not favor the June draft as a Do Not Track standard. Among other substantive concerns, many of which also apply to the DAA proposal:

1) Third-party websites may continue to collect a user's browsing history for enumerated "permitted uses." Instead of specially exempting particular present business models, we should delineate information practices by their privacy properties.

See:
http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Audience_Measurement#Audience_Measurement_with_Deidentified_and_Protocol_Data
http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Unique_Identifiers#Proposal:_Limits_on_unique_identifiers_in_permitted_uses
http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Security#Separate_Fraud_and_Security_Permitted_Uses

2) The definition of deidentified data is vague and potentially unenforceable. And yet, deidentified data is exempt from use limitations, collection minimization, retention transparency, and reasonable security. We must be much more precise given these implications of the definition. Non-normative text would be a good starting point. For example, Yahoo! has proposed a deidentification scheme; is it compliant?
See:
http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Deidentification#De-identification

3) Language on shifting away from unique identifiers is also ambiguous and potentially unenforceable. What does it mean for an "alternative solution" to be "reasonably available"? If privacy-preserving technologies are not presently required, how much would they have to improve to become required? Since the design space has already been well explored by computer scientists, would privacy-preserving implementations never be required?

4) The provisions on browser compliance are vague. I understand that we cannot reflect all possible future implementations in our text. But couldn't we at least be precise about present, popular implementations? For example, is Internet Explorer 10+ compliant?

See:
http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_User_Agent_Compliance#UA_Compliance_Example

5) Service providers are under no obligation to use technical measures to silo their data, despite this being a present best practice and often having minimal impact on services.

See:
http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Service_Provider#Proposal:_Technical_Precautions_and_Internal_Practices

6) A website is textually unconstrained in disregarding facially valid "DNT: 1" signals. (Further discussion under Proposal A.)

See:
http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Disregarding

7) The text provides an undefined loophole for "transient" information practices.

See:
http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Transience_Collection#Collects.2Freceives

8) Websites are not sufficiently responsible for promptly detecting, mitigating, and reporting violations of the standard.

See:
http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Unknowing

I also object to continuing from the June draft (Option B) on process grounds.
Our choice set is artificially constrained to two non-consensus documents: the June draft (the product of behind-the-scenes negotiating, with ambiguous authorship) and the DAA proposal. What happened to the longstanding, consensus Editor's Draft? What happened to the privacy advocates' EFF/Mozilla/Stanford proposal? What happened to the browser vendors' proposal coming out of Sunnyvale?

Setting aside legitimacy concerns, there are at least two substantial effects of this choice architecture.

1) We are choosing between a (purportedly) middle-of-the-road text and an advertising industry-backed text. And not just any advertising text, by the way: a text with novel "non-tracking" and "deidentified"/"unlinked" exemptions that are far beyond what we've discussed previously. Going into this decision, then, the thumb is already on the scales against browser vendors and privacy advocates. They don't even have proposals on the table.

But even supposing the co-chairs select the June draft, the advertising industry still comes out ahead. Proponents of the DAA proposal will (understandably) require that the June draft be amended to incorporate at least some degree of the provisions that they drafted. How could it be a consensus document otherwise? Even if the June draft is selected as the base text, then, we'll move towards some hybrid of the June draft and the DAA proposal. This smacks of a "heads we win, tails you lose" property for the browser vendors and privacy advocates.

2) Written submissions will indicate whether participants favor the DAA proposal or the June draft. They will not, however, indicate whether either proposal can achieve consensus in the working group. Put differently, the group is expressing which of the two texts is *more* acceptable. But the group is not determining whether that text *is* acceptable, or even *close* to acceptable.
Given the two dozen open amendment topics on the June draft, for example, that document plainly does not reflect a working group consensus. Proceeding with the June draft may be less effective than other options for working towards agreement, such as resuming the consensus-based Editor's Draft.
Received on Friday, 12 July 2013 21:10:55 UTC