- From: Kathy Joe <K.Joe@esomar.org>
- Date: Mon, 13 Feb 2012 10:51:28 +0100
- To: "'Shane Wiley'" <wileys@yahoo-inc.com>, Justin Brookman <justin@cdt.org>, "public-tracking@w3.org" <public-tracking@w3.org>
- CC: "'adam.phillips@realresearch.co.uk'" <adam.phillips@realresearch.co.uk>, "Elise.Berkower@nielsen.com" <Elise.Berkower@nielsen.com>, "'Deliyannis, Alexandros'" <Alexandros.Deliyannis@nielsen.com>
- Message-ID: <BD7B8E6C4C649C488DE6920588B2E30CAA7BA986B2@ESSV011.esomar.local>
Dear All,

As stated in our text for market research (Issue 34: see below), market research identifiable data will be held as long as the campaign runs to provide consistent data, after which all identifiers will be removed after a reasonable period. The period needs to be flexible as it depends on how long the campaign runs and on the quality controls necessary to check the integrity of the data as agreed with the client. Note that under no circumstances is identifiable data provided to the client or used for profiling purposes.

Issue 34: Exemption for aggregated data - Aggregated data is permissible for purposes such as research, industry trends, and analytics. Parties wishing to use aggregated data must take reasonable steps to ensure that data does not reveal information about individual users, user agents, or devices and it must not be possible to identify an individual with aggregated cross-site data.

Description: The research client wants statistical measurements of how many users have been exposed to their campaigns in broad categories across different sites. The client will for instance place the research company's tags on their ads on one or more sites that count viewers based on cookies. Any identifiers are removed as soon as the data has been sorted into broad categories, e.g. country.

Suggestion/Example: ExampleResearch collects data for ExampleProducts Inc., which is running an ad campaign online on various sites. It gathers cross-site data on how often a user views a relevant ad but none of their other web behaviour. The purpose is to fulfil a request by a first party (the advertiser), and the results are shared only with the first party. The output is restricted to aggregated and unidentifiable data and will not impact a user's experience; use is only for the statistical research purpose, cannot be linked to a specific user, computer or device, and cannot be used for profiling.
Identifiable data will be held as long as the campaign runs to provide consistent data and then all identifiers will be removed after a reasonable period.

Issue 74: Are surveys out of scope - close, covered under Issues 25 and 34.

Kathy Joe
Professional Standards & Public Affairs Director

Eurocenter 2, 11th floor
Barbara Strozzilaan 384
1083 HN Amsterdam
The Netherlands
Tel: +31 20 664 2141
Fax: +31 20 664 2922
www.esomar.org

________________________________
From: Shane Wiley [mailto:wileys@yahoo-inc.com]
Sent: 13 February 2012 00:12
To: Justin Brookman; public-tracking@w3.org
Cc: public-tracking@w3.org
Subject: RE: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25, ISSUE-31, ISSUE-34, ISSUE-49)

Justin,

I believe both "market research" and "product improvement" can be managed via the aggregate and anonymous requirements so perhaps this isn't a significant issue ("debugging" will require more detailed data but should hopefully only need to live for a short period of time).

I'll continue to resist arbitrary retention timeframes as they're short-sighted, not based on business specifics, and are incredibly expensive to implement. An approach of holding good actors to their stated retention timeframes is the more appropriate outcome as it's future-proof, based on the specifics of each company's needs, and, while still expensive to implement, should already closely align with data retention timeframes where companies have implemented minimization standards. All of these together provide a faster pathway to broad industry-wide, global implementation of DNT.

- Shane

From: Justin Brookman [mailto:justin@cdt.org]
Sent: Friday, February 10, 2012 3:47 PM
To: public-tracking@w3.org
Cc: public-tracking@w3.org
Subject: Re: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25, ISSUE-31, ISSUE-34, ISSUE-49)

I understand the argument against prescriptive retention limitations.
However, if you're saying that we have to rely on a vague standard of "data minimization" as interpreted by each individual company, that is a strong argument against very broad and not strictly necessary exceptions for collection (retention, whatever) in the first place, especially for categories where there is no logical deletion point. If the standard is just "reasonable data minimization," I would argue that "market research" and "product improvement" should not be recognized as exceptions, though I could see an argument for a narrower "debugging" exception. However, "market research" and "product improvement" would still be allowed if they met the anonymized data exception.

Shane, apologies if this is rehashing arguments you've already had; this list is challenging to keep up with (even for editors).

Roy, deleting cookies is insufficient as it does not address non-cookie-based tracking technologies.

Justin Brookman
Director, Consumer Privacy
Center for Democracy & Technology
1634 I Street NW, Suite 1100
Washington, DC 20006
tel 202.407.8812
fax 202.637.0969
justin@cdt.org
http://www.cdt.org
@CenDemTech
@JustinBrookman

On 2/10/2012 5:04 PM, Marc Groman wrote:

Justin,

I understand the argument you are presenting and the concerns around "broad buckets" but I simply don't understand how a Do Not Track standard can possibly attempt to globally set specific data minimization, data retention, and data deletion standards for all of the players in this ecosystem. I'm open to sitting down with you and discussing this further.

Marc

---
Marc M. Groman
Network Advertising Initiative | Executive Director and General Counsel
1001 Connecticut Ave., Suite 705, Washington, DC 20036
P: 202-835-9810 | mgroman@networkadvertising.org

On Feb 10, 2012, at 4:59 PM, Shane Wiley wrote:

Justin,

This comes to the topic of "retention" (not collection) and I've already stated on several email chains that we're more than willing to (and already do) comply with data minimization standards (versus arbitrary pan-global, pan industry, pan business model time frames).

- Shane

From: Justin Brookman [mailto:justin@cdt.org]
Sent: Friday, February 10, 2012 2:51 PM
To: public-tracking@w3.org
Subject: Re: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25, ISSUE-31, ISSUE-34, ISSUE-49)

Shane,

The current industry standard allows third-party ad networks to collect and retain information for a few rather broad buckets of use, including "product development" and "market research," with no data deletion or minimization requirement once data qualifies for those buckets. Are you saying that you're not willing to countenance a W3C standard that would narrow the permissible purposes for which data may be collected, or that would impose some data minimization/deletion requirement once the data had been collected for a permissible purpose (whether a hard limit or a more vague "reasonable data minimization" standard)?

Justin Brookman
Director, Consumer Privacy
Center for Democracy & Technology
1634 I Street NW, Suite 1100
Washington, DC 20006
tel 202.407.8812
fax 202.637.0969
justin@cdt.org
http://www.cdt.org
@CenDemTech
@JustinBrookman

On 2/10/2012 2:26 PM, Shane Wiley wrote:

Jonathan,

I'm open to compromise but need to ensure the outcomes don't levy significant cost and loss of revenue to the online advertising industry in the process (sincerely looking for the appropriate balance).
I offered that we start at "use-based limitations" for the MUST (yes, this means we need to trust good actors) and set new technology approaches as SHOULD. I believe this is a reasonable compromise. Yahoo! (and other industry participants) will immediately engage with you and others to begin the design process for privacy enhancing technologies to help bring these solutions to market in a measured and thoughtful manner - and in a way that all participants can easily upgrade their current efforts to embrace. Big picture: large companies and academia work together to develop the baseline tech and then provide this as open-source (for example, Apache) to mid-size and small companies.

Our companies are taking on all the cost and disruption the above entails - in light of consumer privacy risks that have never been proven to be real (yet -> understanding the "technically possible" angle) and are immediately addressing all of the issues surrounding cross-site profiling data collection and use. How you do not see this as compromise is difficult for me to understand.

- Shane

From: Jonathan Mayer [mailto:jmayer@stanford.edu]
Sent: Friday, February 10, 2012 12:01 PM
To: Shane Wiley
Cc: Justin Brookman; public-tracking@w3.org
Subject: Re: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25, ISSUE-31, ISSUE-34, ISSUE-49)

Shane,

Your objections in response to this proposal (and earlier discussions of privacy-preserving technology) suggest that you will not accept *any* deviation from current data collection practices. That's not compromise.

Jonathan

On Feb 10, 2012, at 10:56 AM, Shane Wiley wrote:

Jonathan,

I appreciate and respect the desire to find a technical solution to online identifiers and identification at a rapid clip. These concepts and their related implementations require much deeper thought, discussion, design, and ultimately consensus.
Your current proposal (on its surface) would be impossible to achieve at our scale in just 6 months - and would completely halt/disrupt the established product roadmap for our ad products (which are working hard to be competitive and keep our systems evolving with the marketplace). It would literally take a year or two to go in this direction if we even agreed this was an appropriate outcome - which I believe it is not at this time but am more than willing to keep the conversation going in a different forum.

It's my opinion that the DNT WG is NOT the appropriate forum to determine what is appropriate for online identifiers and identification (much more involved effort than this isolated conversation).

- Shane

From: Jonathan Robert Mayer [mailto:jmayer@stanford.edu]
Sent: Friday, February 10, 2012 11:48 AM
To: Shane Wiley
Cc: Justin Brookman; public-tracking@w3.org
Subject: Re: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25, ISSUE-31, ISSUE-34, ISSUE-49)

Whatever the difficulty of implementation, I understand it won't happen overnight. How about if we provide a short-term grandfathering-in period? For example, six months where frequency capping etc. can still be accomplished with an ID cookie?

Jonathan

On Feb 10, 2012, at 10:29 AM, Shane Wiley <wileys@yahoo-inc.com> wrote:

Jonathan,

Moving an entire architecture that is cookie based to one that is IP + User Agent based is not trivial and would require changes at all tiers (hosting servers, operational servers, data warehousing systems, reporting, security, all scripts and coding logic for system interoperability, etc.). When I quoted the timelines I was being serious. It's a significant and fundamental change across the board. And while some ad networks may use protocol information for "operational uses" they probably also use cookies. So removing cookies from the equation would have significant issues for them as well - again, across the board.
I don't believe I'm "over estimating" the effort for effect.

Side Note 1: I believe there is another Working Group focused on Online Identity (perhaps not W3C though - I'll try to track this down). I mention this as it goes back to my earlier comments on not attempting to solve all online privacy issues in a single working group. It's unfortunate the charter of this working group has been so broadly interpreted by some as that appears to be where much of the churn is in our efforts. If our focus was constrained to "profiling" and uses of "profiling", I believe we'd be MUCH further along.

Side Note 2: I believe the truth of our current situation is somewhere between Mike's email and that our disagreements are localized to just a few issues (as you've stated). The operational purpose exceptions and implementation cost are so core to the discussion (and the on-going ability for many web based companies to monetize their efforts) AND appear to be incredibly divisive as to render our progress halted at this time (akin to "going in circles" versus making incremental steps forward). Purely my opinion...

- Shane

From: Jonathan Mayer [mailto:jmayer@stanford.edu]
Sent: Friday, February 10, 2012 10:46 AM
To: Shane Wiley
Cc: Justin Brookman; public-tracking@w3.org
Subject: Re: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25, ISSUE-31, ISSUE-34, ISSUE-49)

Shane,

Could you give a bit more explanation of how this would "require massive re-architecture of most internal systems"? As I understand it, some advertising networks already use protocol information for "operational uses." For those companies that don't, a quick implementation would be to just hash IP address + User-Agent string and treat that as an identifier. I don't mean to excessively trivialize the implementation burden, but it seems to me much lesser than other alternatives on the table (save, of course, business as usual).
As for objections to fingerprinting, I want to be clear that the idea I'm floating is passive fingerprinting, not active fingerprinting. Passive fingerprinting leverages information that we would already allow companies to collect, no more.

Jonathan

On Feb 10, 2012, at 9:34 AM, Shane Wiley wrote:

Jonathan,

I believe this could be a "SHOULD" goal because of two core factors:

1. This approach will require massive re-architecture of most internal systems (several year effort for a large company - months to years for mid-size companies - may be too complex for small companies until native platforms come built with this and they can upgrade), and
2. There are perhaps larger privacy issues here with the use of Digital Fingerprints. Some advocates (you don't appear to be with them) believe that a cookie is a better tool than a Digital Fingerprint as consumers have control of cookies - whereas with a Digital Fingerprint they do not (at least not in a simple, native tool perspective).

I'm personally on the side of Cookies as I believe the control factor and the wealth of automated tools for blocking and purging them is a better outcome for consumers than are Digital Fingerprints.

Side Note: Digital Fingerprints are argued by some vendors to be far more effective for tracking due to the lack of consumer control and the realities of cookie churn.

- Shane

From: Jonathan Mayer [mailto:jmayer@stanford.edu]
Sent: Friday, February 10, 2012 10:16 AM
To: Justin Brookman
Cc: public-tracking@w3.org
Subject: Re: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25, ISSUE-31, ISSUE-34, ISSUE-49)

Thinking more about tracking through IP address + User-Agent string, it occurs to me that the greatest challenges are stability over time and across locations. For some of the "operational uses" we have discussed, time- and geography-limited tracking may be adequate.
Scoping the "operational use" exceptions to protocol data would somewhat accommodate those uses without allowing for new data collection, and it would be easier to implement than a client-side privacy-preserving technology.

Thoughts on whether this is a possible new direction for compromise?

Jonathan

On Feb 10, 2012, at 8:30 AM, Jonathan Mayer wrote:

Justin,

I think you may be misreading the state of research on tracking through IP address + User-Agent string. There is substantial evidence that some browsers can be tracked in that way some of the time. I am not aware of any study that compares the global effectiveness of tracking through IP address + User-Agent string vs. an ID cookie; intuitively, the ID cookie should be far more effective. The news story you cite glosses over important caveats in that paper's methodology; it is certainly not the case that "62% of the time, HTTP user-agent information alone can accurately tag a host."

Jonathan

On Feb 9, 2012, at 6:48 PM, Justin Brookman wrote:

Sure. As the spec currently reads, third-party ad networks are allowed to serve contextual ads on sites even when DNT:1 is on, yes? In order to do this, they're going to get log data, user agent string, device info, IP address, referrer url, etc. There is growing recognition that that information in and of itself can be used to uniquely identify devices over time (http://www.networkworld.com/news/2012/020212-microsoft-anonymous-255667.html) for profiling purposes. It was my understanding that one of the primary arguments against allowing third parties to place unique identifiers on the client was because of the concern that they were going to be secretly tracking and building profiles using those cookies. My point is that they will be able to do that regardless, with little external ability to audit. This system is going to rely to some extent on trust unless we are proposing to fundamentally re-architect the web.
The other argument that I've heard against using unique cookies for this purpose is valid, though to me less compelling: that even if just used for frequency capping, third parties are going to be able to amass data about the types of ads a device sees, from which you could surmise general information about the sites visited on that device (e.g., you are frequency capping a bunch of sports ads --> ergo, the operator of that device is probably visiting sports pages). Everyone seems to agree that it would be improper for a company to use this information to profile (meta-profile?), but there are still concerns about data breach, illegitimate access, and government access of this potentially revealing information. This concerns me too, but the shadow of my .url stream is to me considerably less privacy sensitive than my actual .url stream.

I could be willing to compromise on a solution that allowed for using cookies for frequency capping, if there was agreement on limiting to reasonable campaign length, rules against repurposing, and a requirement to make an accountable statement of adherence to the standard. I would be interested to hear if it would be feasible to not register frequency caps for ads for sensitive categories of information (or if at all, cap client-side), though again, it's important to keep in mind that that data may well be collected and retained for other excepted purposes under the standard (e.g., fraud prevention) --- cookie or not.

________________________________
From: Jonathan Mayer [mailto:jmayer@stanford.edu]
To: Justin Brookman [mailto:justin@cdt.org]
Cc: public-tracking@w3.org
Sent: Thu, 09 Feb 2012 18:32:19 -0500
Subject: Re: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25, ISSUE-31, ISSUE-34, ISSUE-49)

Justin, could you explain what you mean here?
Thanks,
Jonathan

On Feb 9, 2012, at 3:17 PM, Justin Brookman wrote:

> the standard currently recognizes that third parties are frequently going to be allowed to obtain uniquely-identifying user agent strings despite the presence of a DNT:1 header
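Illustration (not part of any message in the thread): Jonathan Mayer's suggestion above, to "just hash IP address + User-Agent string and treat that as an identifier", can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not any participant's implementation. The function name `passive_identifier`, the choice of SHA-256, the field separator, and the optional `salt` parameter (for example a date string, to make the identifier time-limited) are all hypothetical and do not come from the thread.

```python
import hashlib


def passive_identifier(ip_address: str, user_agent: str, salt: str = "") -> str:
    """Derive a pseudonymous identifier from protocol data only.

    Hypothetical helper: hashes the IP address and User-Agent string,
    optionally mixed with a salt such as the current date so that the
    identifier changes from one period to the next.
    """
    # A separator that cannot appear in the fields keeps ("ab", "c")
    # from colliding with ("a", "bc").
    material = "\x1f".join((salt, ip_address, user_agent)).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


# The same IP address and User-Agent always map to the same identifier.
first = passive_identifier("192.0.2.1", "Mozilla/5.0")
second = passive_identifier("192.0.2.1", "Mozilla/5.0")

# A different User-Agent (or IP address) yields a different identifier.
other = passive_identifier("192.0.2.1", "Mozilla/4.0")

# An assumed per-day salt limits how long the identifier stays stable.
daily = passive_identifier("192.0.2.1", "Mozilla/5.0", salt="2012-02-10")
```

As the thread itself notes, an identifier of this kind is not stable over time or across locations (the IP address changes), so it would at best support the time- and geography-limited uses discussed above.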
Attachments
- image/jpeg attachment: image001.jpg
- image/gif attachment: image002.gif
Received on Monday, 13 February 2012 09:53:34 UTC