Re: Frequency Capping from Chris Mejia on 2012-07-12 (public-tracking@w3.org from July 2012)

From: Chris Mejia <chris.mejia@iab.net>
Date: Thu, 12 Jul 2012 18:35:58 +0000
To: Tamir Israel <tisrael@cippic.ca>
CC: Peter Eckersley <peter.eckersley@gmail.com>, Jonathan Mayer <jmayer@stanford.edu>, "Grimmelmann, James" <James.Grimmelmann@nyls.edu>, W3C DNT Working Group Mailing List <public-tracking@w3.org>, Mike Zaneis <mike@iab.net>, Brendan Riordan-Butterworth <Brendan@iab.net>
Message-ID: <CC24842F.1FECA%chris.mejia@iab.net>
CM:  To add useful context on why the alternative f-capping methods Prof. Felten proposed in his FTC blog are not technically feasible, my colleague in the IAB's Advertising Technology group has also weighed in (http://techatftc.wordpress.com/2012/07/03/privacy-by-design-frequency-capping/), and I am sharing his technical comments below.

Chris Mejia | Digital Supply Chain Solutions | Ad Technology Group | Interactive Advertising Bureau - IAB

Brendan Riordan-Butterworth <http://www.iab.net/>
July 11, 2012 at 4:26 pm <http://techatftc.wordpress.com/2012/07/03/privacy-by-design-frequency-capping/#comment-160>

Last week I shared my thoughts about your post with my peers here at the IAB, and was today urged to share with this larger audience.

Your initial thoughts on moving information storage to the client cookie jar are incorrect. Client-side frequency capping doesn’t fail “because ad placement decisions are normally made on the ad network’s servers but the frequency information will now be stored elsewhere”; it fails because storing the frequency state of multiple campaigns on the client can overload the cookie header, and because updates to client state can be blocked in third-party scenarios. Having the frequency state on the client (and therefore on the inbound HTTP request) can actually make frequency capping easier on the server side, because you don’t have to propagate this state across all the physical ad servers.
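
A minimal sketch of the client-side approach described above, assuming a hypothetical cookie format of "campaignID:count" pairs and an assumed size budget (browsers commonly limit a single cookie to around 4096 bytes). The function names and format are illustrative only, not any ad server's actual implementation:

```python
# Hypothetical cookie format: "campaignID:count" pairs joined by "|".
# MAX_COOKIE_BYTES is an assumed budget, not a value taken from any ad server.
MAX_COOKIE_BYTES = 4096


def decode_caps(cookie_value):
    """Parse the cookie value into a {campaign_id: impressions} dict."""
    counts = {}
    for pair in cookie_value.split("|"):
        if pair:
            campaign_id, _, count = pair.partition(":")
            counts[campaign_id] = int(count)
    return counts


def encode_caps(counts):
    """Serialize the counts back into the cookie value."""
    return "|".join(f"{cid}:{n}" for cid, n in sorted(counts.items()))


def serve_if_under_cap(cookie_value, campaign_id, cap):
    """Return (served, new_cookie_value) using only state sent by the client."""
    counts = decode_caps(cookie_value)
    if counts.get(campaign_id, 0) >= cap:
        return False, cookie_value
    counts[campaign_id] = counts.get(campaign_id, 0) + 1
    new_value = encode_caps(counts)
    if len(new_value.encode("utf-8")) > MAX_COOKIE_BYTES:
        # The state for many campaigns no longer fits in the cookie.
        raise ValueError("frequency state exceeds cookie budget")
    return True, new_value
```

The server needs no shared store here, since every request carries its own state. The sketch fails in the two ways noted above: the value grows with the number of campaigns, and nothing works if the browser refuses to store the updated cookie.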

With regard to the “second way”, tech may have changed, but in my experience profitable ad servers record the minimum information required in order to bill. Recording something like the HTTP REFERER (i.e., what page the ad was delivered onto) can increase log record size several times over, significantly increasing hardware costs and data processing latency. That said, recording the publisher ID or campaign ID (i.e., what site/network is supposed to be delivering the ad) is standard practice, since you need to know who to pay. As Brian O’Kelley has indicated, minimizing the set of data stored is still sensible design for performance.

Implied in Brian’s comment is that there’s a “frequency” record for every userID: the data structures in current systems use user pseudonyms as primary keys for targeting data, including frequency capping. Moving to a hash of userID/CampaignID for storing frequency capping multiplies the number of records in the data structure by the number of campaigns (and possibly the number of advertisers), thereby incurring additional cost in storage and lookup time, in addition to the (minor) cost of hashing the inbound userID/CampaignID.
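
To illustrate the record-count point, here is a small sketch comparing the two keying schemes. The store layouts, the SHA-256 choice, and the names `pair_key` and `build_stores` are assumptions made for illustration, not a description of any production system:

```python
import hashlib


def pair_key(user_id, campaign_id):
    """Hash of userID/CampaignID, used as the record key (assumed scheme)."""
    return hashlib.sha256(f"{user_id}/{campaign_id}".encode("utf-8")).hexdigest()


def build_stores(impressions):
    """Build both layouts from an iterable of (user_id, campaign_id) events."""
    by_user = {}  # one record per user: user_id -> {campaign_id: count}
    by_pair = {}  # one record per pair: sha256(user/campaign) -> count
    for user_id, campaign_id in impressions:
        per_user = by_user.setdefault(user_id, {})
        per_user[campaign_id] = per_user.get(campaign_id, 0) + 1
        key = pair_key(user_id, campaign_id)
        by_pair[key] = by_pair.get(key, 0) + 1
    return by_user, by_pair


# Three users, each shown four campaigns.
events = [(u, c) for u in ("u1", "u2", "u3") for c in ("a", "b", "c", "d")]
by_user, by_pair = build_stores(events)
print(len(by_user), len(by_pair))  # 3 top-level records versus 12
```

The per-user layout still holds the per-campaign counts inside each record, so the difference is in the number of top-level keys that must be stored and looked up, not in the raw count data.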

I think the suggestion of using Bloom filters for storing frequency capping assumes that there is an absolute maximum number of times a specific userID can be shown an ad, which is a strictly additive situation. However, as Brian pointed out, the capping happens at intervals shorter than the campaign duration, and the intervals are user specific. You could implement this by recording whether UserID interacted with CampaignID during arbitrary intervals, but doing so would generate either a significant increase in the number of items to store, or the additional complexity of maintaining time-sequenced Bloom filters, and either implementation loses some granularity of timing.
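
A rough sketch of the time-sequenced Bloom filter idea, assuming fixed-length intervals shared by all users and a toy filter built on SHA-256. The class names, filter size, and hash count are illustrative assumptions:

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; a Python int serves as the bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class TimeSequencedCap:
    """One Bloom filter per fixed interval; old intervals are dropped."""

    def __init__(self, interval_seconds, intervals_kept):
        self.interval_seconds = interval_seconds
        self.intervals_kept = intervals_kept
        self.filters = {}  # interval index -> BloomFilter

    def record(self, user_id, campaign_id, now):
        index = int(now // self.interval_seconds)
        self.filters.setdefault(index, BloomFilter()).add(f"{user_id}/{campaign_id}")
        # Expire filters that have fallen out of the window.
        for old in [i for i in self.filters if i <= index - self.intervals_kept]:
            del self.filters[old]

    def seen_recently(self, user_id, campaign_id, now):
        index = int(now // self.interval_seconds)
        key = f"{user_id}/{campaign_id}"
        return any(
            key in f
            for i, f in self.filters.items()
            if index - self.intervals_kept < i <= index
        )
```

Each filter records only whether a pair was seen in its interval, not how many times or exactly when, so an impression one second before an interval boundary and one nearly a full interval earlier look the same. The intervals are also global rather than user specific, which is the loss of timing granularity mentioned above.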

Tamir Israel wrote:

"Now -- if you're saying there is a good reason to collect here because the costs of doing otherwise are exponential and the benefits minimal, that is a discussion we can engage in meaningfully. But we seem to be unable to get to that step."


CM:  We have always maintained there is a good reason to do frequency capping.  There are critical technical issues with the alternatives proposed—I believe we have addressed those now in this forum/thread.

"This is not an unusual definition of 'privacy harm'. In fact, the basis of most privacy protective regimes since the OECD guidelines and CoE Convention 108 has been to minimize collection to what is necessary."

CM:  I take some exception to your rather loose definition (through inference) of the word "harm".  When I look up the word harm in the Merriam-Webster dictionary (provided free of charge online now, and advertising supported: http://www.merriam-webster.com/dictionary/harm), I find the following definition, consistent with a common understanding of the term:

Definition of HARM
1: physical or mental damage : injury <http://www.merriam-webster.com/dictionary/injury>
2: mischief <http://www.merriam-webster.com/dictionary/mischief>, hurt <http://www.merriam-webster.com/dictionary/hurt>

I fail to see where the collection of information for the purpose of frequency capping the delivery of a single ad creative to a user causes users "physical or mental damage" or "injury" (the context for examination should be the reasonable interpretation of the term based on its definition).  I also fail to see where the companies that engage in this user-friendly practice are engaging in "mischief" or causing "hurt".  So where is the actual harm?  I don't believe there is any harm being done to a user or their privacy through this practice.  On the contrary, it doesn't seem unreasonable to claim that some harm can be done when users are delivered the same ads over and over again, indiscriminately.  And when users lose access to content altogether, or have to pay to access content that was previously available free of charge, because a poorly implemented DNT mechanism means publishers can no longer afford to provide advertising-supported content or must charge for content (i.e. pay walls), clearly harm will have been done, but this harm will have been caused by an irresponsible DNT mechanism.

Regarding your comment about data minimization, with respect to f-capping, I believe the practice of data minimization for f-capping is already the industry practice (and I'd like to see any actual data to the contrary), based simply on costs (latency/storage/monetary) associated with the practice (as outlined above by Brendan).

From: Tamir Israel <tisrael@cippic.ca>
Date: Thu, 12 Jul 2012 12:36:38 -0400
To: Chris Mejia - IAB <chris.mejia@iab.net>
Cc: Peter Eckersley <peter.eckersley@gmail.com>, Jonathan Mayer <jmayer@stanford.edu>, "Grimmelmann, James" <James.Grimmelmann@nyls.edu>, W3C DNT Working Group Mailing List <public-tracking@w3.org>, Mike Zaneis - IAB <mike@iab.net>, Brendan Riordan-Butterworth - IAB <brendan@iab.net>
Subject: Re: Frequency Capping

On 7/12/2012 12:12 PM, Chris Mejia wrote:

I think what Peter is referring to is that some users might view the very fact that they are being tracked in order to facilitate the advertising activities of many random third parties they have never interacted with to be a 'privacy harm'. As I noted previously, we can start debating the relative costs/benefits of an F-cap approach that is more privacy protective, if only someone from industry were willing to provide a sense of how privacy-friendly F-capping can be done.
CM:  Please see Brian O'Kelley's description of f-capping pasted below.  In my experience, this description closely describes the most common practice for f-capping.

So far, I have not seen this, nor have I seen any direct substantive responses as to why the alternative F-capping proposals suggested by some are not workable. A good faith attempt to resolve a problem would entail these very engineers that you are referring to engaging, in good faith, in attempts to solve what is, at first instance, a technical problem.

CM:  In fact, this thread started as a result of Prof. Ed Felten's FTC blog post (http://techatftc.wordpress.com/2012/07/03/privacy-by-design-frequency-capping/).  David Wainberg called our attention to Brian O'Kelley's comments posted to Prof. Felten's blog.  Brian O'Kelley is the founder and CEO of AppNexus (he was also a founder at RightMedia) and is one of the foremost advertising technology engineers in the industry.  Brian's comments directly refuted the methods outlined by Prof. Felten, based in large part on severe performance issues (unacceptable ad serving performance that would increase page load times) and scale issues. Jonathan Mayer then challenged Brian's critique of Prof. Felten's blog post, but here on the W3C forum (the 2nd post in this thread, I believe).  Since I realize that Brian's comments were never brought directly into this forum, I'm repasting them here now:
OK, thanks Chris, I understand better where you're coming from now. I'd say, to start, that I don't think Brian O'Kelley's method is the standard. I took it as something AppNexus does that is somewhat unique (someone can please correct me if I am wrong). But regardless, the problem with your question (provide some evidence that servers are using the unique ID from F-capping in order to connect and track user browsing) is that, of course, there is absolutely no way to do this since it happens invisibly on the server.

The majority of users might trust many online advertisers not to do this kind of thing, but there are now so many advertisers out there accessing unique IDs at each and every site a user visits, that the best way to monitor 'no tracking' is to prevent collection of unique identifiers by untrusted third parties (as opposed to trying to prevent server-side correlation once collection has occurred).

This is not an unusual definition of 'privacy harm'. In fact, the basis of most privacy protective regimes since the OECD guidelines and CoE Convention 108 has been to minimize collection to what is necessary.

Now -- if you're saying there is a good reason to collect here because the costs of doing otherwise are exponential and the benefits minimal, that is a discussion we can engage in meaningfully. But we seem to be unable to get to that step.


Finally, please pardon my ignorance (as I don't know you); what organization and constituency do you represent?  You haven't provided a signature line indicating your affiliation and you are writing to this forum from a gmail address, so I was not able to ascertain your affiliation, if any, from this information.  In the interest of full disclosure, I represent the membership of the Interactive Advertising Bureau (IAB, www.iab.net), where I work in the Advertising Technology Group with industry engineers and operations professionals on technical specifications, technical protocols and technical guidance.
With respect, Chris, I don't think this is productive. If it really is helpful to start throwing around credentials, I will say that CIPPIC (the public interest NGO I represent) is supportive on this point of the Stanford (Jonathan)/EFF (Peter)/Mozilla (Tom) compromise proposal which was presented to the group here a few weeks back and which did not include any explicit exception for tracking users for the purpose of F-caps.

CM:  Tamir, when making my request to understand Peter's affiliation, I did ask that he "please pardon my ignorance"; this was sincere.  I don't know Peter and I honestly did not understand his affiliation. As you might appreciate, operating in this political circle is not my usual job (I'm a technologist, not a politician, so again, please pardon my ignorance with respect to your world).  I was asking as a point of clarification, so I could further appreciate his POV.  Understanding where someone comes from allows me to better understand the context of their position.  I also take some offense at the notion that I was "throwing around credentials"; I was simply stating my own affiliation, out of respect, as I had requested the same of Peter.  Please don't turn this into a political you guys vs. us guys thing; I don't find that productive at all.  I'd rather focus our debate on the merits of the particular arguments being presented. (BTW, your affiliation was clear from your email address.)
OK. My bad. We can all blame gmail now : )

Best,
Tamir
Received on Thursday, 12 July 2012 18:36:51 UTC