Re: Frequency Capping from Chris Mejia on 2012-07-12 (public-tracking@w3.org from July 2012)

From: Chris Mejia <chris.mejia@iab.net>
Date: Thu, 12 Jul 2012 16:12:36 +0000
To: Tamir Israel <tisrael@cippic.ca>
CC: Peter Eckersley <peter.eckersley@gmail.com>, Jonathan Mayer <jmayer@stanford.edu>, "Grimmelmann, James" <James.Grimmelmann@nyls.edu>, W3C DNT Working Group Mailing List <public-tracking@w3.org>, Mike Zaneis <mike@iab.net>, Brendan Riordan-Butterworth <Brendan@iab.net>
Message-ID: <CC24688C.1FE71%chris.mejia@iab.net>
Tamir, my reply can be found below inline with your comments.


Chris Mejia | Digital Supply Chain Solutions | Ad Technology Group | Interactive Advertising Bureau - IAB

From: Tamir Israel <tisrael@cippic.ca>
Date: Thu, 12 Jul 2012 11:28:07 -0400
To: Chris Mejia - IAB <chris.mejia@iab.net>
Cc: Peter Eckersley <peter.eckersley@gmail.com>, Jonathan Mayer <jmayer@stanford.edu>, "Grimmelmann, James" <James.Grimmelmann@nyls.edu>, W3C DNT Working Group Mailing List <public-tracking@w3.org>, Mike Zaneis - IAB <mike@iab.net>, Brendan Riordan-Butterworth - IAB <brendan@iab.net>
Subject: Re: Frequency Capping

Hi Chris --

On 7/12/2012 10:49 AM, Chris Mejia wrote:
Peter,

First, I'd appreciate it if you didn't take my comments out of the full context in which they were written. In other words, when quoting me in this forum, please leave the rest of my comment and the thread intact so others can read my comments in the full context in which they were written.  I have afforded the same courtesy to you.

Industry has been at this table, at tremendous cost, negotiating in good faith.  The mere fact that we are having this very debate proves my/our commitment to this serious matter.  That we don't agree on certain points should not be conflated with a lack of willingness to compromise on other points, should such compromise be warranted and in the best interest of Internet users.

Which advertising industry engineers do you know who agree with your assertions about f-capping?  I'd be interested in speaking with them directly about the pros and cons of your argument.  That may be a useful debate.
I think what Peter is referring to is that some users might view the very fact that they are being tracked in order to facilitate the advertising activities of many random third parties they have never interacted with to be a 'privacy harm'. As I noted previously, we can start debating the relative costs/benefits of an F-cap approach that is more privacy protective, if only someone from industry were willing to provide a sense of how privacy-friendly F-capping can be done.

CM:  Please see Brian O'Kelley's description of f-capping pasted below.  In my experience, this description closely describes the most common practice for f-capping.

So far, I have not seen this, nor have I seen any direct substantive responses to why the alternative F-capping proposals suggested by some are not workable. A good faith attempt to resolve a problem would entail the very engineers you are referring to engaging, in good faith, in attempts to solve what is, in the first instance, a technical problem.

CM:  In fact this thread started as a result of Prof. Ed Felten's FTC blog post (http://techatftc.wordpress.com/2012/07/03/privacy-by-design-frequency-capping/).  David Wainberg called our attention to Brian O'Kelley's comments posted to Prof. Felten's blog.  Brian O'Kelley is the founder and CEO of AppNexus (he was also a founder at RightMedia) and is one of the foremost advertising technology engineers in the industry.  Brian's comments directly refuted the methods outlined by Prof. Felten, based in large part on severe performance issues (unacceptable ad serving performance that would increase page load times) and scale issues. Jonathan Mayer then challenged Brian's critique of Prof. Felten's blog post, but here on the W3C forum (2nd post in this thread, I believe).  Since I realize that Brian's comments were never brought directly into this forum, I'm repasting them here now:

Brian O'Kelley (http://www.appnexus.com/)
July 6, 2012 at 2:56 pm (http://techatftc.wordpress.com/2012/07/03/privacy-by-design-frequency-capping/#comment-149)

Prof. Felten,

Very glad you’re covering this topic – it’s critical for the advertising industry to be able to frequency cap in a privacy-friendly way. Let me start by making sure we’re on the same page about the frequency capping functionality that advertisers actually need and use:

1. Serve a campaign no more than once per day per user (this is the simplest case)
2. Serve a campaign no more than X times per day per user (still relatively simple, but means user might see the ad X times in a row in the course of a few pages then never again that day)
3. Serve a campaign no more than once per X hours (you could specify this with the previous variant, like, no more than 5 times a day with at least an hour between them)
4. Serve a campaign no more than once per session (without 20 minutes of inactivity)
5. Serve a campaign no more than X times ever (for the lifetime of the cookie)
- Serve a creative (a particular ad) no more than [1-5 above]
- Serve an ad for the entire advertiser (e.g., Coke) no more than [1-5 above] regardless of the campaign or creative

At AppNexus, we use the second method you mention in your article: we store counts. Here’s a real-life example from my AppNexus profile:

"frequency":[["a",1291,0,0,1,1341210991],["c",12441,0,0,3,1341350729],["a",33475,0,0,9,1341220399]]

The first array element says I saw an ad from advertiser 1291 0 times this session, 0 times today, and 1 time ever. The last time I saw an ad from advertiser 1291 was at timestamp 1341210991. Note that we don’t store any information about what site I saw the ad on.
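
To make the record layout concrete, here is a minimal Python sketch that reads the example above into named fields. The field names, the `FrequencyRecord` type, and the surrounding braces added to make the fragment valid JSON are assumptions for illustration, not AppNexus's actual schema; the meaning of "c" (presumably campaign) is also an assumption.

```python
import json
from typing import List, NamedTuple


class FrequencyRecord(NamedTuple):
    """One entry of the frequency array (field names are hypothetical)."""
    object_type: str     # "a" = advertiser; "c" presumably = campaign (assumed)
    object_id: int
    session_count: int   # times seen this session
    daily_count: int     # times seen today
    lifetime_count: int  # times seen ever
    last_served: int     # Unix timestamp of the last impression


# The fragment from the comment, wrapped in braces so it parses as JSON.
RAW = ('{"frequency":[["a",1291,0,0,1,1341210991],'
       '["c",12441,0,0,3,1341350729],["a",33475,0,0,9,1341220399]]}')


def parse_frequency(raw: str) -> List[FrequencyRecord]:
    """Turn the raw profile fragment into a list of named records."""
    return [FrequencyRecord(*row) for row in json.loads(raw)["frequency"]]


records = parse_frequency(RAW)
first = records[0]
print(first.object_id, first.session_count, first.daily_count, first.lifetime_count)
```

Note that none of the six fields identifies the site on which the ad was shown.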

The information we do store lets us perform all five frequency capping functions. The first field lets us specify whether the frequency is for the advertiser, campaign, or creative. The second field tells us the id of this object. The session frequency count lets us determine #4, in combination with the timestamp so we know when we should reset the counter. The daily and lifetime frequency count give us #1, #2, and #5. The timestamp itself gives us #3 since we can check how long it’s been since we last served this creative, campaign, or advertiser. Note that this means we have to update three records every time we serve an ad – the creative, campaign, and the advertiser.
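
As a rough illustration of how one record can answer all five rules, consider the Python sketch below. The `Cap` settings object, the function name, and the counter-reset logic (session resets after 20 minutes of inactivity, daily count resets when the UTC day changes) are assumptions made for this example; the comment does not say how AppNexus actually resets its counters.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

SESSION_TIMEOUT = 20 * 60        # seconds of inactivity that end a session (rule 4)
SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class Cap:
    """Hypothetical cap settings; None means the rule is not in use."""
    max_per_day: Optional[int] = None       # rules 1 and 2
    min_gap_seconds: Optional[int] = None   # rule 3
    max_per_session: Optional[int] = None   # rule 4
    max_lifetime: Optional[int] = None      # rule 5


def may_serve(record: Sequence, cap: Cap, now: int) -> bool:
    """Return True if serving again at time `now` stays within the cap."""
    _type, _id, session, daily, lifetime, last_served = record
    # Assumed reset logic: stale counters are treated as zero.
    if now - last_served >= SESSION_TIMEOUT:
        session = 0
    if now // SECONDS_PER_DAY != last_served // SECONDS_PER_DAY:
        daily = 0
    if cap.max_per_day is not None and daily >= cap.max_per_day:
        return False
    if cap.min_gap_seconds is not None and now - last_served < cap.min_gap_seconds:
        return False
    if cap.max_per_session is not None and session >= cap.max_per_session:
        return False
    if cap.max_lifetime is not None and lifetime >= cap.max_lifetime:
        return False
    return True


campaign = ["c", 12441, 0, 0, 3, 1341350729]   # record from the example above
now = 1341350729 + 60                          # one minute after the last impression
print(may_serve(campaign, Cap(max_lifetime=3), now))
print(may_serve(campaign, Cap(max_lifetime=5), now))
print(may_serve(campaign, Cap(min_gap_seconds=3600), now))
```

The same function would be called once each for the creative, the campaign, and the advertiser record.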

That explains the data we store. Now, how do we use it? Say we have 600,000 creatives live in our system and 100,000 campaigns (those are probably quite low, but you get the idea). When we go to choose an ad to serve, we have to check some basic rules for each campaign (does the advertiser want to serve on this site? does the advertiser want to serve to this geography?) and then see if the campaign’s frequency capping rules allow it to be served.

To check the frequency cap, we look up the campaign ID in our frequency data array and evaluate rules 1-5 above. Then we do the same for the advertiser and the creative. This process could happen tens of thousands of times per ad served (depending on how many campaigns pass targeting for a particular ad call). With the full array in memory, we can do this quite fast (binary search, say, on an array averaging 200 elements).
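
The in-memory lookup described above might look like the following Python sketch, which assumes the user's array is kept sorted by (object type, object id) so that the standard `bisect` module can search it. The sort order and the `find_record` helper are assumptions for illustration; the comment only says "binary search, say".

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Record = Tuple[str, int, int, int, int, int]

# The example records, sorted by (object type, object id).
records: List[Record] = sorted([
    ("a", 1291, 0, 0, 1, 1341210991),
    ("c", 12441, 0, 0, 3, 1341350729),
    ("a", 33475, 0, 0, 9, 1341220399),
])


def find_record(sorted_records: List[Record], object_type: str,
                object_id: int) -> Optional[Record]:
    """Binary-search the user's array; None means the user has no record."""
    key = (object_type, object_id)
    # A 2-tuple key sorts just before any record that starts with it.
    i = bisect_left(sorted_records, key)
    if i < len(sorted_records) and sorted_records[i][:2] == key:
        return sorted_records[i]
    return None


print(find_record(records, "c", 12441))
print(find_record(records, "a", 99999))
```

A missing record simply means the user has never been served that object, so every cap passes.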

Now, let’s consider the campaign-user hash solution you suggest in your article. A simple approach would be to have a three-key index (user id, object type, object id) into a structure like {session frequency, daily frequency, lifetime frequency, timestamp}. On the face of it, this supports all of the use-cases above.

However, there’s a hitch. User data is too big (~10TB) to be resident in memory on the targeting servers so we need to store it on a separate server cluster. In the array model, we make a single request to the user data cluster for a user ID and get back the full frequency array. The RTT is around 10 milliseconds. In the hash model, I don’t have locality for a particular user, so I have to query for each campaign. Now that 10 milliseconds becomes 100 seconds or more! You could reduce this by batching requests and parallelizing the clusters, but you would need a 1000x performance improvement to make this feasible, and even so you’d scale with the number of campaigns that pass targeting instead of with the number of frequency records for the user like the array model. I don’t think the campaign-user hash is practical in a production environment at scale.
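
The arithmetic behind the "100 seconds" figure can be reproduced in a few lines of Python. The count of 10,000 lookups is an assumption taken from the low end of "tens of thousands of times per ad served", and the lookups are assumed to run one after another with no batching.

```python
RTT_MS = 10                   # round trip to the user data cluster, from the comment
LOOKUPS_PER_AD_CALL = 10_000  # assumed: low end of "tens of thousands"

# Array model: one request returns the user's whole frequency array.
array_model_ms = 1 * RTT_MS

# Hash model: one request per campaign checked, issued sequentially.
hash_model_ms = LOOKUPS_PER_AD_CALL * RTT_MS

print(f"array model: {array_model_ms} ms")
print(f"hash model:  {hash_model_ms / 1000:.0f} s")
print(f"after a 1000x improvement: {hash_model_ms // 1000} ms")
```

Under these assumptions, even the 1000x improvement mentioned above would still leave the hash model ten times slower than the single array fetch.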

On the extra credit, I think it works well for use case #1. You bloom filter on existence of the user-object key. It can’t give you a false negative, which is critical, and a false positive is a small likelihood as you mention. I don’t think it works for the more complex cases.
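
For readers unfamiliar with the technique, a minimal Bloom filter sketch for use case #1 follows. The key format (user, campaign, day), the filter size, and the use of SHA-256 to derive bit positions are all assumptions for illustration; they are not taken from Prof. Felten's article or from AppNexus.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set-membership filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


served_today = BloomFilter()
key = "user-42:campaign-12441:2012-07-06"   # hypothetical key format

print(key in served_today)   # nothing recorded yet
served_today.add(key)        # record that the ad was served
print(key in served_today)   # once added, a key is always reported as present
```

A false positive here only means a user occasionally misses an ad they were entitled to see, whereas a false negative would break the cap. The filter stores no counts, which is consistent with the view above that it does not extend to the more complex cases.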

Question for you: Given that AppNexus is following your recommendations on privacy-sensitive storage of frequency data, what can we do to help the FTC make sure that frequency caps are not thrown out with the privacy bathwater?

Brian O’Kelley

Finally, please pardon my ignorance (as I don't know you); what organization and constituency do you represent?  You haven't provided a signature line indicating your affiliation and you are writing to this forum from a gmail address, so I was not able to ascertain your affiliation, if any, from this information.  In the interest of full disclosure, I represent the membership of the Interactive Advertising Bureau (IAB, www.iab.net) where I work in the Advertising Technology Group with industry engineers and operations professionals on technical specifications, technical protocols and technical guidance.
With respect, Chris, I don't think this is productive. If it really is helpful to start throwing around credentials, I will say that CIPPIC (the public interest NGO I represent) is supportive on this point of the Stanford (Jonathan)/EFF (Peter)/Mozilla (Tom) compromise proposal which was presented to the group here a few weeks back and which did not include any explicit exception for tracking users for the purpose of F-caps.

CM:  Tamir, when making my request to understand Peter's affiliation, I did ask that he "please pardon my ignorance"; this was sincere.  I don't know Peter and I honestly did not understand his affiliation. As you might appreciate, operating in this political circle is not my usual job (I'm a technologist, not a politician, so again, please pardon my ignorance with respect to your world).  I was asking as a point of clarification, so I could further appreciate his POV.  Understanding where someone comes from allows me to better understand the context of their position.  I also take some offense at the notion that I was "throwing around credentials"; I was simply stating my own affiliation, out of respect, as I had requested the same of Peter.  Please don't turn this into a political "you guys vs. us guys" thing; I don't find that productive at all.  I'd rather focus our debate on the merits of the particular arguments being presented. (BTW, your affiliation was clear from your email address.)

Best,
Tamir



Chris Mejia | Digital Supply Chain Solutions | Ad Technology Group | Interactive Advertising Bureau - IAB

From: Peter Eckersley <peter.eckersley@gmail.com>
Date: Wed, 11 Jul 2012 19:57:18 -0700
To: Chris Mejia - IAB <chris.mejia@iab.net>
Cc: Jonathan Mayer <jmayer@stanford.edu>, Tamir Israel <tisrael@cippic.ca>, "Grimmelmann, James" <James.Grimmelmann@nyls.edu>, W3C DNT Working Group Mailing List <public-tracking@w3.org>, Mike Zaneis - IAB <mike@iab.net>, Brendan Riordan-Butterworth - IAB <brendan@iab.net>
Subject: Re: Frequency Capping


On 11 July 2012 18:42, Chris Mejia <chris.mejia@iab.net> wrote:

I work with and represent those key advertising industry engineers you're calling out. The overwhelming response to the proposed "workarounds" to date has been "no, we can't do that" (sorry). Again, performance and scale issues abound, not to mention needless conflicts with the fundamentals of the advertising business.  Respectfully, the job of any working group that wants to regulate another group is to provide real justification for its need to regulate, THEN solve for real-world problems. Bring real problems; justify them; cite examples; show research/data; support your case.  THEN we can start talking and brainstorming solutions.

What you are claiming here, I believe, is that (some? all?) advertising industry engineers do not wish to compromise with privacy and consumer groups on a meaningful Do Not Track standard.

We have gone to tremendous lengths, and in good faith, to compromise on our end.  It's disappointing not to see that from the other side.

--
Peter
Received on Thursday, 12 July 2012 16:13:26 UTC