[meetings] Agenda Request - Private Conversion Optimisation (#117)

benjaminsavage has just created a new issue for https://github.com/patcg/meetings:

== Agenda Request - Private Conversion Optimisation ==
## Agenda+: Private Conversion Optimisation

One important advertising use-case we have tangentially discussed a number of times is "conversion optimisation". The problem statement is simple:

- A site has an opportunity to show an advertisement.
- There are many, many ads to choose from.
- Strategy: for each eligible ad, estimate the likelihood that showing it will lead to a "conversion event" (some valuable business outcome, e.g. a purchase).
- Select the ad expected to generate the maximum business value (e.g. the likelihood of generating a "conversion event", multiplied by the value of that event to the advertiser).

The step of "estimate the likelihood that showing [the ad] will lead to a conversion event" is the tricky part. How does that work? Let's walk through an example:

- Assume the site selecting the ad is a news website.
- For each ad, they have some metadata (e.g. the topic, the dimensions, the format, etc.).
- They have some contextual information (e.g. the topic of the article on the page).
- They might have some information about the person to whom the ad would be shown (e.g. what other articles they have recently read, what other advertisements they have clicked on in the past, the region of the world where the reader is located as indicated by their IP address, or, if the person has registered an account, any additional information they chose to provide to the news website, such as the ad topics they are interested in).

In this case, the task is to find a function `F(x)`, which estimates the likelihood an ad impression will lead to a conversion, for some set of parameters `x` that include the things listed above. If the site has this function `F(x)`, it can just run through all N available ads, compute the parameters for each ad in this opportunity, and invoke the function once per ad (N times in total).
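
As a rough illustration of that selection loop, here is a minimal Python sketch. Everything in it is hypothetical: the `Ad` fields, the `toy_predict` stand-in for `F(x)`, and the numbers are made up for the example, and a real `F(x)` would be a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class Ad:
    ad_id: str
    topic: str
    conversion_value: float  # value of one conversion to the advertiser


def select_ad(
    ads: Sequence[Ad],
    context: Dict[str, str],
    predict: Callable[[Ad, Dict[str, str]], float],
) -> Ad:
    """Score every eligible ad once and return the one with the highest expected value."""

    def expected_value(ad: Ad) -> float:
        # Estimated conversion likelihood times the value of a conversion.
        return predict(ad, context) * ad.conversion_value

    return max(ads, key=expected_value)


def toy_predict(ad: Ad, context: Dict[str, str]) -> float:
    # Hypothetical stand-in for F(x): a higher likelihood when the ad topic
    # matches the topic of the article on the page.
    return 0.05 if ad.topic == context["article_topic"] else 0.01


ads = [
    Ad("a1", "travel", 40.0),
    Ad("a2", "sports", 10.0),
    Ad("a3", "finance", 100.0),
]
best = select_ad(ads, {"article_topic": "sports"}, toy_predict)
print(best.ad_id)
```

On a sports article this picks `a3`: the topic match raises the likelihood for `a2`, but the much larger conversion value of `a3` still gives it the highest expected value.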

But how does the site find this function `F(x)` that does a not-terrible job of predicting the likelihood an ad will lead to a conversion? This is generally done by looking at historical data that comes out of a conversion measurement system, and finding the function `F(x)` that does the best job predicting the historical results that were observed over the past few weeks.
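
To make "finding the function that best predicts the historical results" concrete, here is a minimal sketch that fits a logistic regression by plain gradient descent. The choice of logistic regression, the single `topic_match` feature, the toy history, and the hyperparameters are all assumptions for illustration, not a description of any production system.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """F(x): estimated likelihood that an impression with features x converts."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical history: one feature (1.0 if the ad topic matched the article).
# 3 of 10 matching impressions converted, 1 of 10 non-matching ones did.
rows = [[1.0]] * 10 + [[0.0]] * 10
labels = [1, 1, 1] + [0] * 7 + [1] + [0] * 9

weights, bias = fit_logistic(rows, labels)
p_match = predict(weights, bias, [1.0])
p_no_match = predict(weights, bias, [0.0])
```

With a single binary feature and a bias term, the fitted predictions should approach the empirical conversion rates of the two groups.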

So this use-case is closely related to our discussion of "private measurement". The connection is that one approach to "Private Conversion Optimisation" is to attempt to train an ML model _directly_ on the outputs of a "Private Measurement" system.

The Google Chrome team has explored an approach that I will call "Event DP", and it has shown promise. In short, the idea is to just have the "private measurement" system emit noisy event-level reports.
- The Chrome team has proposed the "ARA - Event level API" (ARA is the Attribution Reporting API) with this goal in mind: [link](https://developer.chrome.com/en/docs/privacy-sandbox/attribution-reporting/#event-level-reports).
- @csharrison also recently filed an issue to discuss this type of approach _in general_: [link](https://github.com/patcg/meetings/issues/112).
- The Criteo team recently published an article documenting their efforts to use the "ARA - Event level API" for exactly this purpose: [link](https://medium.com/criteo-engineering/criteos-first-look-at-the-privacy-sandbox-attribution-reporting-api-event-level-f96f42537b9c). They concluded that "Event-level reports provide data that can be used for Machine Learning and campaign optimization".
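
The idea of noisy event-level reports can be illustrated with binary randomized response, one standard way to make a single bit differentially private. This is only a sketch under that assumption: it is not claimed to be the exact mechanism of the "ARA - Event level API", and the function names, the `epsilon` value, and the simulated data are hypothetical.

```python
import math
import random
from typing import Sequence


def flip_probability(epsilon: float) -> float:
    """Probability of reporting the opposite bit under binary randomized response."""
    return 1.0 / (1.0 + math.exp(epsilon))


def noisy_label(converted: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true conversion bit, flipped with probability flip_probability(epsilon)."""
    if rng.random() < flip_probability(epsilon):
        return not converted
    return converted


def debiased_rate(noisy_labels: Sequence[bool], epsilon: float) -> float:
    """Unbiased estimate of the true conversion rate from noisy bits (needs epsilon > 0).

    If p is the true rate and q the flip probability, the observed rate is
    p * (1 - 2q) + q, which is inverted here.
    """
    q = flip_probability(epsilon)
    observed = sum(noisy_labels) / len(noisy_labels)
    return (observed - q) / (1.0 - 2.0 * q)


rng = random.Random(0)
epsilon = 2.0  # hypothetical privacy parameter
true_labels = [i % 5 == 0 for i in range(10_000)]  # simulated 20% conversion rate
reports = [noisy_label(y, epsilon, rng) for y in true_labels]
estimate = debiased_rate(reports, epsilon)
```

Each individual report is deniable, yet the aggregate rate can still be recovered approximately, which is what makes such reports usable as (noisy) training labels.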

But I would like to discuss a more general question: 
**As we endeavour to standardise a "Private Measurement" API, do we want to explicitly try to solve for the "Private Conversion Optimisation" use-case?**

If we *do* want to support this use-case, there are a variety of approaches we can consider. Returning event-level reports is *not* the only approach to this problem.

### First Goal for this discussion:
Introduce the group to a few high-level approaches to the "Private Conversion Optimisation" problem that have been explored:

1. Event DP
2. Label DP
3. Private Logistic Regression
4. [DP SGD](https://arxiv.org/abs/1607.00133)

While there are more, I think that should provide a "taste" of the various types of approaches, and I believe I can cover these in a reasonable amount of time.
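
To give a flavour of the last item, here is a minimal sketch of one DP SGD update as described in the linked paper: clip each per-example gradient to a maximum L2 norm, sum, add Gaussian noise scaled to that norm, then average. The function names and parameter values are hypothetical, and the privacy accounting that turns the noise level into an actual guarantee is left out.

```python
import math
import random
from typing import List, Sequence


def clip(grad: Sequence[float], max_norm: float) -> List[float]:
    """Scale a per-example gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(
    weights: Sequence[float],
    per_example_grads: Sequence[Sequence[float]],
    lr: float,
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """One update: clip each gradient, sum, add Gaussian noise, average, step."""
    total = [0.0] * len(weights)
    for grad in per_example_grads:
        for j, g in enumerate(clip(grad, max_norm)):
            total[j] += g
    batch_size = len(per_example_grads)
    # Noise standard deviation is noise_multiplier * max_norm per coordinate.
    noisy_mean = [
        (t + rng.gauss(0.0, noise_multiplier * max_norm)) / batch_size for t in total
    ]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients
new_weights = dp_sgd_step(
    [0.0, 0.0], grads, lr=1.0, max_norm=1.0, noise_multiplier=1.1, rng=rng
)
```

Clipping bounds how much any single example can move the model, and the noise is calibrated to that bound.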

### Second Goal for this discussion:
Having laid out a bit of a landscape of potential approaches we could take to this problem, see if the PAT-CG members are interested in taking up an explicit goal to try to support this use-case with whatever "private measurement" system we eventually standardise.

### Third Goal for this discussion:
Assuming we *do* want to take up the objective of ensuring our "private measurement" system can support this use-case, get a quick temperature check from the group about how they feel about the 4 approaches outlined above. Are there any approaches the group does *NOT* want to explore? Are there any they feel particularly optimistic about?

## Time

I think I will need 20 minutes to cover the 4 approaches listed above, and I think we need at least 25 minutes for discussion, so a total of 45 minutes.

### Links

I've put all the content and links inline here in this issue.

Please view or discuss this issue at https://github.com/patcg/meetings/issues/117 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Friday, 21 April 2023 04:52:43 UTC