[private-measurement] Strawman: Target functionality for MVP (#16)

benjaminsavage has just created a new issue for https://github.com/patcg/private-measurement:

== Strawman: Target functionality for MVP ==
# Target ads measurement use-cases for the MVP 

## Reporting use-cases for Ad Buyers

### Counts, for all 3 types of **source events**
1. Count of "Click-through" attributed conversions
2. Count of "View-through" attributed conversions
3. "Conversion-lift", that is the count of conversions in "test" and "control" (source event = "Ad Opportunity")

### Sums, for all 3 types of **source events**
1. Sum of conversion value across "Click-through" attributed conversions
2. Sum of conversion value across "View-through" attributed conversions
3. "Conversion-lift", that is the sum of conversion value in "test" and "control" (source event = "Ad Opportunity")

### Multiple Breakdowns
Assuming **source events** are associated with multiple arbitrary **breakdown keys**
1. For a given set of source and trigger events, ability to generate **multiple different breakdowns** (e.g. campaign_id, age x gender, region)

### Cross-environment
For all of the above, support for cases where **source events** and **trigger events** originate from different environments:
| | App | In App Webview | Web |
|-|----|-----|----|
|App | Yes | Yes | Yes |
| In App Webview | Yes | Yes | Yes |
| Web | Yes | Yes | Yes |

### Cross-device
For all of the above, support for cases where **source events** and **trigger events** originate from different devices. For example:
- App => Laptop
- CTV => Laptop
- CTV => App

### Cross-publisher
For all of the above, support for cases where the **source events** originate from multiple different apps / websites operated by different ad-sellers

### Multi-touch attribution
For all of the above, support for multi-touch attribution. For MVP just support for "equal credit" on the "last N touches" where "N" is a query parameter.

## Aggregate use-cases for ad-sellers

- Support for aggregates spanning many different ad buyers, broken down by cross-buyer arbitrary breakdown keys (e.g. count of all purchases across all ad buyers, broken down by country)
- Ad-buyer partitioned attribution (e.g. A purchase of a phone case on website A should NOT be attributed to an ad for car insurance paid for by website B)

# NOT in the MVP
The following use-cases are **not** a part of the MVP, but should ideally our solution should be architecturally compatible such them.
1. ML Model training
2. Offline conversions

# Human language version

## Inputs:

- A set of “source events”
  - Each originating from a person
  - Each with a timestamp
  - And having some “breakdown key” 
- A set of “trigger events”
  - Each originating from a person
  - Each with a timestamp
  - And having some “trigger value”

## Outputs:
A list of “breakdown keys”, with corresponding noisy aggregate sums of “trigger value” attributed to those "breakdown keys"

## Ideal functionality:

- For each trigger event, see if there are any source events from the same person which happened before that trigger event. 
- If there are, split the trigger value equally across the last K, adding these fractions of the trigger value to the running sums for the corresponding breakdown keys. 
- Cap each user’s total contribution (across all breakdown keys) to at most M. 
- Finally, add a small amount of random noise to the running sums to provide a differential privacy guarantee.
- Finally, return the list of breakdown keys with their corresponding noisy sums.

# Pseudocode

<pre>
function(source_events, trigger_events, last_n_touches, max_contribution, budget_query_to_consume) {
    let breakdown_running_sums = {};
    let unique_breakdown_keys = source_events.map(|s| s.breakdown_key).unique()
    for each b in unique_breakdown_keys
        breakdown_running_sums[b] = 0;
    end
    
    let user_total_contributions = {};
    let unique_people = source_events.concat(trigger_events).map(|x| x.match_key).unique()
    for each p in unique_people
        user_total_contributions[p] = 0;
    end
    
    for each t in trigger_events
        let attribution_candidates = {}
        for each s in source_events
            if s.match_key == t.match_key AND s.timestamp < t.timestamp
                attribution_candidates.add(s)
            end
        end
          
        let attributed_source_events = attribution_candidates
            .sort_descending(|x, y| x.timestamp ⇔ y.timestamp)
            .slice(1, last_n_touches)
        
        if attributed_source_events.empty()
            continue;
        end
        
        if user_total_contributions[t.match_key] + t.trigger_value <= max_contribution
            user_total_contributions[t.match_key] += t.trigger_value
            let num_touches = attributed_source_events.cardinality()
            for each a in attributed_source_events
                breakdown_running_sums[a.breakdown_key] += (t.trigger_value / num_touches)
            end
        end
    end
    
    let noise_distribution = get_noise_distribution(TARGET_EPSILON, max_contribution, budget_query_to_consume)
    for each b in unique_breakdown_keys
        breakdown_running_sums[b] += noise_distribution.sample()
    end
    
    return breakdown_running_sums;
end
</pre>

# Mathy Version

## Givens

Given n **source events**, i = 0 through i = n - 1, each comprised of three unsigned integer values:
- match key: m<sup>s</sup><sub>i</sub>
- timestamp: t<sup>s</sup><sub>i</sub>
- breakdown key: b<sub>i</sub>

Given m **trigger events**, j = 0 through j = m - 1, each comprised of three unsigned integer values:
- match key: m<sup>t</sup><sub>j</sub>
- timestamp: t<sup>t</sup><sub>j</sub>
- trigger value: v<sub>j</sub> (values validated to all lie between 1 and **value_max**)

Given an integer **K**, value between 1 and 5 (inclusive), indicating the maximum number of source events to which a particular trigger event should be attributed.

Given an integer **M**, indicating the maximum total value any particular user should be able to contribute to the output across all breakdown keys. (This value should be at least as large as the maximum trigger value)

Given an integer **P**, value between 1 and 100 (inclusive), indicating the amount of privacy budget the query should consume.

## Ideal Functionality

Let the set of **unique breakdown keys** be known as **B**

Let the set of **attribution candidates** C<sub>j</sub> be equal to the set of **source events** for which: m<sup>t</sup><sub>j</sub> == m<sup>s</sup><sub>i</sub> and t<sup>s</sup><sub>i</sub> < t<sup>t</sup><sub>j</sub> 
Let the set of **attributed source events** A<sup>s</sup><sub>j</sub> be the MIN( |C<sub>j</sub>|, **K** ) elements from C<sub>j</sub> with the largest values of t<sup>s</sup><sub>i</sub>
Let the list of **attributed breakdown keys** B<sup>j</sup> be the values of the breakdown keys from A<sup>s</sup><sub>j</sub>
Let the set of **attributed trigger events** A<sup>t</sup> be equal to the subset of tuples (m<sup>t</sup><sub>j</sub>, v<sub>j</sub>, B<sup>j</sup>) where A<sup>s</sup><sub>j</sub> is non-empty
Let **U** be the set of unique match keys **m** in A<sup>t</sup>
Let the set of **capped events** Q be a subset of elements selected from **attributed trigger events** such each match key's total contribution is at most **M**. Put another way:
Given a match key **m<sub>k</sub>** in **U** 
Let the **total contribution** **t<sub>k</sub>** of match key **m<sub>k</sub>** be equal to the sum Σv<sub>j</sub> where **m<sub>j</sub>** == **m<sub>k</sub>**, across all the tuples (m<sub>j</sub>, v<sub>j</sub>, B<sup>j</sup>) in Q
Then for all match keys **m<sub>k</sub>** in **U**, it is true that **t<sub>k</sub>** <= **M**
Let **count<sup>j</sup><sub>b</sub>** be the number of times that breakdown key **b** appears in the list B<sup>j</sup>
Let the **no noise total** total<sub>b</sub> of breakdown key **b** be equal to the sum: Σ(v<sub>j</sub>/|B<sup>j</sup>|) * **count<sup>j</sup><sub>b</sub>** across all the tuples (m<sub>j</sub>, v<sub>j</sub>, B<sup>j</sup>) in Q
Let the **noised total** noised_total<sub>b</sub> for each breakdown key **b** in B, be equal to the sum of total<sub>b</sub> + random_noise(**M**, **P**)
Return the set of tuples (**b**, noised_total<sub>b</sub>) for each breakdown key **b** in B

Please view or discuss this issue at https://github.com/patcg/private-measurement/issues/16 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Tuesday, 21 June 2022 18:09:54 UTC