Re: [private-measurement] Strawman: Target functionality for MVP (#16) from Charlie Harrison via GitHub on 2022-06-22 (public-patcg@w3.org from June 2022)

From: Charlie Harrison via GitHub <sysbot+gh@w3.org>
Date: Wed, 22 Jun 2022 15:51:23 +0000
To: public-patcg@w3.org
Message-ID: <issue_comment.created-1163291805-1655913081-sysbot+gh@w3.org>

Thanks @benjaminsavage for the strawman. I want to poke at one aspect which is multiple breakdown keys per source event. This is listed as ideal functionality but isn't described in the detailed code / math. I would vote to add this. My goal is to try to achieve some level of sensitivity management that I discussed in the last meeting in the MVP. This will improve the utility of the system as a whole and unlock new use-cases. With multiple breakdown keys per source this is achievable with some changes to the underlying algorithm:

- Multiple queries to the system are configured by appending multiple breakdown keys to each source (for simplicity, assume every source has exactly `q >= 1` breakdown keys)
- Amend the pseudocode function to take an index `q_i`, and edit the algorithm to just consider the `q_i`th breakdown key
- Run the main algorithm `q` times, with some custom noise distribution function based on `q`

Obviously, the above can be optimized to avoid `q` joins, but that's the overall idea. What this unlocks is the ability for us to take advantage of advanced DP composition, so multi-query settings can receive much smaller effective noise. How this works is by enforcing additional sensitivity constraints on the system, by varying `q` and the `max_contribution` parameter. The Linf sensitivity of the entire (multi-)query is equal to `max_contribution` and the L1 sensitivity is equal to `q * max_contribution`, so the L2 sensitivity is equal to `max_contribution * sqrt(q)`

Note: there are other ways of achieving this goal, but I think this serves as a flexible basis, and is probably the simplest without making the core algorithm more complicated in how it does contribution capping. We can do more advanced things by varying `max_contribution` per sub-query, and the noise scale per query without impacting the core algorithm too. We could also constrain individual subqueries more by defining a `max_contribution_per_breakdown_key` param to the core algorithm, which would improve utility if the ad-tech expects user contributions across many keys in one subquery. That seems less generally useful to me although I'm sure there are some use-cases where it works well.

I also want to make sure that privacy unit / grain discussions happen on the other issue and not here, despite it being embedded in the algorithm here.

--
GitHub Notification of comment by csharrison
Please view or discuss this issue at https://github.com/patcg/private-measurement/issues/16#issuecomment-1163291805 using your GitHub account

--
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Wednesday, 22 June 2022 15:51:25 UTC