Re: [meetings] Agenda Request – On-device DP budgeting (#166)

Hi all,

Thank you for the discussion and for your feedback on the [presentation](https://docs.google.com/presentation/d/1bkZdqEiwXTLCy0TfbeBgJL7mccKBGuy2ws_vwpdLlbY/edit#slide=id.p) about an individual DP (IDP) formulation for on-device budgeting systems. Your feedback will help us communicate these concepts more clearly in future interactions.

I wanted to give a more in-depth response to some of the questions that arose in the Thursday session, which might help clarify some of the high-level points we were making there.

# How Summary ARA fits with respect to IDP

From a cursory look at the **documentation** of Summary ARA (**not the implementation!**), it looks as though Summary ARA may indeed be operating in an IDP setting and leveraging individual sensitivity to avoid deducting budget for triggers with no source. This is actually one of the optimizations I mentioned in my talk, which, in my opinion, can only be justified through an IDP formulation. This optimization is fine from a guarantee perspective, as long as available privacy budgets are kept private. In one reporting strategy of ARA ([trigger_context_id-based reporting](https://github.com/WICG/attribution-reporting-api/blob/main/AGGREGATE.md#optional-reduce-report-delay-with-trigger-context-id)), this seems to be the case, because reports are sent unconditionally (even when there is no attribution and even when out of budget). In a second reporting strategy of ARA, involving randomized reporting [delays](https://github.com/WICG/attribution-reporting-api/blob/main/AGGREGATE.md#hide-the-true-number-of-attribution-reports) and [dummy reports](https://github.com/WICG/attribution-reporting-api/blob/main/AGGREGATE.md#hide-the-true-number-of-attribution-reports), further privacy analysis may provide some justification, but that is unclear to me.
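To make the first reporting strategy concrete, here is a minimal sketch of what such an on-device budget filter could look like; all names (`OnDeviceBudget`, `process_trigger`, `make_contribution`) are hypothetical, and this is not ARA's actual API or implementation:

```python
# Sketch of an IDP-style on-device budget filter: budget is charged according
# to *individual* sensitivity (no charge when a trigger has no matching source
# on this device), and a report is emitted unconditionally so that the
# remaining budget stays hidden from the report recipient.

DEFAULT_REPORT = {"histogram": {}}  # null payload, sent when nothing is contributed


class OnDeviceBudget:
    def __init__(self, epsilon_global: float):
        # This device's global budget (the eps_G^i from the talk); never revealed.
        self.remaining = epsilon_global

    def process_trigger(self, trigger, matching_source, epsilon_request: float):
        if matching_source is None:
            # Individual sensitivity is 0: the output cannot depend on this
            # device's data, so nothing is charged; a report is still sent.
            return DEFAULT_REPORT
        if epsilon_request > self.remaining:
            # Out of budget: still send a default report; otherwise the mere
            # presence/absence of a report would leak the hidden budget state.
            return DEFAULT_REPORT
        self.remaining -= epsilon_request
        return make_contribution(matching_source, trigger)


def make_contribution(source, trigger):
    # Placeholder for the real aggregatable-report construction.
    return {"histogram": {source["key"]: trigger["value"]}}
```

The key point is that the recipient observes a report of the same shape in every case, so the act of reporting itself reveals nothing about the device's remaining budget.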

Regardless, in my opinion, having a proper theoretical formulation of the DP desideratum would be very useful for obtaining a more systematic view of which mechanisms are needed to achieve it, what their requirements are, and what good starting points the existing literature offers for designing those mechanisms. As I’ve argued, I think IDP is a great fit for formulating the DP desideratum for this kind of system. First, it tells you precisely what is happening when you want to deduct less privacy budget on some devices based on the data they have (or don’t have): you are using individual sensitivity to compute a data-dependent privacy loss; that’s it, no "oddity" here, everything is well defined mathematically. Second, it tells you what the requirements are if you ever want to do that, including that you must hide the remaining privacy budgets. Third, it tells you that this mode of operation with "hidden privacy budgets" will be pervasive in your system, so there is a huge need for additional mechanisms to be built into the system so that advertisers/publishers can deal with the pervasive default-value reporting; it also helps you identify mechanisms from prior literature as starting points for tackling this issue. Finally, it helps you tap into other optimization opportunities that you might not even think to incorporate off-hand, because they are perhaps not as intuitive as the idea of not deducting budget when there is no relevant data. I described some other examples of optimization opportunities based on individual sensitivity in my talk.
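For readers who want the formal statement behind "well defined mathematically", here is a minimal sketch of the IDP definition in the spirit of the POPL’15 paper (my notation, not taken from the slides):

```latex
% IDP: a per-record privacy-loss function \varepsilon(\cdot) replaces the
% single scalar \varepsilon of standard DP.
A mechanism $M$ satisfies $\varepsilon(\cdot)$-IDP if, for every record (device) $i$,
every pair of datasets $D, D'$ differing only in $i$'s data, and every set of
outputs $S$:
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon(i)} \cdot \Pr[M(D') \in S].
\]
% Not charging budget for a trigger with no matching source then corresponds to
% the query having individual sensitivity 0 on device $i$, so its data-dependent
% contribution to $\varepsilon(i)$ is 0.
```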

# Relationship between IDP and DP

One can always take the supremum of the IDP big-epsilon function to derive a DP guarantee from an IDP one (see my slides or Proposition 3.2 in the [POPL’15 paper](https://www.cse.chalmers.se/~gersch/popl2015.pdf)). So, in answer to @csharrison’s question on whether Summary ARA (at least the version that always sends a report) could still claim it satisfies traditional DP, the answer is YES, but the question is ***for what epsilon***? If you are willing to assume that there is a reasonable maximum across the individual global budgets enforced by all devices participating in ARA (those ε<sub>G</sub><sup>i</sup>’s from my talk), then you can take that maximum (call it ε<sub>G</sub>) and you’ve got yourself an ε<sub>G</sub>-DP guarantee!

The problem with that is that I don’t know how you can assume some *reasonable* maximum in an open system such as ARA (or PAM, for that matter). What happens if the user on device j configures ARA to, say, ε<sub>G</sub><sup>j</sup>=100000? Does that mean that the ARA system is now 100000-DP? That doesn’t sound right: surely, devices with more reasonable settings of ε<sub>G</sub><sup>i</sup> are better protected than that! This is intuitive, but then again, *how do you express that with a system-wide DP guarantee*?

Maybe one option is to settle on formulating a guarantee only for those users who do not change the fixed ε<sub>G</sub> value you encode in ARA. You exclude all the other users and refuse to make a privacy claim for them; but for the users with unchanged ε<sub>G</sub>, you claim an ε<sub>G</sub>-DP system-wide guarantee. That would be valid. But what if there is a different version of ARA running on some devices, with a slightly higher default value ε<sub>G2</sub> (maybe an older release of ARA)? Does this mean that those users don’t get any protection? Of course not! But they certainly don’t get the ε<sub>G</sub>-DP protection you are now advertising for the system. If you go to finer and finer granularity, you realize that what you really would like to be able to say is that *each device i gets the DP protection that its own ε<sub>G</sub><sup>i</sup> configuration affords.* ← **That’s IDP.**
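As a worked restatement of the supremum step mentioned at the start of this section (again my notation, following Proposition 3.2 of the POPL’15 paper):

```latex
% From IDP to DP by taking the supremum over per-record privacy losses.
If $M$ satisfies $\varepsilon(\cdot)$-IDP, then $M$ satisfies $\varepsilon$-DP with
\[
  \varepsilon \;=\; \sup_i \varepsilon(i).
\]
% With per-device global budgets $\varepsilon_G^i$, the best system-wide claim is
% $\varepsilon_G = \sup_i \varepsilon_G^i$. A single device configured with
% $\varepsilon_G^j = 100000$ therefore drags the system-wide guarantee to
% $100000$-DP, even though every other device still individually enjoys its own,
% much smaller, $\varepsilon_G^i$.
```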

IDP says that, from the perspective of device i, the individual DP guarantee is ε<sub>G</sub><sup>i</sup>, *regardless of what the other devices’ guarantees are*. This is what @bmcase referred to when he mentioned that IDP provides "isolation between users." And this is why I said that I don’t believe it’s meaningful in ARA-/PAM-type systems to operate internally under IDP but then always translate externally to a single, system-wide DP claim. In this setting, where privacy budgets are managed by the user devices themselves, I really think you want the power to tell each user, individually, what guarantee they get, based on the configuration on their device (which they could even adjust to match their privacy comfort level!). There’s something beautiful and powerful in being able to say that to users, and IDP lets you do it cleanly, with a well-defined mathematical formalism backing your claims.

As an aside, I also wanted to point out that IDP has recently become popular in **purely centralized settings** (e.g., [NeurIPS’21](https://arxiv.org/pdf/2008.11193.pdf)), where both the DP query execution and the DP budget management happen at a centralized trusted party. There, you can reasonably think of IDP as just an **internal mechanism** (to optimize privacy budget consumption), but with a very clean translation from IDP to DP. Because the centralized party does all the budget management, it can reliably enforce a single global budget with a reasonable value (call it ε<sub>G</sub>) on each individual record. This means the system enforces big-epsilon-IDP with big-epsilon(i) = ε<sub>G</sub>, which translates cleanly into an ε<sub>G</sub>-DP guarantee. For example, if IPA were interested in doing IDP internally (to optimize budgets, as in the paper above), then it could make the translation to DP very cleanly! But for on-device systems, I am hard-pressed to see how we would make that translation to a system-wide ε<sub>G</sub> with a reasonable value (and also *why* we would want to, since what we would actually want seems to be exactly what IDP gives us…).
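For contrast with the on-device case, here is a minimal sketch of a centralized IDP accountant in the style of that line of work; the class and method names are hypothetical, and this is not IPA's or any existing system's API:

```python
from collections import defaultdict


class CentralAccountant:
    """Centralized IDP budget manager: every record's cumulative privacy loss
    is capped at the same global budget eps_G, so eps(i) <= eps_G for all i and
    the guarantee translates cleanly to eps_G-DP system-wide."""

    def __init__(self, eps_global: float):
        self.eps_global = eps_global
        self.spent = defaultdict(float)  # cumulative privacy loss per record

    def charge(self, touched_record_ids, eps_query: float):
        # Charge only the records the query actually touches (individual
        # sensitivity). Records whose remaining budget is too small are simply
        # excluded from the query's input, so no record ever exceeds eps_G.
        usable = [r for r in touched_record_ids
                  if self.spent[r] + eps_query <= self.eps_global]
        for r in usable:
            self.spent[r] += eps_query
        return usable
```

Because the cap is enforced in one place, the supremum over records is ε<sub>G</sub> by construction, which is exactly why the translation to a single DP claim is clean in the centralized setting.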

# Terminology and Communication of the Guarantee to the Users

DP is notoriously difficult to explain to regular users, but we’re starting to find ways to do so. IDP is much newer, so we really haven’t even started exploring how to communicate it to a wider audience. This is why, for example, the group privacy property of IDP is known to hold (it is implied by the much more general Lemma 3.3 of the [POPL’15 paper](https://www.cse.chalmers.se/~gersch/popl2015.pdf)), but it’s hard for a non-expert to see that. Moreover, the fragmented and inconsistent naming only adds to the communication challenges right now. In my experience, this fragmentation is not unusual in academia while concepts are being developed. I’ve seen it happen before, including in the space of ML attacks a few years ago. Initially, there are lots of names, different definitions, different problem formulations, different goals, etc. Then one or two papers come along with a more mature, synthesized look at the problem and set a more consistent framework upon which further literature builds. This convergence hasn’t happened yet for IDP, and the way I see it, this gives us an opportunity to establish that framework ourselves, provided we agree that, from a technical perspective, IDP is the right definitional framework for these on-device systems, both to offer users meaningful privacy guarantees and to enable systems to manage budgets efficiently.

@bmcase and I, along with a team of researchers at Columbia University and University of British Columbia, are planning to formalize our proposal of IDP-based on-device budgeting in the coming months, and to submit an academic paper for peer review.  In that paper, we intend to take on the challenge of both articulating IDP and its properties for a wider audience and motivating why it is a good fit for on-device budgeting systems.

-- 
GitHub Notification of comment by roxanageambasu
Please view or discuss this issue at https://github.com/patcg/meetings/issues/166#issuecomment-1977624115 using your GitHub account

