[docs-and-reports] Private measurement of single events (#41)

csharrison has just created a new issue for https://github.com/patcg/docs-and-reports:

== Private measurement of single events ==
I’m posting an issue based on a discussion that happened in https://github.com/patcg-individual-drafts/ipa/issues/60. In that issue, I proposed a new privacy mechanism satisfying differential privacy that works best in the model where individual events are queried, rather than multiple events being aggregated and queried together. While this is a setting where currently neither IPA nor ARA imposes restrictions, we thought it best to bubble the conversation up to this group for general discussion.

With per-event output, results are typically only useful if you aggregate them in some way after the privacy mechanism / noise is applied (similar in some sense to [local DP](https://en.wikipedia.org/wiki/Local_differential_privacy)); otherwise the data is too noisy to do anything useful with. This is also usually a huge headache because it drowns your data in much more noise than aggregating first would (e.g. for simple aggregates, the noise standard deviation scales as O(sqrt(#users)) rather than O(1); see the sketch after this list). However, there are a few big reasons why this is useful:
- **Post-processing is “free”**: Adding noise before aggregation means that you can allow for flexible aggregation as a post-processing step, separate from the platform or any trusted server infra. You can always “requery” an already-noised piece of data as many times as you want without paying more privacy budget, because the privacy is “built into” the data. For certain deployments with many downstream use-cases that otherwise would need a lot of requerying, this can end up being a more efficient mechanism.
- **Pushes complexity out**: If the aggregation is very complicated, potentially stateful, or difficult to pull off in MPC or at scale, then event-level output frees the report collector to implement it themselves. This is particularly useful for things like model training, where the “aggregation” step is potentially an entire training pipeline. In fact, the motivation for the mechanism I proposed in https://github.com/patcg-individual-drafts/ipa/issues/60 is to support recent research in private model training, where alternatives like [DP-SGD](https://arxiv.org/abs/1607.00133) are much more complex to support.
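
For concreteness, here is a minimal sketch of the trade-off above. It is not the mechanism proposed in patcg-individual-drafts/ipa#60; it just applies a plain Laplace mechanism to hypothetical per-event values in [0, 1], and the event count `N`, budget `epsilon`, and the 10-bucket re-slice at the end are all illustrative assumptions. It shows both the O(sqrt(#users)) vs. O(1) noise gap and why re-aggregating already-noised events is free post-processing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N events, each with a value in [0, 1]
# (e.g. a per-event conversion indicator); epsilon is the DP parameter.
N = 100_000
epsilon = 1.0
values = rng.random(N)

# Per-event ("local-style") mechanism: noise each event independently.
# Any later aggregation is post-processing and costs no extra budget.
noisy_events = values + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=N)
per_event_estimate = noisy_events.sum()

# Aggregate-level ("central") mechanism: one noise draw on the final sum.
true_sum = values.sum()
central_estimate = true_sum + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Noise standard deviations: sqrt(N) * sqrt(2)/eps  vs.  sqrt(2)/eps
print("per-event error:", abs(per_event_estimate - true_sum))  # ~ O(sqrt(N))
print("central error:  ", abs(central_estimate - true_sum))    # ~ O(1)

# Post-processing is free: re-slice the already-noised events arbitrarily
# (here, a hypothetical 10-bucket breakdown) without spending more budget.
buckets = rng.integers(0, 10, size=N)
breakdown = np.bincount(buckets, weights=noisy_events, minlength=10)
```

With these illustrative parameters, the per-event estimate is off by roughly a few hundred in expectation while the central estimate is off by roughly 1, but the per-event output can be re-aggregated any number of ways afterwards.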

The main questions for the group are:
- Are we comfortable with per-event queries that satisfy a similar differential privacy bar as aggregates would? Is differential privacy _alone_ satisfactory for us, or do we need an auxiliary privacy notion to protect against these kinds of queries?
- If not, how should we formalize the kind of protection we want in order to prevent them? Are those protections realistic within our threat model (Sybil attacks, etc.)?

Personally, I think we should support this kind of privacy mechanism as long as it satisfies our high-level differential privacy bounds, given the benefits listed above and the challenges inherent in protecting this boundary.

cc @eriktaubeneck @benjaminsavage @bmcase

Please view or discuss this issue at https://github.com/patcg/docs-and-reports/issues/41 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
