Re: [meetings] Agenda Request - Should PATCG be opinionated on which technologies are used to enable privacy? (#39)

> Is there consensus in this group on outputs that are private? (Dependant on consensus on a working definition of privacy.)

My understanding is that there is a qualitative difference between *privacy* and *confidentiality*.

*Privacy* concerns information loss (or equivalently privacy loss), so it's really a property of an algorithm. For example, a function that takes a list of salaries and outputs the average can be argued to be somewhat private, because it reveals some but not all information about the input. A function that ignores the input and returns pure noise is completely private (and useless). And a `SELECT *` function is not private at all. The widely accepted formalism for reasoning about privacy is *differential privacy*, which has various (actually too many) definitions, but which basically tries to bound how much information about the input one can reconstruct from the algorithm's (or "*mechanism's*") outputs.
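
For concreteness, here is a minimal sketch (my own illustration in Python, not anything from this thread) of the noisy-average-of-salaries example as an epsilon-DP mechanism using Laplace noise. The function name, the clipping bound and the epsilon value are assumptions made purely for the example.

```python
import random

def private_average_salary(salaries, epsilon=1.0, max_salary=500_000):
    """Release the average salary with Laplace noise (epsilon-DP sketch).

    Assumes each salary is clipped to [0, max_salary]; with n records fixed,
    changing one record moves the average by at most max_salary / n, so
    Laplace noise with scale (max_salary / n) / epsilon bounds how much the
    output can reveal about any single input record.
    """
    n = len(salaries)
    clipped = [min(max(s, 0), max_salary) for s in salaries]
    true_avg = sum(clipped) / n
    scale = (max_salary / n) / epsilon
    # The difference of two i.i.d. Exponential(1/scale) draws is Laplace(0, scale).
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_avg + noise
```

Smaller epsilon means more noise and less information loss; the "returns only noise" and `SELECT *` extremes correspond to epsilon tending towards zero and towards infinity respectively.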

*Confidentiality* is a property of the data: a piece of data is confidential if only authorized parties have access to it. So e.g. my browser traffic to https://github.com is confidential, because only I and the github server have access to it. Whereas if I browsed a plain http:// website, the traffic wouldn't be confidential, because intermediate network nodes can access the data.

Generally speaking, confidentiality is well understood and fairly "easy"; privacy, on the other hand, is very, very difficult. This is because privacy is *stateful*: if you reveal the average salary and, independently, the median salary, the information losses compound, and it's very difficult to track this properly. One of the nice properties of differential privacy is that the measure of information loss is additive, but even so it is very hard to actually implement a privacy-tracking system (you need to introduce some notion of a "privacy quota"/"privacy budget").
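
As a toy illustration of what such a budget looks like (my own sketch, not any particular system; all names are invented), basic composition is just subtraction from a fixed allowance:

```python
class PrivacyBudget:
    """Toy epsilon accountant illustrating additive (basic) composition."""

    def __init__(self, total_epsilon: float):
        self.remaining = total_epsilon

    def spend(self, epsilon: float) -> None:
        # Each release consumes budget; once exhausted, nothing else may run,
        # however innocuous the next query looks in isolation.
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon


budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.5)  # e.g. the noisy average salary
budget.spend(0.4)  # e.g. an independently released noisy median
budget.spend(0.2)  # raises RuntimeError: the first two releases used 0.9 of 1.0
```

The hard part in practice is not the arithmetic but choosing the total, attributing spends across parties and over time, and deciding what happens when the budget runs out.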

With the computations I have worked with, we basically sidestepped the issue of measuring privacy loss by requiring that all parties agree on the computation that is about to take place on their joined data. The downside of this is that there is no flexibility around the algorithm after the data has been provisioned. Whereas with a privacy-tracking system, a party could provision its data knowing that no matter what computation is run against it, the privacy loss is bounded. But again, such a system is very difficult to implement in practice.

-- 
GitHub Notification of comment by exFalso
Please view or discuss this issue at https://github.com/patcg/meetings/issues/39#issuecomment-1086395328 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Friday, 1 April 2022 22:52:46 UTC