Re: [docs-and-reports] Principle: Don't use Entropy (#4)

> You might also interpret this as a bit of a repudiation of your notion of "at-scale privacy". Sure, there is something to be said for providing protection to people in larger groups and using aggregate measures as the basis for making claims about those groups. However, I'm asserting that anything you say about a population as a whole is not worth much if it means that there are individuals or minorities that might suffer disproportionately as a result.

I didn't say we'd want to use an "at-scale" measure alone. There is value in using _both_ worst-case metrics (like differential privacy) and at-scale metrics (like information gain or entropy), which try to measure something like "scaled abuse is infeasible".

If there were a privacy change that reduced at-scale privacy loss while leaving the worst case unchanged, I think we ought to consider making that change in a PATCG proposal.

A few good properties of at-scale metrics:
- They help represent whether _pervasive_ tracking on the web can occur (e.g. many people tracked at once).
- They can help us understand and shape the economics of tracking behaviors. For instance, if it is only possible to track a small number of users, the cost/benefit calculus of deploying tracking methods changes. Note that this can benefit all users by creating an environment that disincentivizes attacks which are only useful if they succeed at large scale.

Worst-case notions of privacy are ill-suited for evaluating such benefits because, by definition, they ignore the effect on the typical case and focus only on the extremes. I tend to think of worst-case metrics as a sort of "backstop": things can't get worse than this, but they can certainly be better if you consider less adversarial scenarios. We should improve both.
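To make the distinction concrete, here is a small hypothetical sketch (the model and all numbers are mine, purely for illustration): model each user as identifiable within some anonymity set, with an attacker learning `log2(N / set_size)` bits about that user. A worst-case metric looks at the most-exposed user; an at-scale metric averages over everyone, so a change that shrinks most users' exposure can improve it even when the worst case stays put.

```python
import math

# Illustrative model (not from the thread): an attacker learns
# log2(N / set_size) bits about a user identifiable within a set of
# the given size, out of a population of N users.
def bits_revealed(population, set_size):
    return math.log2(population / set_size)

N = 1000
# set_sizes[i] = size of the anonymity set user i falls into.
# One fully identified user; everyone else sits in a set of 500.
set_sizes = [1] + [500] * 999

per_user = [bits_revealed(N, s) for s in set_sizes]

worst_case = max(per_user)                 # ~9.97 bits: the one exposed user
at_scale = sum(per_user) / len(per_user)   # ~1.01 bits on average

print(round(worst_case, 2), round(at_scale, 2))
```

The worst-case number is dominated by the single exposed user, while the at-scale average reflects that pervasive tracking of the population is not happening.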

> The sorts of things that might be useful is knowing how often someone is identifiable as being in a group of size less than k for k∈{1,10,100,…} for example. Or how many people might reveal information that exceeds some threshold h. That might not suffice, in that those sorts of statistics depend on assumptions about populations and selection, but they do provide more direct insight into the effect on privacy.

I have concerns with these kinds of k-anon-style metrics, but perhaps we should hash out what the privacy measure _should_ be in a different issue :) and keep this one focused on the entropy / at-scale debate.
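For concreteness, the statistic quoted above could be computed roughly as follows (the group sizes here are invented for illustration; this is just one reading of the suggestion, not an agreed-upon definition):

```python
# Hypothetical sketch of the quoted statistic: for each threshold k,
# the fraction of users identifiable within a group of size less than k.
def fraction_below(group_sizes, ks):
    n = len(group_sizes)
    return {k: sum(1 for s in group_sizes if s < k) / n for k in ks}

# group_sizes[i] = size of the group user i is identifiable within
# (made-up data for illustration).
group_sizes = [1, 1, 5, 5, 5, 50, 50, 50, 50, 50]

print(fraction_below(group_sizes, ks=[1, 10, 100]))
# {1: 0.0, 10: 0.5, 100: 1.0}
```

As the quote itself notes, such statistics depend heavily on assumptions about the population and how groups are formed, which is part of my hesitation about them.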

-- 
GitHub Notification of comment by csharrison
Please view or discuss this issue at https://github.com/patcg/docs-and-reports/issues/4#issuecomment-1158458315 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Friday, 17 June 2022 03:41:48 UTC