Re: Results from Privacy review of Presentations API using Privacy Questionnaire. (Wall of text warning!)

From: Joseph Lorenzo Hall <joe@cdt.org>
Date: Fri, 18 Sep 2015 16:02:13 -0400
To: David Singer <singer@apple.com>
Cc: Christine Runnegar <runnegar@isoc.org>, "public-privacy (W3C mailing list)" <public-privacy@w3.org>
On Fri, Sep 18, 2015 at 3:37 PM, David Singer <singer@apple.com> wrote:
>>>              • Does the data record contain elements that would enable re-correlation when combined with other datasets through the property of intersection?
>>>                      • No (just audio/video)
>>> This seems like a hard question... on the one hand, if a "face" is enough from which to derive a facial pattern that you can correlate with other databases of facial patterns, then the answer would seem to be yes (although I don't know of any public databases of facial biometrics). Maybe there's a better way to get at what this question wants to get at? Does anyone remember what the impetus for this question is? Or can we think of examples in a spec that we'd definitely want to catch with this question?
> Yes.  Isn’t this getting at the problem that if I know someone’s gender, birthday, zip-code and one other datum (I forget what), I can almost certainly identify them, even though any one of these looks innocuous?

The US-based privacy law academic community would refer to this as
"the mosaic theory" (that any one datum is not revealing, but several
in concert can be quite revealing). Sweeney [1] published the first
analysis that revealed the power of the gender, date of birth, and
ZIP code three-tuple using 1990 US census data, although Golle [2]
updated that and found that a decade later it had become less uniquely
identifying (presumably because of increased urban concentration of
the US population?). I don't think anyone has repeated the analysis
with 2010 US census data, which would be cool.
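To make the intersection question a bit more concrete, here is a minimal sketch (Python, assuming records are dicts with hypothetical field names `gender`, `dob`, and `zip`) that measures what share of records is unique on a given combination of fields. The toy records are invented for illustration and are not census data; the uniqueness share is roughly the kind of quantity the census studies above estimated for the three-tuple.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Return the fraction of records whose combination of `keys` values
    appears exactly once in `records` (0.0 for an empty list)."""
    # Project each record onto the chosen fields.
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    # A record is re-identifiable on these fields if its combination is unique.
    unique = sum(1 for c in combos if counts[c] == 1)
    return unique / len(records) if records else 0.0


# Hypothetical toy records, invented for illustration (not census data).
people = [
    {"gender": "F", "dob": "1965-07-31", "zip": "02138"},
    {"gender": "F", "dob": "1965-07-31", "zip": "02139"},
    {"gender": "M", "dob": "1965-07-31", "zip": "02138"},
    {"gender": "M", "dob": "1970-01-02", "zip": "02138"},
]

for keys in (["gender"], ["zip"], ["dob"], ["gender", "dob", "zip"]):
    print(keys, unique_fraction(people, keys))
# ['gender'] 0.0
# ['zip'] 0.25
# ['dob'] 0.25
# ['gender', 'dob', 'zip'] 1.0
```

Each field alone singles out at most one of the four records, yet the three together make every record unique, which is the mosaic effect in miniature.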

But I digress!

I'm having a hard time thinking of ways to make this particular
question easier for a spec author to handle (and for us to evaluate)
without potentially losing important privacy thinking we or they
should do in the process. Hmmmm...

best, Joe

[1]: http://dataprivacylab.org/projects/identifiability/paper1.pdf
[2]: https://crypto.stanford.edu/~pgolle/papers/census.pdf

