Notes on two recent fingerprinting papers

Hi all,

In working on revisions for our guidance on mitigating fingerprinting, I've been reviewing two recently published papers on detecting browser fingerprinting and browser fingerprintability in the wild. Below are my notes and conclusions from those papers. I would welcome your comments as well.

I believe these results show cause for optimism regarding 1) potential for decreasing fingerprintability and 2) potential for detecting fingerprinting, but they also show that fingerprinting is feasible and active fingerprinting continues to happen, successfully, on the Web today.

Thanks,
Nick

### Beauty and the Beast

Pierre Laperdrix, Walter Rudametkin, Benoit Baudry. Beauty and the Beast: Diverting modern web browsers to build unique browser fingerprints. 37th IEEE Symposium on Security and Privacy (S&P 2016), May 2016, San Jose, United States. <http://www.ieee-security.org/TC/SP2016/>. <hal-01285470v2>
https://hal.inria.fr/hal-01285470v2/

Reports results from a data collection project at AmIUnique.org, essentially a follow-on and comparison to the Panopticlick paper that initiated much of this work. We don't have a sense of how representative their large sample is, since people who volunteered to participate in the test are likely to be unusual in terms of browsing behavior and privacy expectations. Even so, the statistics on entropy and anonymity set size can be useful in determining which characteristics contribute most to fingerprintability.
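To make the two statistics concrete, here is a minimal sketch of how entropy and anonymity set size could be computed for a single characteristic. It is not the paper's code; the function names and the toy sample are my own assumptions, purely for illustration.

```python
import math
from collections import Counter


def shannon_entropy_bits(values):
    """Shannon entropy (in bits) of the observed distribution of one characteristic."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def anonymity_set_sizes(values):
    """For each observation, the number of observations sharing the same value."""
    counts = Counter(values)
    return [counts[v] for v in values]


# Hypothetical toy sample: one value of a characteristic per visiting browser.
observed_user_agents = ["UA-A", "UA-A", "UA-B", "UA-C"]

entropy = shannon_entropy_bits(observed_user_agents)
sizes = anonymity_set_sizes(observed_user_agents)
# Browsers whose value is shared with nobody else (anonymity set of size 1).
fraction_unique = sum(1 for s in sizes if s == 1) / len(sizes)
```

Higher entropy means the characteristic splits the sample into more, smaller groups; an anonymity set of size 1 means that browser is unique on that characteristic within the sample.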

Many of the most distinctive characteristics from the Panopticlick dataset have less entropy today. The lack of support for listing plugins in recent versions of Chrome and on mobile devices, and the decreased availability of Flash (and, as a result, of the list of available fonts), substantially decrease the entropy of those characteristics. This documents the improvement we would have expected on those characteristics.

Canvas fingerprinting is highly distinctive, including by probing for the presence of fonts and for particular emoji renderings.

WEBGL_debug_renderer_info provides hardware-level information that's very discriminating, although perhaps it is only exposed by default in the Chrome browser? This looks to me like a straightforward bug: debug info at this level of detail presumably doesn't provide important end-user functionality. Nonetheless, we might want to note that debug information shouldn't be generally accessible.

The User-Agent string is not especially distinctive on desktop browsers, but *is* very distinctive on smartphones, in ways that apparently are unnecessary for functionality. This seems like a step backwards. The researchers find firmware version, cellular service carrier, hardware model and in-app-browser information in these User-Agent strings. While we have typically avoided suggesting standardization of User-Agent strings because the OS and browser vendor will be observable anyway, it would seem to make a big difference to passive fingerprinting if these extra variations weren't introduced.

Finally, this paper considers how different scenarios for the future of Web technology would change the fingerprintability results. In particular, removal of Flash, removal of plugins and better standardization of HTTP headers each decrease fingerprintability in different contexts. These changes are not redundant: making all of them together has a larger effect than any one alone. Desktop browser uniqueness in this dataset, for example, would decrease from 90% to 54%.
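As a rough illustration of why removing several characteristics can help more than removing any one of them, here is a small sketch that measures the fraction of unique fingerprints over different subsets of attributes. The records, attribute names and function name are made up for the example and are not drawn from the AmIUnique dataset.

```python
from collections import Counter


def unique_fraction(records, attributes):
    """Fraction of records whose combination of the given attributes is unique."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Hypothetical toy data: four browsers, three characteristics each.
records = [
    {"user_agent": "ua1", "plugins": "p1", "fonts": "f1"},
    {"user_agent": "ua1", "plugins": "p2", "fonts": "f1"},
    {"user_agent": "ua1", "plugins": "p1", "fonts": "f2"},
    {"user_agent": "ua2", "plugins": "p1", "fonts": "f1"},
]

all_attrs = unique_fraction(records, ["user_agent", "plugins", "fonts"])  # 1.0
no_plugins = unique_fraction(records, ["user_agent", "fonts"])            # 0.5
no_fonts = unique_fraction(records, ["user_agent", "plugins"])            # 0.5
neither = unique_fraction(records, ["user_agent"])                        # 0.25
```

In this toy sample, dropping either plugins or fonts alone halves the unique fraction, and dropping both reduces it further, which is the sense in which the changes are not redundant.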

### OpenWPM

Steven Englehardt, Arvind Narayanan. Online tracking: A 1-million-site measurement and analysis.
https://webtransparency.cs.princeton.edu/webcensus/

Greg mentioned a draft of this paper earlier on the public-privacy list: https://lists.w3.org/Archives/Public/public-privacy/2016AprJun/0088.html

Some conclusions:

Many sites/trackers that were using canvas fingerprinting have stopped, apparently in response to public knowledge that such techniques were in use. That suggests that detectability of fingerprinting can be used to inhibit fingerprinting, even without formal regulatory intervention.

The error rate, specifically false positives where an activity looks like fingerprinting but is actually just functionality, can be made extremely low. Automated detection of fingerprinting, then, is likely to be a useful signal. (Of course, it's hard to measure fingerprinting that is going on but isn't detected through these automated means, e.g. passive fingerprinting via p0f.)

Because browser fingerprinting in the wild typically combines many features, it becomes feasible to discover new methods being used for fingerprinting. In this paper, the authors see AudioContext, WebRTC and Canvas measureText in use for fingerprinting purposes: https://webtransparency.cs.princeton.edu/webcensus/#fp-results

I think AudioContext may be an issue that we hadn't already flagged and should follow up on. For the others, this provides some data on their use in the wild.

Received on Tuesday, 5 July 2016 21:03:57 UTC