revisiting User-Agent

Hi public-privacy folks,

The presence and entropy of the User-Agent header and its impact on browser fingerprinting and user privacy have fluctuated over time, but there seems to be renewed interest in considering limits on the detail in this header or how widely it’s distributed.

Mike West has proposed freezing current User-Agent headers and subsequently exposing UA details more granularly and only in response to server requests via Client Hints.

TAG discussion issue: https://github.com/w3ctag/design-reviews/issues/320
I-D: https://tools.ietf.org/html/draft-west-ua-client-hints-00

(Mike, feel free to correct or add detail to my very brief recounting of your proposals.)

The Safari team has also publicly noted an interest in limiting User-Agent details or freezing the header altogether. (I’m not sure if there’s clear documentation on exactly where this functionality has ended up.)

At the same time, research has shown some increases in the entropy and available information being pushed into User-Agent headers by, for example, embedded browsers in ways that have caused a rollback in fingerprinting mitigations.
https://hal.inria.fr/hal-01285470v2/


While it seems unlikely that we will get to a place where a web site, especially with client-side functionality, will be unable to determine what brand or major version of a browser is in use even without a User-Agent header, I think there is room to minimize the passive fingerprintability of that information.

In particular, sub-requests often need less of this information, and where a sub-request is just providing static content (an image, say), it’s entirely possible that the server handling it wouldn’t be able to infer all the same information from other sources. The proposed architecture for Client Hints requires an opt-in from the server, so even if servers can get all the same information by opting in to headers, narrowing down which sites do this improves the visibility of where this surface might be used for fingerprinting, which can help researchers, policymakers and others to mitigate the privacy impacts of browser fingerprinting. Providing separate fields would also allow for actual data minimization by servers that want to follow that best practice: server operators who believe they only need brand and major version for statistical and debugging purposes can ask for just those fields, rather than everything.
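
As a rough sketch of the opt-in flow described above: the server names the hints it wants in an `Accept-CH` response header, and the client then sends only those on later requests. The hint names and values below are placeholders modeled on the `Sec-CH-` prefix and are assumptions for illustration, not quoted from the draft, and the two helper functions are hypothetical.

```python
# Placeholder hint names and values, assumed for illustration only.
AVAILABLE_HINTS = {
    "Sec-CH-UA": '"ExampleBrowser"; v="71"',
    "Sec-CH-Platform": '"ExampleOS"',
    "Sec-CH-Arch": '"arm"',
    "Sec-CH-Model": '"ExamplePhone 3"',
}


def accept_ch_header(wanted):
    """Build the Accept-CH response header value for the hints a server opts in to."""
    return ", ".join(wanted)


def hints_to_send(accept_ch):
    """Return only the hint headers the server asked for; all others stay unsent."""
    requested = {
        token.strip().lower() for token in accept_ch.split(",") if token.strip()
    }
    return {
        name: value
        for name, value in AVAILABLE_HINTS.items()
        if name.lower() in requested
    }


# A server that only needs brand and major version asks for a single hint.
opt_in = accept_ch_header(["Sec-CH-UA"])
minimal_request_headers = hints_to_send(opt_in)

# A server that never opts in (a static image host, say) receives no hints.
passive_request_headers = hints_to_send("")
```

With this shape, platform, architecture and model details never leave the client unless a server asks for them by name, which is what would make the set of asking servers observable.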

I think we should be careful in noting which details need to be made available in these Client Hints for those debugging or analytics purposes. Are CPU architecture or device hardware information necessary, or do they only encourage “overfitting” to certain devices, which has long been a problem with UA sniffing in general? Is there anything we can do proactively to prevent recreation of all the detail and duplication present in current User-Agent headers in the corresponding Client Hint headers?

There’s also a draft applying the same suggestion to language headers for content negotiation. I think there are also privacy benefits here, although I’m not entirely clear on whether `Lang` is getting any significant use at the moment or whether we have any reason to believe that Sec-CH-Lang would get more substantial uptake for content negotiation.
https://tools.ietf.org/html/draft-west-lang-client-hint-00

Feedback on these proposals would be welcome, but we can also use discussion on this list to consider where we think privacy improvements to User-Agent can be made more generally. New implementer interest in this area is something to capitalize on.

Thanks,
Nick

Received on Friday, 14 December 2018 00:35:19 UTC