Re: [w3ctag/design-reviews] Partial freezing of the User-Agent string (#467)

> **Lack of industry consultation**
> The HTTP protocol has become deeply embedded globally over its lifetime. As envisaged by the authors of the HTTP protocol, the User-Agent string has been used in the ensuing decades for “statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations”.

Indeed. Those are all use-cases that we intend to maintain.

> 
> The User-Agent header has been part of the web since its inception. It has been a stable element of the HTTP protocol through all its versions from HTTP 1.0 in 1996 all the way to HTTP/2 in 2015, and thus has inevitably come to be relied upon, even if particular use cases are not apparent, or have been forgotten about, or their practitioners are not participants in standards groups. The User-Agent string is also likely being used in new ways not contemplated by the original authors of the specification.
> 
> There was a salutary example of the longevity of standards in a recent Tweet from the author of Envoy, a web proxy server. He has been [forced to add elements of HTTP 1.0](https://twitter.com/mattklein123/status/1222610858427088896?s=20) to ensure it works in the real world, despite Envoy’s development starting 23 years after HTTP/1.1 was ratified and deliberately opting not to support HTTP 1.0. This is the reality of the web—legacy is forever.
> 
> Despite this reality, there is no public evidence of any attempt to consult with industry groups to understand the breadth and severity of the impact of this proposed change to HTTP. It is a testament to its original design that the HTTP protocol has endured so well despite enormous changes in the internet landscape. Such designs should not be changed lightly.

The Client Hints infrastructure was thoroughly discussed at the IETF's HTTPWG, as was [its specific application as a replacement for the User-Agent string](https://youtu.be/nC0YF2Y8lBQ?t=5836).

> 
> **Issues with the stated aim of the proposal**
> The problem with the User-Agent string and the reason to propose Client Hints, per the [explainer](https://github.com/WICG/ua-client-hints#user-content-explainer-reducing-user-agent-granularity), is that “there's a lot of entropy wrapped up in the UA string” and that “this makes it an important part of fingerprinting schemes of all sorts.”
> 
> In subsequent discussions in the HTTP WG, the privacy issues focused on passive fingerprinting, where the User-Agent string could potentially be used by entities to track users without their knowledge.
> 
> What is missing from the discussion is any concrete evidence of the extent or severity of this supposed tracking. Making changes to an open standard that has been in place for over 24 years should require a careful and transparent weighing of the benefits and costs of doing so, not the opinion of some individuals. In this case the benefits are unclear and the central argument is disputed by experts in the field. The costs on the other hand are significant. The burden of proof for making the case that this truly is a problem worth fixing clearly falls on the proposer of the change.

There's a lot of independent research on the subject. The EFF's [Panopticlick](https://panopticlick.eff.org/static/browser-uniqueness.pdf) study is one example.

> 
> If active tracking is the main issue that this proposal seeks to address there are far richer sources of entropy than the User-Agent string. Google themselves have published a paper on a canvas-based tracking technique that can uniquely identify 52M client types with 100% accuracy. Audio fingerprinting, time skew fingerprinting and font-list fingerprinting can be combined to give very high entropy tracking.

I'm afraid there's been some confusion. This proposal tries to address *passive* fingerprinting, by turning it into active fingerprinting that the browser can then keep track of.

> 
> **Timeline of change**
> This proposed change is proceeding more quickly than the industry can keep up with. In January 2020 alone there were some important changes made to the proposal (e.g. sending the mobileness hint by default). It is difficult to fully consider the proposal and understand its impact until it is stable for a while. The community needs time to 1) notice the proposal and 2) consider its impact. There has not been enough time.
> 
> Move fast and break things is not the correct approach for making changes to an open standard.

Regarding timelines, I [updated](https://groups.google.com/a/chromium.org/d/msg/blink-dev/-2JIRNMWJ7s/nBZ0sVI2BgAJ) the intent thread.

> 
> **Narrow review group**
> It’s difficult to be objective about this, but the group discussing this proposal feels narrow and comes mostly from the web browser constituency, where the change would initially be enacted but where the impact would not necessarily be felt. It would be good to see more people from the following constituencies in the discussion:
> 
> * advertisers
> * web analytics
> * HTTP servers
> * load balancers
> * CDNs
> * web caches

The latter four are active at the IETF and in the HTTPWG. We've also received a lot of feedback from others on the [UA-CH repo](https://github.com/WICG/ua-client-hints).

> 
> All of these constituencies make use of the User-Agent string and must be involved in the discussion for a meaningful consensus to be reached.
> 
> Obviously you can’t force people to contribute, but my sense is that this proposal is not widely known amongst these affected parties.
> 
> **Diversity of web monetisation**
> Ads are the micropayments system of the web. Nobody likes them but they serve a crucial role in the web ecosystem.
> 
> The proposed change hurts web diversity by disproportionately harming smaller advertising networks that use the OpenRTB protocol. This essentially means most networks outside of Google and Facebook. Why? The User-Agent string is part of the [OpenRTB BidRequest object](https://www.iab.com/wp-content/uploads/2015/05/OpenRTB_API_Specification_Version_2_3_1.pdf), where it is used to inform bidding decisions, ad formatting and targeting.

A few points:
* Existence of data in the OpenRTB BidRequest object doesn't mean that users and their user agents are obligated to provide it to advertisers. For example, I also see Geolocation in that same object as "recommended". I'm assuming you don't think that browsers should passively provide geolocation data on every request.
* The implications of anti-fingerprinting work on OpenRTB are [being discussed](https://github.com/w3c/web-advertising/blob/master/rtb-use-case.md) at the W3C’s Web Advertising Business Group.
* The user agent information would **still be available to advertisers**, they'd just have to actively ask for it (using UA-CH or the equivalent JS API) in ways that enable browsers to know which origins are gathering that data.
* Once something like Privacy Budget is in place, the change of UA data to active fingerprinting would enable advertisers to “spend” their entropy bits where they need them, rather than “pay” for entropy bits they potentially don’t need.
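To make the "actively ask for it" point concrete, here is a minimal server-side sketch in Python. It assumes the header names from the current UA-CH draft (`Accept-CH`, `Sec-CH-UA-Platform-Version`, `Sec-CH-UA-Model`), which were still evolving and may differ from what ships; `opt_in_headers` and `read_hints` are hypothetical helpers, not part of any library.

```python
# Hypothetical helpers sketching the UA Client Hints opt-in flow.
# Assumption: header names follow the current UA-CH draft; they were
# still changing in early 2020.
from typing import Dict, Iterable, Mapping

# High-entropy hints that, per the draft, are not sent unless the origin asks.
REQUESTED_HINTS = ("Sec-CH-UA-Platform-Version", "Sec-CH-UA-Model")


def opt_in_headers(hints: Iterable[str] = REQUESTED_HINTS) -> Dict[str, str]:
    """Response headers an origin sends to actively request extra UA hints."""
    return {"Accept-CH": ", ".join(hints)}


def read_hints(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Return whichever requested hints the browser chose to send.

    HTTP header names are case-insensitive, so match on lowercased names.
    Missing hints are simply omitted: the browser may decline to send them.
    """
    lowered = {name.lower(): value for name, value in request_headers.items()}
    return {
        hint: lowered[hint.lower()]
        for hint in REQUESTED_HINTS
        if hint.lower() in lowered
    }


if __name__ == "__main__":
    # First response: ask for the extra hints.
    print(opt_in_headers())
    # A later request in which the browser sent only one of them.
    print(read_hints({"sec-ch-ua-model": '"Pixel 3"', "User-Agent": "Mozilla/5.0"}))
```

The origin opts in on a response, and later requests may carry the hints it asked for. Because the origin had to ask, the browser can see which origins are collecting that data. Callers should treat missing hints as normal, since the browser is free not to send them.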


> Why does it hurt Google less? Because Google is able to maintain a richer set of user data across its dominant web properties ([90% market share in search](https://ie.oberlo.com/statistics/search-engine-market-share)), Chrome browser ([69% market share](https://www.statista.com/statistics/544400/market-share-of-internet-browsers-desktop/)) and Android operating system ([74% market share](https://gs.statcounter.com/os-market-share/mobile/worldwide)).
> 
> The web needs diversity of monetisation just as much as it needs diversity in browsers.
> 
> **Dismissive tone in discussions**
> Some of the commentary from the proposers has been dismissive in nature e.g. the following comments on the [Intent to Deprecate and Freeze: The User-Agent string post](https://groups.google.com/a/chromium.org/forum/#!msg/blink-dev/-2JIRNMWJ7s/sFR8F1LLDgAJ) in response to a set of questions:
> 
> * “I’d expect analytics tools to adapt to this change.”
> * “CDNs will have to adapt as well.”
> 
> Entire constituencies of the web should not be dismissed out of hand. This tone has no place in standards setting.


I apologize if this came across as dismissive. That wasn't my intention.

> 
> **Entangling Chrome releases with an open standards process**
> In the review request, Chrome release dates are mentioned. It doesn’t feel appropriate to link a commercial organisation’s internal dates to a proposed standard. There are mentions of shipping code and the Chrome intent.

The TAG review process asks for relevant time constraints. I provided them.

> 
> **Overstated support**
> This point has been made by others here but it is worth restating. It feels like there is an attempt to make this proposal sound as if it has broader support than it really does, in particular on the [Chrome intent](https://groups.google.com/a/chromium.org/forum/m/#!msg/blink-dev/-2JIRNMWJ7s/yHe4tQNLCgAJ), linked explicitly by the requester.
> 
> **Unresolved issues**
> The review states “Major unresolved issues with or opposition to this specification:” followed by nothing, i.e. no unresolved issues or opposition. This is true only if you consider unilaterally closed issues to be truly closed. Here are a couple of issues that were closed rather abruptly, coinciding with a Chrome intent.
> 
> Some closed HTTPWG issues:
> 
> * [Client-Hints exposes fingerprint values to additional parties and logging sensitive locations](https://github.com/httpwg/http-extensions/issues/767)
> * [Data motivating CH?](https://github.com/httpwg/http-extensions/issues/768)
> * [httpwg/http-extensions#786](https://github.com/httpwg/http-extensions/issues/786)

I'm not sure what your point is here. These issues were raised (one by me), discussed, resolved and then closed.


https://github.com/w3ctag/design-reviews/issues/467#issuecomment-586884048

Received on Monday, 17 February 2020 08:58:38 UTC