Re: [w3ctag/design-reviews] User-Agent Client Hints & UA Reduction (#640) from Mike Taylor on 2021-08-10 (public-webapps-github@w3.org from August 2021)

From: Mike Taylor <notifications@github.com>
Date: Tue, 10 Aug 2021 08:01:52 -0700
To: w3ctag/design-reviews <design-reviews@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <w3ctag/design-reviews/issues/640/896105391@github.com>
Hi everyone, lots to reply to. I will try my best.

@torgo wrote:
> One question that came up : currently browsers lie in the UA string because web sites make assumptions. What tells you that the CH approach will compel UAs to tell the truth more often or will somehow disincentivize them from lying?

UA-CH wasn't designed to prevent _browsers_ from lying, and I don't think any technology can account for the many constraints (technical or otherwise) that cause developers to gate experiences against allow- or block-lists for browsers (as much as we might disagree with these decisions).

To recap, UA-CH is attempting to solve 2 problems: 1) reduce passive entropy sent over the network, and 2) break free from the legacy hot mess that is the UA string by providing an actual API (HTTP and JS). However, part of 2 tries to account for the reality that sometimes browsers need to lie to get things to work. Revisiting https://wicg.github.io/ua-client-hints/#grease is probably worthwhile.

So I think the framing is backwards a bit - with brand lists in Sec-CH-UA, we're trying to disincentivize _sites_ from creating inflexible allow- or block-lists, or sniffing for a single ossified string. In a way, this allows for UAs to both lie and tell the truth without punishing users, e.g., you could imagine a site that is sniffing for "Chromium" in the brand list - Firefox could send:

`sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Firefox";v="99", "Gecko"; v="99"`

That should unbreak the site without Firefox usage disappearing from that site's analytics, with all the negative consequences that would bring (a trade-off that needs to be considered when doing UA spoofing today). Similarly, some WebKit port could always send "Safari" in addition to "WebKitGTK" to avoid being told it wasn't supported on certain sites (or whatever the right values would be, I admit I don't know a lot about the WebKit embedder ecosystem).

Sec-CH-UA doesn't prevent sites from being inflexible - there's plenty of ways to accomplish that. But hopefully it provides useful flexibility to browsers to benefit their users.
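To make the brand-list idea a bit more concrete, here is a rough Python sketch of how a server might look for one brand in a `Sec-CH-UA` value without caring about order or about the other entries. The helper names (`parse_brand_list`, `has_brand`) are made up for illustration, and the regex is a simplification that assumes well-formed input; it is not a full Structured Field Values parser and does not unescape backslash sequences.

```python
import re
from typing import Dict

# One list member of the form "brand";v="version" (optional spaces around ';').
# Assumption: input is well-formed; this is not a full structured-field parser.
_MEMBER = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v="((?:[^"\\]|\\.)*)"')


def parse_brand_list(header: str) -> Dict[str, str]:
    """Return {brand: version} for a Sec-CH-UA header value."""
    return {brand: version for brand, version in _MEMBER.findall(header)}


def has_brand(header: str, brand: str) -> bool:
    """True if the brand appears anywhere in the list, whatever its position."""
    return brand in parse_brand_list(header)


# The header value from the Firefox example above.
header = '"Chromium";v="92", " Not A;Brand";v="99", "Firefox";v="99", "Gecko"; v="99"'
brands = parse_brand_list(header)
```

With that value, `has_brand(header, "Chromium")` and `has_brand(header, "Firefox")` are both true, which is the point: the site's check passes and the real brand is still visible. The GREASE entry (`" Not A;Brand"`) is just another key that a membership check never needs to understand.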

I probably should have read @mcatanzaro's reply before writing the above, because I think we're in agreement. Particularly here:

> I'm not sure this is necessarily a bad thing, though. I don't see CH as a way to avoid lying. CH would be a more structured alternative to the UA string, to encourage web developers to avoid accidentally breaking things by misparsing the UA string. Reducing accidental breakage seems good for me. But there will always be websites that decide they do not want to allow access to FreeBSD users or s390x users (yes, these run web browsers!) or whatever, and that means we'll always need to lie.

@mcatanzaro also wrote:

> What would really incentivize implementation would be to combine CH with some very aggressive plan to make it harder to detect what browser is in use, e.g. remove the Sec-CH-UA hint from the spec, then pair CH with a plan to pare down existing UA strings on all browsers, even if it temporarily breaks some websites. Think "remove the string 'Chrome' from the user agent" level of change. Then I could get excited about CH. That would be a pretty huge change though, and I doubt it's in the cards.

This reminds me of a conversation I had with @yoavweiss at TPAC when I worked for Mozilla, and he proposed all browsers sending the same frozen UA string. It's certainly an interesting thought, but also a little bit scary when you consider what might break for folks to get there, especially in the long-tail of the (unmaintained) web. I also wonder if this pushes developers to get more creative in their fingerprinting techniques to work around engine bugs, or sniffing for platform features where the feature detection story isn't great. Something to think about (and a great reason to improve the feature detection story for the platform).

@marcoscaceres wrote:

> I see Mozilla's position on this remains marked as "harmful"

And then quotes part of the rationale:

> In addition to not including this information [Sec-CH-UA], we would prefer freezing the User Agent string and only providing limited information via the proposed NavigatorUAData interface JS APIs. This would also allow us to audit the callers. At this time, freezing the User Agent string without any client hints (which is not this proposal) seems worth prototyping. We look forward to learning from other vendors who implement the "GREASE-like UA Strings" proposal and its effects on site compatibility.

Fun fact, I helped to write that rationale back when Mozilla considered UA-CH "non-harmful", jointly with :mt, IIRC. Not sure if they meant to keep that original language, or if it was left in unintentionally (the "[w]e look forward to learning from other vendors" part in particular doesn't quite jibe with "harmful"). It may be useful for someone from Mozilla to make sure it reflects their current thinking. (I'll also note that auditing origins that request client hints via `Accept-CH` is also possible at the HTTP level, not just the JS interface.)
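As a sketch of what HTTP-level auditing could look like, the Python below takes `Accept-CH` response header values observed per origin and reports which origins ask for higher-entropy hints. Everything here is illustrative: the function names, the example origins, and in particular the set of hints treated as "high entropy" are assumptions, not something a spec or any browser defines in this form.

```python
from typing import Dict, Set

# Assumption: which hints count as "high entropy" for the purposes of this audit.
HIGH_ENTROPY_HINTS = {
    "sec-ch-ua-arch",
    "sec-ch-ua-model",
    "sec-ch-ua-platform-version",
    "sec-ch-ua-full-version",
}


def requested_hints(accept_ch: str) -> Set[str]:
    """Split an Accept-CH header value into lowercase hint names."""
    return {token.strip().lower() for token in accept_ch.split(",") if token.strip()}


def audit(responses: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each origin to the high-entropy hints its Accept-CH asks for."""
    report = {}
    for origin, accept_ch in responses.items():
        asked = requested_hints(accept_ch) & HIGH_ENTROPY_HINTS
        if asked:
            report[origin] = asked
    return report


# Hypothetical observations: origin -> Accept-CH value seen in its responses.
observed = {
    "https://shop.example": "Sec-CH-UA-Model, Sec-CH-UA-Platform-Version",
    "https://news.example": "Sec-CH-UA-Mobile",
}
report = audit(observed)
```

Here only `https://shop.example` ends up in `report`, since the other origin asks for nothing beyond a low-entropy hint.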

> @miketaylr, @yoavweiss, as the folks proposing this, are you willing to reduce the design to align more closely with what WebKit and Mozilla are requesting (maybe drop Sec-CH-UA entirely, work together to freeze the UA string, and maybe just expose the JS API)?

I'm willing to consider a lot of things, especially if there is genuine implementer interest. :) The most useful thing for folks proposing these kinds of changes would be to [file issues on the repo](https://github.com/WICG/ua-client-hints/issues/new). I think getting rid of the HTTP component of UA-CH wouldn't be an improvement, but that's a conversation that can be had in a relevant spec issue.

@hsivonen wrote:

> Currently, conditional exposure of low vs. high entropy UA Client Hints is being proposed. Apart from the structuredness concern, what would be worse if the same mechanism controlled low vs. high entropy User-Agent value instead?

Are you suggesting a default low-entropy UA string (this is Chrome's current thinking for the Reduced UA string), and a client hint to opt into a high-entropy UA string? Do you think a site should get _all the entropy_, even if it might just need platform version, for example?

In a world where something like Privacy Budget existed, if I have a given entropy budget I would want to only request exactly what I needed to avoid going over. If my only choice is "low" vs "high" entropy UA string with more info than I need, that perhaps creates a challenge for me.

The decomposed UA client hints allow me to be granular. For example, I can ask for device model if that's the only thing I need to verify a mobile browser is capable of handling a device-specific Android (feels a bit contrived, but consider the [Firefox Mobile UA override for the Samsung Galaxy store](https://github.com/mozilla-extensions/webcompat-addon/blob/main/src/data/ua_overrides.js#L339-L372)).
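A minimal server-side sketch of that granularity, in Python: opt in to the device model hint only, then read it back from a later request. The function names are hypothetical and `"Pixel 3"` is just a placeholder value. Stripping the surrounding quotes is a shortcut that assumes a simple quoted string rather than doing real structured-field parsing.

```python
from typing import Dict, List, Optional, Tuple


def accept_ch_header(hints: List[str]) -> Tuple[str, str]:
    """Build an Accept-CH response header naming only the hints needed."""
    return ("Accept-CH", ", ".join(hints))


def device_model(request_headers: Dict[str, str]) -> Optional[str]:
    """Return the Sec-CH-UA-Model value without its quotes, if present."""
    for name, value in request_headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "sec-ch-ua-model":
            return value.strip().strip('"')
    return None


# Ask for the model and nothing else.
response_header = accept_ch_header(["Sec-CH-UA-Model"])

# A later request from a client that honored the opt-in (placeholder values).
model = device_model({"Sec-CH-UA-Model": '"Pixel 3"', "Sec-CH-UA-Mobile": "?1"})
```

The site never sees platform version, architecture, or full version here, which is the difference from a single "high entropy" string that bundles them all.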

(I might misunderstand your question or thinking, feel free to correct me).


https://github.com/w3ctag/design-reviews/issues/640#issuecomment-896105391
Received on Tuesday, 10 August 2021 15:02:05 UTC