Re: `Accept-CH` header is weird

On Apr 13, 2018, at 5:17 PM, Alex Russell <slightlyoff@google.com> wrote:
> On Fri, Apr 13, 2018 at 1:02 PM, Nick Doty <npdoty@ischool.berkeley.edu> wrote:
> > On Apr 13, 2018, at 9:07 AM, Alex Russell <slightlyoff@google.com> wrote:
> > >
> > > Hi Nick, Eric:
> > >
> > > Thanks for the thoughtful comments.
> > >
> > > The analysis about opt-in-free CH comes from turning the question around slightly: if any origin can opt in for subsequent top-level navigations, and browsers wish not to provide that information to those origins (selectively, or as a policy), isn't the recourse identical to the recourse in the non-opt-in scenario?
> > >
> > > I.e., browsers will need to filter the list (perhaps to zero) to maintain whatever privacy invariants they wish to maintain. To be effective, this would also need to be paired with removing/modifying the runtime behavior of web content (turning off media query support, blocking various DOM APIs, disabling scripting, etc.).
> >
> > As a reminder, this doesn't apply just to top-level navigations, but also to embedded content, which might be the loading of an image that doesn't have access to DOM APIs, JavaScript, or CSS.
> 
> Understood, and this gets to a part of the design I'm somewhat uneasy about: it seems more risky for CH to be sent for x-origin sub-resources than for top-level navigations. I'd even support the opt-in being required to trigger CH header sending for sub-resources, whereas the argument for setting it on the top-level resource feels flimsy.

There is an additional risk in sending Client Hints for cross-origin resources and for resources that don't carry active content: for those resources, which previously exposed no such information at all, hints could introduce entirely new browser fingerprinting capability.

However, even in the case of top-level resources (HTML, with JavaScript), making browser fingerprinting passive where it was previously active is a significant change, one that undermines existing mitigations. I'm not sure why that strikes you as flimsy.
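
To make the passive case concrete: if hints were sent by default, even a request for a static image could carry something like the following (header names as in the current draft; the host and values here are invented for illustration):

    GET /pixel.gif HTTP/1.1
    Host: tracker.example
    DPR: 2
    Viewport-Width: 1280

A server would collect these device characteristics from every such request, without serving any script or markup that a client, researcher, or other observer could inspect.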

> > > Similarly, if browsers want to know what's being exfiltrated, they can simply not support CH (always send a null list). If visibility of script/active measures is the goal, this seems like the only way to achieve it.
> > >
> > > The opt-in-CH-is-good position seems untenable as a result.
> >
> > I think this is the distinction between limiting fingerprinting surface and making fingerprinting detectable.
> >
> > I agree that browsers that wish to avoid exposing this fingerprinting surface, or to limit disclosure of this information to sites, will not send the client hint in any case, whether there's an opt-in request from the server or not. But we're also especially interested in the ability of a client, a researcher, or some other observer to detect when a site might be gathering information for the purpose of fingerprinting. If sites need to indicate that they want this data, that provides a way for others to measure the potential use of this data for fingerprinting users.
> 
> Perhaps we should get crisp about the threat model: are we primarily concerned with the behavior (and potential fingerprinting) by sites themselves? Or by embedded third-parties?

Browser fingerprinting used for unsanctioned tracking is a privacy concern both with top-level navigation sites and with embedded third parties.

> If the answer is "both", then we're sort of stuck; the need for active fingerprinting measures falls as the % of users are sending CH headers rises. That said, we have lots of experience with old/dead code from transition periods living well past it's expiration date (this has been a sizable part of my week!).

I'm not sure what you mean by being stuck. Is there a concern that default CH headers are, or will be, sent in such numbers that changes can't be made later? (That's not my understanding of current implementations.)

> If the answer is third parties, keeping the opt-in behavior for enabling this for x-origin requests (and perhaps iframes) satisfies the need.
> 
> If the answer is first-parties, then do we have any faith that researchers will be able to understand this behavior at scale?

Do you have some specific reason for skepticism about researchers? We have extensive research from multiple groups on detecting and measuring browser fingerprinting up to this point. Here are ten research papers and several browser vendor pages, though the list isn't exhaustive: https://w3c.github.io/fingerprinting-guidance/#research

That's partly how we know about this practice at all, and how we've been able to evaluate both the capability of new APIs to be used for fingerprinting and the existence/prevalence of those practices. In order to use the alternative means that the TAG has suggested for discouraging unsanctioned tracking, I believe we need to rely on the ability of researchers (whether academics, data protection authorities, private companies or others) to understand this behavior.

> > In some cases, as ekr is suggesting with the elsewhere-proposed geolocation hint, that might be done by a single client who evaluates whether it makes sense to share that data in that case. But even outside the capabilities of a single client, the data on larger-scale use is only available if servers have to provide some visible indication that they want that data. Researchers can analyze after the fact whether those requests are reasonable for content negotiation or not; policymakers and enforcement agencies can use that data to support cases, etc.
> 
> The balance of appropriateness is going to have a very interesting lens. Not providing information like DPR, network type, memory/cpu class, and other runtime factors dooms the users on the poorest connections to the worst overhead for getting to good experiences. That's deeply unfair.

I'm not convinced that providing detailed information on user devices will always or typically benefit more vulnerable users. Privacy is often especially important to more vulnerable users, either because they encounter particularly dangerous threats or because they're more likely to face discrimination or suffer consequences from a lack of privacy. (Might not sites sniff these headers and serve "not supported" pages to devices that don't have sufficiently advanced hardware, or to users who aren't good advertising targets?) If there's a documented analysis of the appropriateness or fairness of this work, I'd certainly appreciate a link to it. I know that the Human Rights Protocol Considerations research group at the IRTF has some guidance for broader reviews that they're applying to IETF drafts.

The draft in question does not explicitly include network type, device memory, or CPU class information -- are you referring to some other specification or capability?

> Other properties like, as you mention, geolocation, might not fit into such a neat bucket. I'd personally welcome proposals that tried to tease these apart, for 2 reasons:
> 1. If a default-provided CH list were small, it might allay concerns about header bloat and growth.
> 2. If we're specifically concerned about some set of hints as being more sensitive than others, then this would provide an avenue to preserving opt-in for them.
> Don't know if that's ideal, but I think a direction worth exploring.
> 
> Thoughts?

Rather than splitting up headers into sensitive and maybe-not-so-sensitive, why not follow the principle of data minimization and let sites ask for the information that they need? That would both discourage bloat and maintain detectability of device fingerprinting. Alternatively, what is the case against preserving opt-in for all Client Hints? Is it that some developers may find it confusing to send "Accept-CH" as a response header?
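
To illustrate (sketching the opt-in flow as I read the draft, with invented values): a site that needs only the pixel density for image selection could respond with

    HTTP/1.1 200 OK
    Accept-CH: DPR

and subsequent requests to that origin would then carry just

    GET /hero.jpg HTTP/1.1
    Host: www.example.com
    DPR: 2

A site that instead requested every available hint would be visibly doing so in its Accept-CH response header, which is exactly the signal that clients, researchers, and regulators could measure.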

Thanks,
Nick

Received on Wednesday, 18 April 2018 01:43:35 UTC