RE: `Accept-CH` header is weird from Mike O'Neill on 2018-04-14 (public-privacy@w3.org from April to June 2018)

From: Mike O'Neill <michael.oneill@baycloud.com>
Date: Sat, 14 Apr 2018 08:30:13 +0100
To: "'Alex Russell'" <slightlyoff@google.com>, "'Nick Doty'" <npdoty@ischool.berkeley.edu>
Cc: "'Eric Rescorla'" <ekr@rtfm.com>, "'TAG List'" <www-tag@w3.org>, "'public-privacy (W3C mailing list)'" <public-privacy@w3.org>
Message-ID: <0a5e01d3d3c2$6dec3870$49c4a950$@baycloud.com>
Hi Alex,

 

I agree the biggest risk is uncontrolled CH headers on x-origin embeds, but we have seen in the past how restrictions based only on embedded resources get bypassed by those determined to do so. For example, Safari's original third-party cookie blocking was circumvented multiple times, eventually forcing the WebKit team to respond with the more elaborate ITP (still leaky, though I think the concept of partitioned cookies it introduced is a major step forward).

 

Why introduce needless complexity? Just make all resources require the opt-in, whatever the context.

 

Mike

 

 

 

 

From: Alex Russell <slightlyoff@google.com> 
Sent: 14 April 2018 01:18
To: Nick Doty <npdoty@ischool.berkeley.edu>
Cc: Eric Rescorla <ekr@rtfm.com>; TAG List <www-tag@w3.org>; public-privacy (W3C mailing list) <public-privacy@w3.org>
Subject: Re: `Accept-CH` header is weird

 

 

 

On Fri, Apr 13, 2018 at 1:02 PM, Nick Doty <npdoty@ischool.berkeley.edu> wrote:

On Apr 13, 2018, at 9:07 AM, Alex Russell <slightlyoff@google.com> wrote:
> 
> Hi Nick, Eric:
> 
> Thanks for the thoughtful comments.
> 
> The analysis about opt-in free CH comes from turning the question around slightly: if any origin can opt-in for subsequent top-level navigations, and browsers wish not to provide that information to those origins (selectively, or as a policy), isn't the recourse identical to the recourse in the non-opt-in scenario?
> 
> I.e., browsers will need to filter the list (perhaps to zero) to maintain whatever privacy invariants they wish to maintain. To be effective, this would also need to be paired with removing/modifying the runtime behavior of web content (turning off media query support, blocking various DOM APIs, disabling scripting, etc.).

As a reminder, this doesn't apply just to top-level navigations, but also to embedded content, which might be loading of an image that doesn't have access to DOM APIs, JavaScript, or CSS.

 

Understood, and this gets to a part of the design I'm somewhat uneasy about: it seems more risky for CH to be sent for x-origin sub-resources than for top-level navigations. I'd even support the opt-in being required to trigger CH header sending for sub-resources, whereas the argument for setting it on the top-level resource feels flimsy.

 

Such a future is possible as an evolution of the current design.

 


> Similarly, if browsers want to know what's being exfiltrated, they can simply not support CH (always send a null list).  If visibility of script/active measures is the goal, this seems like the only way to achieve it.
> 
> The opt-in-CH-is-good position seems untenable as a result.

I think this is the distinction between limiting fingerprinting surface and making fingerprinting detectable.

I agree that browsers that wish not to expose this fingerprinting surface or to limit disclosure of this information to sites will not send the client hint in any case, whether there's an opt-in request from the server or not. But we're also especially interested in the ability for a client, a researcher or some other observer to detect when a site might be gathering information for the purpose of fingerprinting. If sites need to indicate that they want this data, that provides a way for others to measure the potential use of this data for fingerprinting users.

 

Perhaps we should get crisp about the threat model: are we primarily concerned with the behavior (and potential fingerprinting) by sites themselves? Or by embedded third-parties?

 

If the answer is "both", then we're sort of stuck; the need for active fingerprinting measures falls as the % of users sending CH headers rises. That said, we have lots of experience with old/dead code from transition periods living well past its expiration date (this has been a sizable part of my week!).

 

If the answer is third parties, keeping the opt-in behavior for enabling this for x-origin requests (and perhaps iframes) satisfies the need.

 

If the answer is first-parties, then do we have any faith that researchers will be able to understand this behavior at scale?

 

In some cases, as ekr is suggesting with the elsewhere-proposed geolocation hint, that might be done by a single client who evaluates whether it makes sense to share that data in that case. But even outside the capabilities of a single client, the data on larger-scale use is only available if servers have to provide some visible indication that they want that data. Researchers can analyze after-the-fact whether those requests are reasonable for content-negotiation or not; policymakers and enforcement agencies can use that data to support cases, etc.

 

The balance of appropriateness is going to have a very interesting lens. Not providing information like DPR, network type, memory/cpu class, and other runtime factors dooms the users on the poorest connections to the worst overhead for getting to good experiences. That's deeply unfair. Other properties like, as you mention, geolocation, might not fit into such a neat bucket. I'd personally welcome proposals that tried to tease these apart, for 2 reasons:

 

1. If a default-provided CH list were small, it might allay concerns about header bloat and growth.

2. If we're specifically concerned about some set of hints as being more sensitive than others, then this would provide an avenue to preserving opt-in for them.

Don't know if that's ideal, but I think it's a direction worth exploring.

 

Thoughts?
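
A minimal sketch of the two-tier idea in the numbered list above, combined with the earlier suggestion that x-origin sub-resources always need the opt-in. The function names, the split of hints into the two sets, and the `geolocation` hint name are assumptions for illustration only; the thread does not define them, and this does not describe the behaviour of any browser.

```python
from urllib.parse import urlsplit

# Assumed split of hints into two tiers (the thread leaves this open).
DEFAULT_HINTS = frozenset({"dpr", "ect", "device-memory"})  # sent without opt-in
OPT_IN_HINTS = frozenset({"geolocation"})  # hypothetical name for a sensitive hint


def parse_accept_ch(value):
    """Parse an Accept-CH value as a comma-separated list of hint names.

    Simplified: names are lower-cased and empty items are dropped.
    """
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def origin(url):
    """Return (scheme, host, port); default ports are not normalised here."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def hints_to_send(request_url, top_level_url, opted_in, is_navigation):
    """Return the sorted hint names to attach to a request.

    opted_in is the set of hints the request's origin asked for earlier
    via Accept-CH.
    """
    same_origin = origin(request_url) == origin(top_level_url)
    if is_navigation or same_origin:
        # Small default list, plus sensitive hints only if requested.
        allowed = DEFAULT_HINTS | (OPT_IN_HINTS & opted_in)
    else:
        # x-origin sub-resource: nothing is sent without an opt-in.
        allowed = (DEFAULT_HINTS | OPT_IN_HINTS) & opted_in
    return sorted(allowed)
```

Under these assumptions, a top-level navigation with no opt-in gets only the default list, while an image fetched from another origin gets nothing until that origin has sent `Accept-CH`.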
Received on Saturday, 14 April 2018 07:30:46 UTC