Re: `Accept-CH` header is weird from Alex Russell on 2018-04-14 (www-tag@w3.org from April 2018)

From: Alex Russell <slightlyoff@google.com>
Date: Fri, 13 Apr 2018 17:17:36 -0700
To: Nick Doty <npdoty@ischool.berkeley.edu>
Cc: Eric Rescorla <ekr@rtfm.com>, TAG List <www-tag@w3.org>, "public-privacy (W3C mailing list)" <public-privacy@w3.org>
Message-ID: <CANr5HFUWkrNUjJvPnEuq1HdJPuOqB7fTAnCYN6-1bossTz4VHQ@mail.gmail.com>
On Fri, Apr 13, 2018 at 1:02 PM, Nick Doty <npdoty@ischool.berkeley.edu>
wrote:

> On Apr 13, 2018, at 9:07 AM, Alex Russell <slightlyoff@google.com> wrote:
> >
> > Hi Nick, Eric:
> >
> > Thanks for the thoughtful comments.
> >
> > The analysis about opt-in free CH comes from turning the question around
> > slightly: if any origin can opt-in for subsequent top-level navigations,
> > and browsers wish not to provide that information to those origins
> > (selectively, or as a policy), isn't the recourse identical to the recourse
> > in the non-opt-in scenario?
> >
> > I.e., browsers will need to filter the list (perhaps to zero) to
> > maintain whatever privacy invariants they wish to maintain. To be
> > effective, this would also need to be paired with removing/modifying the
> > runtime behavior of web content (turning off media query support, blocking
> > various DOM APIs, disabling scripting, etc.).
>
> As a reminder, this doesn't apply just to top-level navigations, but also
> to embedded content, which might be loading of an image that doesn't have
> access to DOM APIs, JavaScript, or CSS.


Understood, and this gets to a part of the design I'm somewhat uneasy
about: it seems more risky for CH to be sent for x-origin sub-resources
than for top-level navigations. I'd even support the opt-in being required
to trigger CH header sending for sub-resources, whereas the argument for
setting it on the top-level resource feels flimsy.

Such a future is possible as an evolution of the current design.
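
A minimal sketch of that split, assuming a hypothetical policy function
(`should_send_hints`, `origin`, and the `opted_in_origins` set are
illustrative names, not part of any Client Hints draft or browser
implementation). It also assumes same-origin sub-resources are treated like
the top-level document, which is not settled in this thread:

```python
from urllib.parse import urlsplit


def origin(url):
    """Return a (scheme, host, port) tuple for a URL.

    Simplification: default ports are not normalized, so
    https://a.example and https://a.example:443 compare as different.
    """
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def should_send_hints(request_url, top_level_url, is_navigation, opted_in_origins):
    """Hypothetical policy: hints by default on top-level navigations,
    opt-in required for cross-origin sub-resources."""
    if is_navigation:
        # Top-level navigation: hints sent without any opt-in.
        return True
    if origin(request_url) == origin(top_level_url):
        # Assumption: same-origin sub-resources follow the top-level document.
        return True
    # Cross-origin sub-resource: only if an opt-in was recorded for it.
    return origin(request_url) in opted_in_origins
```

A browser that wants stricter invariants could still return False
everywhere; the sketch only shows where the opt-in check would sit.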


> > Similarly, if browsers want to know what's being exfiltrated, they can
> > simply not support CH (always send a null list).  If visibility of
> > script/active measures is the goal, this seems like the only way to achieve
> > it.
> >
> > The opt-in-CH-is-good position seems untenable as a result.
>
> I think this is the distinction between limiting fingerprinting surface
> and making fingerprinting detectable.
>
> I agree that browsers that wish not to expose this fingerprinting surface
> or to limit disclosure of this information to sites will not send the
> client hint in any case, whether there's an opt-in request from the server
> or not. But we're also especially interested in the ability for a client, a
> researcher or some other observer to detect when a site might be gathering
> information for the purpose of fingerprinting. If sites need to indicate
> that they want this data, that provides a way for others to measure the
> potential use of this data for fingerprinting users.
>

Perhaps we should get crisp about the threat model: are we primarily
concerned with the behavior (and potential fingerprinting) by sites
themselves? Or by embedded third-parties?

If the answer is "both", then we're sort of stuck; the need for active
fingerprinting measures falls as the % of users sending CH headers
rises. That said, we have lots of experience with old/dead code from
transition periods living *well* past its expiration date (this has been a
sizable part of my week!).

If the answer is third parties, keeping the opt-in behavior for enabling
this for x-origin requests (and perhaps iframes) satisfies the need.

If the answer is first-parties, then do we have any faith that researchers
will be able to understand this behavior at scale?


> In some cases, as ekr is suggesting with the elsewhere-proposed
> geolocation hint, that might be done by a single client who evaluates
> whether it makes sense to share that data in that case. But even outside
> the capabilities of a single client, the data on larger-scale use is only
> available if servers have to provide some visible indication that they want
> that data. Researchers can analyze after-the-fact whether those requests
> are reasonable for content-negotiation or not; policymakers and enforcement
> agencies can use that data to support cases, etc.
>

The balance of appropriateness is going to have a very interesting lens.
Not providing information like DPR, network type, memory/cpu class, and
other runtime factors dooms the users on the poorest connections to the
worst overhead for getting to good experiences. That's deeply unfair. Other
properties like, as you mention, geolocation, might not fit into such a
neat bucket. I'd personally welcome proposals that tried to tease these
apart, for 2 reasons:


   1. If a default-provided CH list were small, it might allay concerns
   about header bloat and growth
   2. If we're specifically concerned about some set of hints as being more
   sensitive than others, then this would provide an avenue to preserving
   opt-in for them.
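
To make the two-tier idea concrete, here is a minimal sketch. Which hints
land in which set is purely an assumption for illustration, and
`Geolocation` is a hypothetical token standing in for the elsewhere-proposed
geolocation hint; nothing here is taken from a Client Hints draft:

```python
# Assumed small default set, sent without any server opt-in.
DEFAULT_HINTS = {"DPR", "Device-Memory", "ECT"}

# Assumed sensitive set, sent only if the server asked via Accept-CH.
# "Geolocation" is a hypothetical token, used for illustration only.
OPT_IN_HINTS = {"Geolocation"}


def hints_to_send(accept_ch=""):
    """Return the hint names a client would send under this sketch.

    accept_ch is the comma-separated value of a previously received
    Accept-CH header (empty if the server never sent one).
    """
    # Header tokens are compared case-insensitively.
    requested = {tok.strip().lower() for tok in accept_ch.split(",") if tok.strip()}
    opted_in = {h for h in OPT_IN_HINTS if h.lower() in requested}
    return sorted(DEFAULT_HINTS | opted_in)
```

With no `Accept-CH` at all, only the default set goes out; a sensitive hint
is added only when the server names it, which keeps that request visible to
observers.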

Don't know if that's ideal, but I think it's a direction worth exploring.

Thoughts?
Received on Saturday, 14 April 2018 00:18:46 UTC