Re: `Accept-CH` header is weird from Ilya Grigorik on 2018-04-18 (public-privacy@w3.org from April to June 2018)

From: Ilya Grigorik <ilya@grigorik.com>
Date: Wed, 18 Apr 2018 09:18:08 -0700
To: Nick Doty <npdoty@ischool.berkeley.edu>
Cc: Alex Russell <slightlyoff@google.com>, Eric Rescorla <ekr@rtfm.com>, TAG List <www-tag@w3.org>, "public-privacy (W3C mailing list)" <public-privacy@w3.org>
Message-ID: <CAKRe7JHgd4kBpuoN3cNcQVJJn3zzWPaV47QDR=vXvgHFEyEe-A@mail.gmail.com>
Hey folks. I think most of you are already aware of this, but just FYI:

   - The current draft requires an explicit opt-in from each origin (via
   Accept-CH), and the opt-in is scoped to the first party (1P) only.
   - For third parties (3P), we determined that we want the 1P to delegate
   permission: the proposal is to tackle that via Feature Policy.
      - Discussion in https://github.com/WICG/feature-policy/issues/129
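A minimal sketch of the opt-in scoping summarized in the bullets above, under stated assumptions: the class and function names (`HintStore`, `hints_for`, `delegate`) are hypothetical, hint names are compared case-insensitively, and the Feature Policy delegation is modeled as a plain lookup table rather than real policy syntax, which is still under discussion in the linked issue. It also assumes a delegated 3P can only receive hints that the 1P itself opted into.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return scheme://host[:port] for a URL (simplified origin)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def parse_accept_ch(value: str) -> set[str]:
    """Split an Accept-CH value like "DPR, Width" into lowercase hint names."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


class HintStore:
    """Hypothetical client-side bookkeeping for Client Hints opt-ins."""

    def __init__(self) -> None:
        # First-party origin -> hints it opted into via Accept-CH.
        self.opt_ins: dict[str, set[str]] = {}
        # (first-party origin, third-party origin) -> hints the 1P delegated.
        # Stand-in for the proposed Feature Policy mechanism (assumption).
        self.delegations: dict[tuple[str, str], set[str]] = {}

    def record_response(self, url: str, accept_ch: str) -> None:
        """Remember the Accept-CH opt-in from a top-level response."""
        self.opt_ins[origin_of(url)] = parse_accept_ch(accept_ch)

    def delegate(self, first_party: str, third_party: str, hints: set[str]) -> None:
        """Record a 1P's decision to share some hints with a 3P origin."""
        self.delegations[(origin_of(first_party), origin_of(third_party))] = {
            h.lower() for h in hints
        }

    def hints_for(self, page_url: str, request_url: str) -> set[str]:
        """Hints a request from page_url to request_url would carry."""
        page, target = origin_of(page_url), origin_of(request_url)
        granted = self.opt_ins.get(page, set())
        if target == page:
            # Same origin: exactly what the 1P asked for, nothing by default.
            return set(granted)
        # Cross origin: only hints the 1P both opted into and delegated.
        return granted & self.delegations.get((page, target), set())
```

For example, after `record_response("https://news.example/", "DPR, Width")`, a same-origin image request carries `dpr` and `width`, a request to `https://cdn.example` carries nothing, and after `delegate("https://news.example", "https://cdn.example", {"DPR"})` the CDN request carries only `dpr`.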

Nick, regarding the device capability hints: they are defined in:

   - https://w3c.github.io/device-memory/
   - https://wicg.github.io/netinfo/

On Tue, Apr 17, 2018 at 6:42 PM, Nick Doty <npdoty@ischool.berkeley.edu> wrote:

> On Apr 13, 2018, at 5:17 PM, Alex Russell <slightlyoff@google.com> wrote:
>
> On Fri, Apr 13, 2018 at 1:02 PM, Nick Doty <npdoty@ischool.berkeley.edu> wrote:
>
>> On Apr 13, 2018, at 9:07 AM, Alex Russell <slightlyoff@google.com> wrote:
>> >
>> > Hi Nick, Eric:
>> >
>> > Thanks for the thoughtful comments.
>> >
>> > The analysis about opt-in-free CH comes from turning the question
>> > around slightly: if any origin can opt in for subsequent top-level
>> > navigations, and browsers wish not to provide that information to those
>> > origins (selectively, or as a policy), isn't the recourse identical to the
>> > recourse in the non-opt-in scenario?
>> >
>> > I.e., browsers will need to filter the list (perhaps to zero) to
>> > maintain whatever privacy invariants they wish to maintain. To be
>> > effective, this would also need to be paired with removing/modifying the
>> > runtime behavior of web content (turning off media query support, blocking
>> > various DOM APIs, disabling scripting, etc.).
>>
>> As a reminder, this doesn't apply just to top-level navigations, but also
>> to embedded content, which might be loading of an image that doesn't have
>> access to DOM APIs, JavaScript, or CSS.
>
>
> Understood, and this gets to a part of the design I'm somewhat uneasy
> about: it seems more risky for CH to be sent for x-origin sub-resources
> than for top-level navigations. I'd even support the opt-in being required
> to trigger CH header sending for sub-resources, whereas the argument for
> setting it on the top-level resource feels flimsy.
>
>
> There is an additional risk in sending Client Hints for cross-origin
> resources and for resources that don't have active content, because there
> may be cases where entirely new browser fingerprinting capability is
> introduced.
>
> However, even in the case of top-level resources (HTML, with JavaScript),
> making browser fingerprinting passive where it was previously active is a
> significant change, one that undermines existing mitigations. I'm not sure
> why that strikes you as flimsy.
>
>> > Similarly, if browsers want to know what's being exfiltrated, they can
>> > simply not support CH (always send a null list). If visibility of
>> > script/active measures is the goal, this seems like the only way to achieve
>> > it.
>> >
>> > The opt-in-CH-is-good position seems untenable as a result.
>>
>> I think this is the distinction between limiting fingerprinting surface
>> and making fingerprinting detectable.
>>
>> I agree that browsers that wish not to expose this fingerprinting surface
>> or to limit disclosure of this information to sites will not send the
>> client hint in any case, whether there's an opt-in request from the server
>> or not. But we're also especially interested in the ability for a client, a
>> researcher or some other observer to detect when a site might be gathering
>> information for the purpose of fingerprinting. If sites need to indicate
>> that they want this data, that provides a way for others to measure the
>> potential use of this data for fingerprinting users.
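The detectability argument can be made concrete with a minimal sketch of an after-the-fact survey, assuming a researcher already has crawled (URL, response headers) pairs. The `survey` and `requested_hints` helpers are hypothetical, response headers are modeled as a simple dict, and the sketch only tallies visible requests; judging whether a given request is reasonable content negotiation is left to the analyst.

```python
from collections import Counter
from urllib.parse import urlsplit


def requested_hints(headers: dict[str, str]) -> set[str]:
    """Hints a response asks for via Accept-CH (header name is case-insensitive)."""
    hints: set[str] = set()
    for name, value in headers.items():
        if name.lower() == "accept-ch":
            hints |= {h.strip().lower() for h in value.split(",") if h.strip()}
    return hints


def survey(crawl: list[tuple[str, dict[str, str]]]) -> tuple[Counter, dict[str, set[str]]]:
    """Tally which origins visibly request which hints across a crawl."""
    per_origin: dict[str, set[str]] = {}
    for url, headers in crawl:
        hints = requested_hints(headers)
        if hints:
            parts = urlsplit(url)
            origin = f"{parts.scheme}://{parts.netloc}"
            per_origin.setdefault(origin, set()).update(hints)
    # Count each origin once per hint, however many of its pages asked.
    counts = Counter(h for hints in per_origin.values() for h in hints)
    return counts, per_origin


crawl = [
    ("https://a.example/", {"Accept-CH": "DPR, Device-Memory"}),
    ("https://a.example/about", {"accept-ch": "Width"}),
    ("https://b.example/", {"Content-Type": "text/html"}),
]
counts, per_origin = survey(crawl)
print(counts["device-memory"])                 # 1
print(sorted(per_origin["https://a.example"]))  # ['device-memory', 'dpr', 'width']
```

An origin that never sends Accept-CH (like `b.example` here) simply does not appear, which is the point: without an opt-in requirement there would be no such signal to tally.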
>>
>
> Perhaps we should get crisp about the threat model: are we primarily
> concerned with the behavior (and potential fingerprinting) by sites
> themselves? Or by embedded third-parties?
>
>
> Browser fingerprinting used for unsanctioned tracking is a privacy concern
> both with top-level navigation sites and with embedded third parties.
>
> If the answer is "both", then we're sort of stuck; the need for active
> fingerprinting measures falls as the percentage of users sending CH headers
> rises. That said, we have lots of experience with old/dead code from
> transition periods living *well* past its expiration date (this has been
> a sizable part of my week!).
>
>
> I'm not sure what you mean by being stuck. Is there a concern that default
> CH headers are or will be sent in such numbers that changes can't be made?
> (That's not my understanding of current implementations.)
>
> If the answer is third parties, keeping the opt-in behavior for enabling
> this for x-origin requests (and perhaps iframes) satisfies the need.
>
> If the answer is first-parties, then do we have any faith that researchers
> will be able to understand this behavior at scale?
>
>
> Do you have some specific reason for skepticism about researchers? We have
> extensive research from multiple groups on detecting and measuring browser
> fingerprinting up to this point. Here are ten research papers and several
> browser vendor pages, although this list isn't exhaustive:
> https://w3c.github.io/fingerprinting-guidance/#research
>
> That's partly how we know about this practice at all, and how we've been
> able to evaluate both the capability of new APIs to be used for
> fingerprinting and the existence/prevalence of those practices. In order to
> use the alternative means that the TAG has suggested for discouraging
> unsanctioned tracking, I believe we need to rely on the ability of
> researchers (whether academics, data protection authorities, private
> companies or others) to understand this behavior.
>
>> In some cases, as ekr is suggesting with the elsewhere-proposed
>> geolocation hint, that might be done by a single client who evaluates
>> whether it makes sense to share that data in that case. But even outside
>> the capabilities of a single client, the data on larger-scale use is only
>> available if servers have to provide some visible indication that they want
>> that data. Researchers can analyze after-the-fact whether those requests
>> are reasonable for content-negotiation or not; policymakers and enforcement
>> agencies can use that data to support cases, etc.
>>
>
> Weighing appropriateness is going to require a very interesting lens.
> Not providing information like DPR, network type, memory/CPU class, and
> other runtime factors dooms users on the poorest connections to the
> worst overhead on the way to good experiences. That's deeply unfair.
>
>
> I'm not convinced that providing detailed information on user devices will
> always or typically benefit more vulnerable users. Privacy is often
> especially important to more vulnerable users either because they encounter
> particularly dangerous threats or because they're more likely to face
> discrimination or suffer consequences from a lack of privacy. (Might not
> sites sniff these headers and provide "not supported" pages to devices that
> don't have sufficiently advanced hardware or to users that aren't good
> advertising targets?) If there's a documented analysis on appropriateness
> or fairness of this work, I'd certainly appreciate a link to it. I know
> that the Human Rights Protocol Considerations research group at the IRTF has some
> guidance for broader reviews that they're applying to IETF drafts.
>
> The draft in question does not explicitly include information on network
> type, device memory or CPU information -- are you referring to some other
> specification or capability?
>
> Other properties, like geolocation as you mention, might not fit into
> such a neat bucket. I'd personally welcome proposals that tried to tease
> these apart, for two reasons:
>
>    1. If a default-provided CH list were small, it might allay concerns
>    about header bloat and growth.
>    2. If we're specifically concerned about some set of hints as being
>    more sensitive than others, then this would provide an avenue to preserving
>    opt-in for them.
>
> I don't know if that's ideal, but I think it's a direction worth exploring.
>
> Thoughts?
>
>
> Rather than splitting up headers into sensitive and
> maybe-not-so-sensitive, why not follow the principle of data minimization
> and let sites ask for the information that they need? That would both
> discourage bloat and maintain detectability of device fingerprinting.
> Alternatively, what is the case against preserving opt-in for all Client
> Hints? Is it that some developers may find it confusing to send "Accept-CH"
> as a response header?
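The two directions debated here, a small default-provided list with opt-in only for sensitive hints versus sending only what a site asks for, can be compared with a minimal sketch. The partition is purely illustrative and hypothetical: the thread does not settle which hints would be default-provided, `geolocation` stands in for the elsewhere-proposed geolocation hint (not a defined hint name), and the function names are invented for this example. The `unrequested` helper shows the detectability difference: under the split policy a site receives default hints without any visible Accept-CH request, while under data minimization it receives none.

```python
from typing import Callable

# Illustrative partition only; not taken from any specification.
DEFAULT_HINTS = frozenset({"dpr", "device-memory", "ect"})
SENSITIVE_HINTS = frozenset({"geolocation"})  # hypothetical hint name
KNOWN_HINTS = DEFAULT_HINTS | SENSITIVE_HINTS


def split_policy(opt_in: set[str]) -> set[str]:
    """Small default list for everyone, plus sensitive hints only on opt-in."""
    return set(DEFAULT_HINTS) | (opt_in & SENSITIVE_HINTS)


def minimization_policy(opt_in: set[str]) -> set[str]:
    """Send only the known hints the site explicitly asked for."""
    return opt_in & KNOWN_HINTS


def unrequested(opt_in: set[str], policy: Callable[[set[str]], set[str]]) -> set[str]:
    """Hints sent without any visible Accept-CH request for them."""
    return policy(opt_in) - opt_in


print(sorted(unrequested(set(), split_policy)))         # ['device-memory', 'dpr', 'ect']
print(sorted(unrequested(set(), minimization_policy)))  # []
```

Under the split policy, a site that never sends Accept-CH still receives the default hints, so its interest in them leaves no trace; under data minimization, every hint a site receives corresponds to a request an observer could see.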
>
> Thanks,
> Nick
>
Received on Thursday, 19 April 2018 16:16:24 UTC