- From: Ilya Grigorik <ilya@grigorik.com>
- Date: Wed, 18 Apr 2018 09:18:08 -0700
- To: Nick Doty <npdoty@ischool.berkeley.edu>
- Cc: Alex Russell <slightlyoff@google.com>, Eric Rescorla <ekr@rtfm.com>, TAG List <www-tag@w3.org>, "public-privacy (W3C mailing list)" <public-privacy@w3.org>
- Message-ID: <CAKRe7JHgd4kBpuoN3cNcQVJJn3zzWPaV47QDR=vXvgHFEyEe-A@mail.gmail.com>
Hey folks. Think most of you are already aware of this, but just fyi.

- Current draft requires explicit opt-in from each origin (via Accept-CH), and the opt-in is scoped to 1P only.
- For 3P, we determined that we want 1P to delegate permission: the proposal is to tackle that via Feature Policy.
- Discussion in https://github.com/WICG/feature-policy/issues/129

Nick, re: device capability hints, they are defined in:
- https://w3c.github.io/device-memory/
- https://wicg.github.io/netinfo/

On Tue, Apr 17, 2018 at 6:42 PM, Nick Doty <npdoty@ischool.berkeley.edu> wrote:

> On Apr 13, 2018, at 5:17 PM, Alex Russell <slightlyoff@google.com> wrote:
>
> On Fri, Apr 13, 2018 at 1:02 PM, Nick Doty <npdoty@ischool.berkeley.edu>
> wrote:
>
>> On Apr 13, 2018, at 9:07 AM, Alex Russell <slightlyoff@google.com> wrote:
>> >
>> > Hi Nick, Eric:
>> >
>> > Thanks for the thoughtful comments.
>> >
>> > The analysis about opt-in free CH comes from turning the question
>> around slightly: if any origin can opt-in for subsequent top-level
>> navigations, and browsers wish not to provide that information to those
>> origins (selectively, or as a policy), isn't the recourse identical to the
>> recourse in the non-opt-in scenario?
>> >
>> > I.e., browsers will need to filter the list (perhaps to zero) to
>> maintain whatever privacy invariants they wish to maintain. To be
>> effective, this would also need to be paired with removing/modifying the
>> runtime behavior of web content (turning off media query support, blocking
>> various DOM APIs, disabling scripting, etc.).
>>
>> As a reminder, this doesn't apply just to top-level navigations, but also
>> to embedded content, which might be loading of an image that doesn't have
>> access to DOM APIs, JavaScript, or CSS.
>
>
> Understood, and this gets to a part of the design I'm somewhat uneasy
> about: it seems more risky for CH to be sent for x-origin sub-resources
> than for top-level navigations. I'd even support the opt-in being required
> to trigger CH header sending for sub-resources, whereas the argument for
> setting it on the top-level resource feels flimsy.
>
>
> There is an additional risk in sending Client Hints for cross-origin
> resources and for resources that don't have active content, because there
> may be cases where entirely new browser fingerprinting capability is
> introduced.
>
> However, even in the case of top-level resources (HTML, with JavaScript)
> making browser fingerprinting passive where it was previously active is a
> significant change, one that undermines existing mitigations. I'm not sure
> why that strikes you as flimsy.
>
>
> Similarly, if browsers want to know what's being exfiltrated, they can
>> simply not support CH (always send a null list). If visibility of
>> script/active measures is the goal, this seems like the only way to achieve
>> it.
>> >
>> > The opt-in-CH-is-good position seems untenable as a result.
>>
>> I think this is the distinction between limiting fingerprinting surface
>> and making fingerprinting detectable.
>>
>> I agree that browsers that wish not to expose this fingerprinting surface
>> or to limit disclosure of this information to sites will not send the
>> client hint in any case, whether there's an opt-in request from the server
>> or not. But we're also especially interested in the ability for a client, a
>> researcher or some other observer to detect when a site might be gathering
>> information for the purpose of fingerprinting. If sites need to indicate
>> that they want this data, that provides a way for others to measure the
>> potential use of this data for fingerprinting users.
>>
>
> Perhaps we should get crisp about the threat model: are we primarily
> concerned with the behavior (and potential fingerprinting) by sites
> themselves? Or by embedded third-parties?
>
>
> Browser fingerprinting used for unsanctioned tracking is a privacy concern
> both with top-level navigation sites and with embedded third parties.
>
> If the answer is "both", then we're sort of stuck; the need for active
> fingerprinting measures falls as the % of users are sending CH headers
> rises. That said, we have lots of experience with old/dead code from
> transition periods living *well* past its expiration date (this has been
> a sizable part of my week!).
>
>
> I'm not sure what you mean by being stuck. Is there a concern that default
> CH headers are or will be sent in such numbers that changes can't be made?
> (That's not my understanding of current implementations.)
>
> If the answer is third parties, keeping the opt-in behavior for enabling
> this for x-origin requests (and perhaps iframes) satisfies the need.
>
> If the answer is first-parties, then do we have any faith that researchers
> will be able to understand this behavior at scale?
>
>
> Do you have some specific reason for skepticism about researchers? We have
> extensive research from multiple groups on detecting and measuring browser
> fingerprinting up to this point. Here's ten research papers and several
> browser vendor pages, although this list isn't exhaustive:
> https://w3c.github.io/fingerprinting-guidance/#research
>
> That's partly how we know about this practice at all, and how we've been
> able to evaluate both the capability of new APIs to be used for
> fingerprinting and the existence/prevalence of those practices. In order to
> use the alternative means that the TAG has suggested for discouraging
> unsanctioned tracking, I believe we need to rely on the ability of
> researchers (whether academics, data protection authorities, private
> companies or others) to understand this behavior.
>
> In some cases, as ekr is suggesting with the elsewhere-proposed
>> geolocation hint, that might be done by a single client who evaluates
>> whether it makes sense to share that data in that case. But even outside
>> the capabilities of a single client, the data on larger-scale use is only
>> available if servers have to provide some visible indication that they want
>> that data. Researchers can analyze after-the-fact whether those requests
>> are reasonable for content-negotiation or not; policymakers and enforcement
>> agencies can use that data to support cases, etc.
>>
>
> The balance of appropriateness is going to have a very interesting lens.
> Not providing information like DPR, network type, memory/cpu class, and
> other runtime factors dooms the users on the poorest connections to the
> worst overhead for getting to good experiences. That's deeply unfair.
>
>
> I'm not convinced that providing detailed information on user devices will
> always or typically benefit more vulnerable users. Privacy is often
> especially important to more vulnerable users either because they encounter
> particularly dangerous threats or because they're more likely to face
> discrimination or suffer consequences from a lack of privacy. (Might not
> sites sniff these headers and provide "not supported" pages to devices that
> don't have sufficiently advanced hardware or to users that aren't good
> advertising targets?) If there's a documented analysis on appropriateness
> or fairness of this work, I'd certainly appreciate a link to it. I know
> that the Human Rights Protocol Consideration group at the IRTF has some
> guidance for broader reviews that they're applying to IETF drafts.
>
> The draft in question does not explicitly include information on network
> type, device memory or CPU information -- are you referring to some other
> specification or capability?
>
> Other properties like, as you mention, geolocation, might not fit into
> such a neat bucket. I'd personally welcome proposals that tried to tease
> these apart, for 2 reasons:
>
> 1. If a default-provided CH list were small, it might allay concerns
> about header bloat and growth
> 2. If we're specifically concerned about some set of hints as being
> more sensitive than others, then this would provide an avenue to preserving
> opt-in for them.
>
> Don't know if that's ideal, but I think a direction worth exploring.
>
> Thoughts?
>
>
> Rather than splitting up headers into sensitive and
> maybe-not-so-sensitive, why not follow the principle of data minimization
> and let sites ask for the information that they need? That would both
> discourage bloat and maintain detectability of device fingerprinting.
> Alternatively, what is the case against preserving opt-in for all Client
> Hints? Is it that some developers may find it confusing to send "Accept-CH"
> as a response header?
>
> Thanks,
> Nick
>
Received on Thursday, 19 April 2018 16:16:24 UTC
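The opt-in flow Ilya summarizes can be modeled in a few lines. In that flow, each origin opts in with Accept-CH, the opt-in covers only the first party, and third parties receive hints only by first-party delegation. The Python sketch below is illustrative only and is not how any browser implements the draft. It rests on several assumptions:

- Hint names are compared case-insensitively.
- Origins are reduced to scheme plus network location.
- Any persistence of the opt-in across navigations is ignored.
- The `delegations` table is a hypothetical stand-in for the Feature Policy delegation still under discussion in WICG/feature-policy issue 129. It does not reflect that proposal's syntax.

All function names are hypothetical. The second helper illustrates Nick's detectability argument. Because the opt-in is a visible response header, an observer crawling sites could record which origins ask for which hints.

```python
from collections import Counter
from urllib.parse import urlsplit


def parse_accept_ch(value):
    """Split an Accept-CH header value into a set of lowercase hint names."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


def origin_of(url):
    """Reduce a URL to scheme://netloc (a simplified notion of origin; an assumption)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def hints_to_send(request_url, top_level_url, opt_ins, delegations):
    """Return the sorted hint names a browser would attach to request_url.

    opt_ins:     first-party origin -> set of hints it requested via Accept-CH.
    delegations: hypothetical stand-in for Feature Policy delegation,
                 first-party origin -> {third-party origin -> delegated hints}.
    """
    first_party = origin_of(top_level_url)
    requested = opt_ins.get(first_party, set())  # no opt-in means no hints at all
    target = origin_of(request_url)
    if target == first_party:
        # The opt-in is scoped to the first party: it gets what it asked for.
        return sorted(requested)
    # A third party gets only hints the first party both requested and delegated.
    delegated = delegations.get(first_party, {}).get(target, set())
    return sorted(requested & delegated)


def tally_requested_hints(observations):
    """Count how many distinct origins requested each hint.

    observations: iterable of (origin, Accept-CH value or None), e.g. from a crawl.
    """
    per_origin = {}
    for origin, accept_ch in observations:
        if accept_ch:
            per_origin.setdefault(origin, set()).update(parse_accept_ch(accept_ch))
    counts = Counter()
    for hints in per_origin.values():
        counts.update(hints)  # each origin counts once per hint
    return counts
```

As an example, suppose `https://news.example` sends `Accept-CH: DPR, Width, Viewport-Width` and delegates only `dpr` to `https://cdn.example`. Then an image from the CDN receives just `dpr`, while an undelegated third party receives nothing. A site that never sends Accept-CH receives no hints. That absence is exactly the observable signal researchers would rely on in the tally.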