Re: Migrating some high-entropy HTTP headers to Client Hints. from Yoav Weiss on 2019-04-11 (ietf-http-wg@w3.org from April to June 2019)

From: Yoav Weiss <yoav@yoav.ws>
Date: Thu, 11 Apr 2019 17:47:33 -0400
To: Ronan Cremin <rcremin@afilias.info>
Cc: Thomas Peterson <hidinginthebbc@gmail.com>, Mike West <mkwst@google.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CACj=BEiNPVgd7SvhBCqaTyp7c0Jg8QcjXnQt5Fk7Te367hB_pQ@mail.gmail.com>
Hey Ronan,


On Thu, Apr 11, 2019 at 8:11 AM Ronan Cremin <rcremin@afilias.info> wrote:

> Hi,
>
> My name is Ronan Cremin, I help to build a device recognition product
> widely-used in the web analytics, publishing and advertising industries.
> Full disclosure: my employer profits from analysis of UA strings, though
> moving the same information to client hints is not expected to impact
> this materially.
>
> One concern over moving UA string information to Client Hints is that
> the information required to publish device-specific responses arrives
> only in the second request from the client. This imposes a performance
> penalty on publishers that serve a device-tailored HTML document. As
> Mike mentioned, RWD notwithstanding, many publishers employ
> device-specific responses as envisaged in RFC1945, usually to tailor the
> experience to a class of device e.g. smartphone, tablet, desktop and so
> on.


The viewport Client Hint can provide such a distinction, but it exposes more
bits than are actually needed, so it's not a great candidate to expose by
default without an opt-in.
From your description, maybe exposing another, tri-state hint by default
would be enough to cover the use case, without exposing too many bits
about the user in the process.
Can you open an issue on https://github.com/WICG/ua-client-hints describing
your use-case?

If we were to conclude that something like that is privacy-safe, I guess
the main problem would be defining where the line is drawn between a phone
and a tablet, and between a tablet and a laptop.
I suspect a standard definition of those borders would become stale
fairly quickly...
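To make the boundary problem concrete, here is a minimal Python sketch of
how a server might collapse a raw viewport width into the three classes.
The breakpoint values are hypothetical, chosen only for illustration, and
that is the point: any fixed pair of numbers is a policy choice that ages
as devices change. It also shows the privacy trade-off, since a
three-valued result carries at most log2(3), about 1.58 bits, while a raw
width can take hundreds of distinct values.

```python
# Hypothetical breakpoints in CSS pixels, chosen only for illustration;
# no spec defines these values.
PHONE_TABLET_BREAKPOINT = 600
TABLET_DESKTOP_BREAKPOINT = 1024


def device_class(viewport_width: int) -> str:
    """Bucket a viewport width (CSS px) into 'phone', 'tablet' or 'desktop'.

    The cut-offs are arbitrary: a phone held in landscape can exceed
    600px and be filed as a tablet, and a small laptop window can fall
    below 1024px.
    """
    if viewport_width <= 0:
        raise ValueError("viewport width must be positive")
    if viewport_width < PHONE_TABLET_BREAKPOINT:
        return "phone"
    if viewport_width < TABLET_DESKTOP_BREAKPOINT:
        return "tablet"
    return "desktop"


if __name__ == "__main__":
    for width in (375, 768, 1024, 1366):
        print(width, device_class(width))
```

Moving either constant by a few pixels silently reclassifies real devices,
which is why a standardized definition would need ongoing maintenance.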

> Publishers endeavour to fit everything required for the first screen
> of content into this first response, so a delay to this impacts
> performance. The last time I checked more than 80% of the top 100
> websites used this technique.
>

When was that? Do you have data you can point us to?


>
> Web analytics might also be impacted. Most web analytics solutions
> support a JavaScript-free integration approach based on linking a single
> pixel image hosted by the analytics platform. The ability to do this is
> impacted for the same reason—the information required for analytics
> becomes available only on the second request from the client.
>

I'm not sure that's a winning argument, as it sounds like those analytics
vendors exploit the current UA string to extract bits of information from
passive requests.
The current proposal will enable them to do the same (with the same number
of RTTs), but only after an explicit opt-in to receive that data from the
browser. An opt-in that can be monitored by the browser, extensions and
privacy researchers.
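To make the opt-in flow concrete, here is a toy model in Python, not any
browser's actual implementation. It assumes a simplified rule: the browser
sends a hint to an origin only after that origin has listed it in an
`Accept-CH` response header, so the very first request carries no hints.
The hint names and values are illustrative assumptions following the
`Sec-CH-*` naming pattern, and the `opt_ins` table stands in for the record
that a browser, extension or privacy researcher could monitor.

```python
from dataclasses import dataclass, field

# Illustrative hint values; names follow the Sec-CH-* pattern but are
# assumptions, not quoted from the draft.
BROWSER_HINTS = {
    "Sec-CH-UA": '"ExampleBrowser"; v="73"',
    "Sec-CH-Platform": "Linux",
}


@dataclass
class ToyBrowser:
    hints: dict
    # origin -> set of lowercased hint names that origin has opted in to
    opt_ins: dict = field(default_factory=dict)

    def request_headers(self, origin: str) -> dict:
        """Hints sent to `origin`: only those it previously opted in to."""
        wanted = self.opt_ins.get(origin, set())
        return {name: value for name, value in self.hints.items()
                if name.lower() in wanted}

    def handle_response(self, origin: str, headers: dict) -> None:
        """Record an Accept-CH opt-in (toy: exact-case header lookup)."""
        accept_ch = headers.get("Accept-CH")
        if accept_ch:
            names = {n.strip().lower() for n in accept_ch.split(",")
                     if n.strip()}
            self.opt_ins.setdefault(origin, set()).update(names)


if __name__ == "__main__":
    browser = ToyBrowser(dict(BROWSER_HINTS))
    origin = "https://analytics.example"
    first = browser.request_headers(origin)   # no opt-in yet: no hints
    browser.handle_response(origin, {"Accept-CH": "Sec-CH-UA, Sec-CH-Platform"})
    second = browser.request_headers(origin)  # both hints now sent
    print(first, second)
    print(browser.opt_ins)                    # the observable opt-in record
```

The passive information flow of today's User-Agent string becomes an
explicit, per-origin request that leaves a trace.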


> Has thought been given to the performance impact of the proposal?


Yes.


> Yoav
> mentions this issue in his Client Hints infrastructure document
> (https://github.com/yoavweiss/client-hints-infrastructure) but I haven't
> seen any attempt to quantify the impact.


As indicated in the document you linked to, we currently don't have a great
way to make Client Hints that expose fingerprinting bits opt-in while still
sending them on the very first request.
That's unfortunate, and we hope to improve on that in the future.
At the same time, the User-Agent string exposes many bits of entropy,
so it is a privacy hole we're interested in closing.


>
> Regards,
> Ronan
>
> On 29/11/2018 12:08, Thomas Peterson wrote:
> > I would propose that all Accept* headers are included in Client Hints
> > as all can be used for some level of fingerprinting, e.g. Accept can
> > be used to distinguish between desktop browsers (which typically have
> > html/xml MIME types) and cURL/wget which by default have '*/*'. Many
> > user agents also do their own guess work on response bodies anyway
> > (such as looking at the magic number) to determine content type or
> > encoding, so the impact of a "failed negotiation" of content can be
> > limited.
> >
> > Also, is there a particular reason why Sec-CH-Lang omits quality values?
> >
> >
> > Regards
> >
> >
> > On 29/11/2018 10:22, Mike West wrote:
> >> Hey folks,
> >>
> >> Section 9.7 of RFC7231
> >> <https://tools.ietf.org/html/rfc7231#section-9.7> rightly notes that
> >> some of the content negotiation headers user agents deliver in HTTP
> >> requests create substantial fingerprinting surface. I think it would
> >> be beneficial if we took steps to reduce their prevalence on the
> >> wire, and Client Hints looks like a reasonable infrastructure on top
> >> of which to build.
> >>
> >> `User-Agent` and `Accept-Language` seem like particularly tasty and
> >> low-hanging fruit, and I've sketched out two proposals as proofs of
> >> concept:
> >>
> >> *   `User-Agent` could be represented as ~four distinct hints: `UA`,
> >> `Model`, `Platform`, and `Arch`:
> >> https://github.com/mikewest/ua-client-hints is a high-level
> >> explainer, and https://tools.ietf.org/html/draft-west-ua-client-hints
> >> a sketchy ID for the new headers.
> >>
> >> *   `Accept-Language` could be represented as a `Lang` hint:
> >> https://github.com/mikewest/lang-client-hint is a high-level
> >> explainer, https://tools.ietf.org/html/draft-west-lang-client-hint an
> >> equally sketchy ID for the new header.
> >>
> >> I'd appreciate y'all's feedback. Thanks!
> >>
> >> -mike
> >
Received on Thursday, 11 April 2019 21:48:27 UTC