Re: Call for Adoption: draft-grigorik-http-client-hints

On Mon, Aug 31, 2015 at 9:00 AM, Mark Nottingham <mnot@mnot.net> wrote:

> Hi Amos,
>
> > On 25 Aug 2015, at 1:05 am, Amos Jeffries <squid3@treenet.co.nz> wrote:
> >
> > On 24/08/2015 6:13 p.m., Mark Nottingham wrote:
> >> We discussed this document in Dallas, and also a bit in Prague:
> >> <http://tools.ietf.org/html/draft-grigorik-http-client-hints-02>
> >>
> >> We'd talked about doing this at the same time as Key, but I think that
> can be decoupled (especially since we have an implementer for Client Hints
> without Key).
> >>
> >> Based on the discussion so far, I believe that we should adopt this
> document as a WG product, with a target of Proposed Standard.
> >>
> >> I've discussed it with our Area Director, who agrees that it's a
> reasonable thing for us to do.
> >>
> >> Please comment on-list; we’ll make a decision about adoption this time
> next week.
> >>
> >
> > My personal opinion is that this seems like fertile ground for
> > filling the WG with more politicking.
> >
> > In past years it's been interesting watching the opinions of those
> > pushing for these tracking headers to formally exist compete with the
> > anonymity crowd, the privacy crowd, and the anti-surveillance crowd.
> > Which kind of highlights that the main use case, whether acknowledged
> > or not, will be enabling surveillance.
> >
> > If they are going to exist at all, I think a standard format will be a
> > Good Thing. Much like the Forwarded header sweeping away the custom cruft.
>
>
The feature as implemented in Blink is opt-in only, which means that the
site has to turn it on by sending the "Accept-CH" header (or the
equivalent <meta> tag).
So if fingerprinting is enabled by the feature, it is active
fingerprinting, no different from JS that can be used to acquire the
same data.
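
A rough sketch of the opt-in flow, using the DPR hint from the draft
(URL made up; the meta form is as I understand the Blink implementation).
The server advertises which hints it wants:

    HTTP/1.1 200 OK
    Accept-CH: DPR

    <!-- or, equivalently, in the page itself: -->
    <meta http-equiv="Accept-CH" content="DPR">

after which the client includes the hint on subsequent requests:

    GET /hero.jpg HTTP/1.1
    Host: example.com
    DPR: 2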

Also, at least on mobile, I believe that the fingerprinting risks these
headers expose are in no way larger than those already exposed by the
User-Agent string. On desktop, I doubt they would provide significant
data to enable user tracking.
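
For comparison, a typical Android UA string (an illustrative Nexus 5
example) already pins down the device model, OS version, and browser
build, which together imply everything a DPR hint would convey:

    User-Agent: Mozilla/5.0 (Linux; Android 5.1.1; Nexus 5 Build/LMY48B)
        AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.133
        Mobile Safari/537.36

    DPR: 3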

The opt-in was a compromise for privacy, which has left many use cases
behind (e.g. adaptation of images when they are navigated to directly,
or adaptation of HTML to specific viewport dimensions).


> OK. FWIW, I'd like to see us consider recommending that this feature be
> HTTPS-only, and perhaps limiting the granularity of information available
> in things that have pixel counts.
>
> Some background reading:
>   http://www.w3.org/2001/tag/doc/unsanctioned-tracking/
>   https://w3c.github.io/fingerprinting-guidance/
>
>
What's the attack vector that an HTTPS-only limitation would protect
against?
Limiting the granularity will surely also limit the benefits of this
feature. What would it give us?


>
> > (with web developer hat on);
> >
> > I've always been a little mystified as to why these were even being
> > asked for. The tools available for on-device decisions about display
> > are pretty good, and coding frameworks make development for those easy.
> >
> > Looking at screen pixel-width I can think of two cases where it would be
> > useful:
> >
> > 1) a browser fetching the main HTML of a page. Doing so lets the server
> > elide segments of the HTML that do logic for other screen sizes.
> >
> > The purpose of having that in the page in the first place is/was so
> > that the HTML part is portable between devices without re-downloading it
> > on each one. The sub-resources are downloaded as needed, or are portable
> > where possible.
> >
> > Having non-portable versions makes the HTML itself just an instance of
> > case #2, below.
> >
> >
> > 2) an app or game needing to fetch specific-sized images or similar for
> > display.
> >
> > Embedding img size details in the fetched URL, and fetching only the
> > relevant one, is just a far better way to go for latency, portability,
> > and cache friendliness. The origin server's processing load is the same
> > regardless of where the detail is placed.
> >
> >
>

I think that by "latency" you mean "proxy-side latency". While keeping it
low is important, keeping the network RTTs required to fetch content to a
minimum is even more so.

Client-Hints' main use case is to make it easier to automate the
non-art-direction responsive images use cases
<https://usecases.responsiveimages.org/#resolution-based-selection>.
The first way folks addressed the problem is not unlike the way you
describe - delaying the image loading until JS runs and dynamically
forging URLs on the client side. The problem is that image loading
incurred huge delays as a result, and even if proxy-side latency was a
few ms lower, overall site performance suffered delays in the hundreds
of milliseconds and worse.
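
A minimal sketch of that pattern (TypeScript-flavored, with a
hypothetical data-src attribute and made-up file names): the URL can
only be composed after script runs, so the browser cannot start the
image fetch during its initial parse of the HTML.

    // Runs only after the DOM is ready - none of these image fetches
    // could start during the initial parse.
    document.addEventListener("DOMContentLoaded", () => {
      const imgs = document.querySelectorAll<HTMLImageElement>("img[data-src]");
      imgs.forEach((img) => {
        const suffix = window.devicePixelRatio >= 2 ? "-2x" : "-1x";
        // e.g. "hero.jpg" -> "hero-2x.jpg"
        img.src = img.dataset.src!.replace(/(\.\w+)$/, suffix + "$1");
      });
    });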

Luckily, we now have a markup-based solution that addresses that use case
without the latency problem, called srcset
<https://html.spec.whatwg.org/multipage/embedded-content.html#attr-img-srcset>.
That makes it possible for authors to address the issue, but it's not very
easy to automate, for all the reasons that Ilya has pointed out.
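
For instance (made-up file names), the markup declares the alternatives
up front, and the browser can pick one during its initial parse:

    <img src="hero.jpg"
         srcset="hero-2x.jpg 2x, hero-3x.jpg 3x"
         alt="...">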

>
> > (and with my Squid hat on);
> >
> > It is funny that the draft should point out cache friendliness as a
> > reason for avoiding alternative use cases. Yes, this mechanism is
> > better than them, but only by degrees.
> >
> > Realistically, negotiated content means using Vary/Key, which are also
> > somewhat in the cache-unfriendly category, because the logic needed to
> > identify and cache variants is quite complex and can *double* (or even
> > triple) the lookup latency for a cache, especially when optimized fast
> > hashing is used.
>
> We've talked about this offline some, and I think we can get to a place
> where Key can be at most a double-lookup.


Also, in cache deployments where the site's operator *knows* that CH will
be used to Vary the responses, they can add the CH headers to the resource
identifier at request time, eliminating any proxy latency concerns. I'm not
sure if Squid can be customized that way, but it's certainly feasible.
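
A rough sketch of that idea (made-up URL and values): if the operator
knows responses vary on DPR, the cache can fold the hint into the key at
request time, so a variant hit is still a single lookup:

    GET /hero.jpg HTTP/1.1
    Host: example.com
    DPR: 2

    cache key: "example.com/hero.jpg|DPR=2"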


>
>
> > The URL path, with values in places like filenames, is just the optimal
> > form in terms of reduced latency from faster cache aggregation and
> > lookup. Actively increasing network latency by using negotiated
> > features seems a daft approach when it's unnecessary.
> >
> > I'm still waiting for someone to present an actual concrete use-case
> > that proves there is a need for this latency loss just to relay the
> > Client-Hints details (and some of the Accept things too). All I'm seeing
> > so far is that it's better than some badly designed old/current systems
> > which probably won't ever be upgraded to the new mechanism anyway.
> >
> >
> > There are probably other use-cases on the table I'm not aware of yet.
> > I'm all ears.
>
> Any further thoughts after Ilya's mail?
>
> At this point, I'm hearing a fair amount of enthusiasm from a few
> different quarters (taking into account the discussion at the meeting).
>
> Cheers,
>
>
> --
> Mark Nottingham   https://www.mnot.net/
>
>
>

Received on Monday, 31 August 2015 09:25:54 UTC