Re: Call for Adoption: Cache Digests for HTTP/2 from Alcides Viamontes E on 2016-06-22 (ietf-http-wg@w3.org from April to June 2016)

From: Alcides Viamontes E <alcidesv@zunzun.se>
Date: Wed, 22 Jun 2016 13:39:39 +0200
To: Cory Benfield <cory@lukasa.co.uk>
Cc: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>, Natasha Rooney <nrooney@gsma.com>, Kazuho Oku <kazuhooku@gmail.com>
Message-ID: <CAAMqGzbXk2WHShyrMqG5UNPZVqp7xb1UJLJtjFagC2SihLr=Hw@mail.gmail.com>
Hello Cory,

It's good to have your feedback. Below are answers to your comments, and I
also hope to use this conversation to fill in the gaps in my understanding.

On Wed, Jun 22, 2016 at 12:43 PM, Cory Benfield <cory@lukasa.co.uk> wrote:

>
> > On 22 Jun 2016, at 09:28, Alcides Viamontes E <alcidesv@zunzun.se>
> wrote:
> >
> > This is bad for several reasons. AFAIK, sites don't have a way, either, to
> ask the browser to prematurely evict an expired representation that the
> browser would otherwise consider fresh. These two things together could
> allow a cache digest to grow indefinitely. Wouldn't that have a degrading
> effect on performance?
>
> This is presumably true of caching to begin with, correct? If the browser
> doesn’t consider the cached representation stale it is welcome to not emit
> a request for it at all, and simply to serve it from its cache. This means
> that the cache digest can only grow as large as the client cache allows it
> to grow, which I should certainly hope is not indefinitely large!
>

Yes, this is related to caching in general, and it is the reason people
have to add query strings for cache busting. It is a separate issue, but it
interacts with cache digests: old versions of assets are kept in the cache,
and therefore in the cache digest, and the origin has no way of removing
them. The origin can only create a new URL (say, via a new query string),
which then gets added to the cache and to the cache digest.
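To illustrate the growth, here is a minimal Python sketch. It assumes a simplified digest key (a truncated SHA-256 hash of the URL; the draft's actual key derivation may differ) and a hypothetical `?v=` cache-busting parameter on example.com. Every deploy adds a new entry, and nothing the origin does removes the old ones. Only the browser's own eviction would.

```python
import hashlib


def digest_key(url: str, bits: int = 32) -> int:
    """Truncate SHA-256(url) to `bits` bits (simplified, assumed key derivation)."""
    h = int.from_bytes(hashlib.sha256(url.encode("utf-8")).digest(), "big")
    return h >> (256 - bits)


# Simulated browser cache for one origin: URL -> digest key.
cache: dict[str, int] = {}


def fetch(url: str) -> None:
    # A fresh response is stored; the browser never learns that older
    # versions of the same asset are obsolete.
    cache[url] = digest_key(url)


for version in range(1, 6):
    # Each deploy publishes the stylesheet under a new cache-busting URL.
    fetch(f"https://example.com/theme.css?v={version}")

print(len(cache))  # 5: all five versions still count toward the digest
```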


>
> However, it may be sensible to consider providing a SETTINGS field that
> allows servers to flag a maximum size on a cache digest that it is willing
> to accept.
>

But this leaves the server without any control over which things are made
part of the cache digest. That's why we think scoping and an explicit
eviction mechanism are better long-term solutions.


>
> > Also related to scoping is the following. Cache digests have been
> devised so far for a single HTTP/2 Push use scenario: pushing a few assets
> which are critical to the first render of a page. For that scenario, it is
> a good idea to keep the digests short. In other words, to not include all
> assets from the origin that the browser has in cache. Scoping is good here!
> But there are other HTTP/2 Push scenarios. PUSH_PROMISEs can be sent later
> in the lifetime of a connection. Since HTTP/2 recommends not bundling
> assets, HTTP/2 Push is also handy to push hierarchies of small JavaScript
> modules. Digests are also useful here, but with different assets than the
> digests at the beginning of a site fetch.
>
> Can you elaborate on this concern, please? For browsers, at least, the
> only meaningful use of PUSH_PROMISE at this time is to perform cache
> priming, and the draft allows sending multiple CACHE_DIGEST frames on the
> connection, which helps keep the server up to date on cache state.
>

Yes, I'm talking about cache priming, just at different times. Early during
the fetch you really want to prime the cache for critical assets, like CSS
and web fonts. Priming the cache with a hierarchy of small JavaScript
modules can be done much later, but latency matters there too, so HTTP/2
Push with cache digests is also useful. The standard HTTP/1.1
recommendation is to take all those little JavaScript modules and make one
huge .js file out of them. We wanted to deprecate that with HTTP/2.
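For the module case, a server could walk the import graph and push only what the client's digest does not already cover. The following is a minimal sketch. It assumes a hypothetical module graph and treats the digest as an exact set of paths. A real cache digest is probabilistic, so a false positive would make the server skip a push it should have made.

```python
from collections import deque

# Hypothetical module graph: module -> modules it imports.
GRAPH = {
    "/js/app.js": ["/js/ui.js", "/js/net.js"],
    "/js/ui.js": ["/js/dom.js"],
    "/js/net.js": ["/js/dom.js", "/js/json.js"],
    "/js/dom.js": [],
    "/js/json.js": [],
}


def modules_to_push(entry: str, digest: set[str]) -> list[str]:
    """Breadth-first walk of the import graph from `entry`, skipping
    modules the client says it already has (the entry itself is what
    the client is requesting, so it is never pushed)."""
    seen = {entry}
    queue = deque([entry])
    pushes = []
    while queue:
        module = queue.popleft()
        for dep in GRAPH.get(module, []):
            if dep in seen:
                continue
            seen.add(dep)
            queue.append(dep)
            if dep not in digest:
                pushes.append(dep)
    return pushes


print(modules_to_push("/js/app.js", digest={"/js/dom.js"}))
# ['/js/ui.js', '/js/net.js', '/js/json.js']
```

Modules found in the digest are still traversed, so their own dependencies may be pushed. A server could prune there instead, at the risk of missing dependencies the client does not actually have.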

Also think about the effectiveness of cache digests for a WordPress user
hosting a huge number of cached images on the same origin where the theme's
.css is hosted.

Ideally, we would like the browser to first say "hey, this is the cache
digest for the CSS and other things that I need really right now... it is a
really short digest", and later, when the page starts loading JavaScript
modules or other resources, to say "hey, I already have a lot of this stuff,
here is a fat digest with all of it".
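Assuming the browser could be told how to scope its digests, the split might look like the following sketch. The path prefixes that define the "right now" scope are hypothetical. How a site would communicate them is exactly the scoping mechanism being asked for here.

```python
from urllib.parse import urlsplit

# Hypothetical scoping rule: what counts as "needed right now" is a
# site-specific decision; these path prefixes are made up.
CRITICAL_PREFIXES = ("/css/", "/fonts/")


def split_digest_scopes(cached_urls: list[str]) -> tuple[list[str], list[str]]:
    """Return (critical, deferred) lists of cached URLs."""
    critical, deferred = [], []
    for url in cached_urls:
        path = urlsplit(url).path
        (critical if path.startswith(CRITICAL_PREFIXES) else deferred).append(url)
    return critical, deferred


cached = [
    "https://example.com/css/theme.css",
    "https://example.com/fonts/body.woff2",
    *[f"https://example.com/uploads/img{i:04d}.jpg" for i in range(500)],
    *[f"https://example.com/js/mod{i:02d}.js" for i in range(40)],
]
critical, deferred = split_digest_scopes(cached)
# Short digest sent immediately vs. fat digest sent once scripts start loading.
print(len(critical), len(deferred))  # 2 540
```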

If I understand correctly, the point of having cache digests is for the
browser to advertise which assets it has in cache *as early as possible*,
so as to avoid unneeded HTTP/2 pushes by the server. So if you have a lot of
assets on your domain which are only needed later, you will still receive
digest bytes for all of them very early. The size of the digest is
proportional to how many assets from this domain the browser has in cache,
not to how many assets the browser needs right now, nor even to how many it
needs for rendering this page.
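The following sketch shows why the size tracks the cache rather than the page. It builds a Golomb-Rice-coded set in the spirit of the draft's digest: hash each URL, truncate the hash to log2(N) + log2(1/P) bits, sort the results, and encode the gaps. The parameters and bit layout are simplified assumptions. For instance, the N and P values the draft transmits alongside the set are omitted. Each entry costs somewhat more than log2(1/P) + 1 bits on average, so the output grows linearly with the number of cached URLs.

```python
import hashlib
import math


def gcs_encode(urls: list[str], log2_p: int = 7) -> bytes:
    """Golomb-Rice-coded set of truncated URL hashes.

    The false-positive rate is P = 1 / 2**log2_p (1/128 by default).
    """
    if not urls:
        return b""
    bits = math.ceil(math.log2(len(urls))) + log2_p  # log2(N) + log2(1/P)
    keys = sorted({
        int.from_bytes(hashlib.sha256(u.encode("utf-8")).digest(), "big")
        >> (256 - bits)
        for u in urls
    })
    out: list[int] = []
    prev = 0
    for key in keys:
        delta, prev = key - prev, key
        q, r = delta >> log2_p, delta & ((1 << log2_p) - 1)
        out.extend([1] * q + [0])  # quotient in unary, terminated by a 0
        out.extend((r >> i) & 1 for i in reversed(range(log2_p)))  # remainder
    out.extend([0] * (-len(out) % 8))  # zero-pad to a whole number of bytes
    return bytes(
        int("".join(map(str, out[i:i + 8])), 2) for i in range(0, len(out), 8)
    )


for n in (10, 100, 1000):
    urls = [f"https://example.com/uploads/img{i:04d}.jpg" for i in range(n)]
    print(n, len(gcs_encode(urls)))
```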

Scoping can be achieved with the current draft by using different origins.
It means that some people will need to go back to domain sharding for
static assets. But even then, the possibility of disabling digests would be
priceless.
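One way a browser could honor such a switch is sketched below. It assumes a hypothetical response header that the site sends and the browser remembers for later visits. Neither the header name nor its values come from the draft.

```python
# Hypothetical hint header: neither the name nor the "off"/"on" values
# come from the draft; they only illustrate a remembered opt-out.
HINT_HEADER = "cache-digest-hint"


class DigestPolicy:
    """Per-origin memory of whether a site asked not to receive digests."""

    def __init__(self) -> None:
        self._opted_out: set[str] = set()

    def observe_response(self, origin: str, headers: dict[str, str]) -> None:
        # HTTP header names are case-insensitive, so normalize them first.
        lowered = {name.lower(): value for name, value in headers.items()}
        hint = lowered.get(HINT_HEADER)
        if hint is None:
            return  # no hint on this response: keep what we remembered
        if hint.strip().lower() == "off":
            self._opted_out.add(origin)
        else:
            self._opted_out.discard(origin)

    def should_send_digest(self, origin: str) -> bool:
        # Digests only help on repeat visits, so a hint stored on an earlier
        # visit is available before the next connection sends any digest.
        return origin not in self._opted_out


policy = DigestPolicy()
policy.observe_response("https://example.com", {"Cache-Digest-Hint": "off"})
print(policy.should_send_digest("https://example.com"))    # False
print(policy.should_send_digest("https://other.example"))  # True
```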


> > Another concern we expressed in January was that there was no way to
> switch off the digest frame. All in all, we think that the current draft is
> a really good step, and we understand that it doesn't need to be perfect.
> But if it is not going to be good enough for all scenarios, it would be
> nice if it could be switched off. Since digests are only used from the
> second visit to a site onwards, the browser could just remember some hint
> from the site on previous visits and abstain from using digests.
> >
> > For what it's worth, and in short, here is our response to the call for
> adoption:
> >
> > - Should this draft be adopted in its current form?: No.
> >
> > - What would be the minimum requirement to revert that response?: A way
> to switch off digests, so that operators opting for different
> implementations don't need to pay for it.
>
> Operators using other implementations shouldn’t have to pay for it. RFC
> 7540 § 5.5 says:
>


Sorry for not being clear enough; I'm talking about the "bytes cost". As
the draft stands now, a browser implementing cache digests will transmit a
number of digest bytes that we have no control over, and those bytes take up
bandwidth. A website operator may very well be of the opinion that those
bytes are better used for something else, e.g. a poor man's, slimmer cache
digest carried either in a cookie or in a custom header set by a service
worker.
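Here is a minimal sketch of the cookie variant, assuming a hypothetical cookie name and a server that tracks only the few critical assets it chose to push. The site, not the browser, decides what goes in, so the cookie grows by a few bytes per critical asset rather than per cached file.

```python
import base64
import hashlib
from typing import Optional

COOKIE_NAME = "pushed"  # hypothetical cookie name
HASH_BYTES = 4          # per-asset hash length; shorter means more false positives


def short_hash(path: str) -> bytes:
    return hashlib.sha256(path.encode("utf-8")).digest()[:HASH_BYTES]


def encode_cookie(paths: list[str]) -> str:
    # URL-safe base64 without padding stays within the cookie-octet set.
    raw = b"".join(short_hash(p) for p in paths)
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_cookie(value: str) -> set[bytes]:
    raw = base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))
    return {raw[i:i + HASH_BYTES] for i in range(0, len(raw), HASH_BYTES)}


def assets_to_push(critical: list[str], cookie_value: Optional[str]) -> list[str]:
    # Push only the critical assets whose hashes are not in the cookie.
    have = decode_cookie(cookie_value) if cookie_value else set()
    return [p for p in critical if short_hash(p) not in have]


CRITICAL = ["/css/theme.css", "/fonts/body.woff2"]
first_visit = assets_to_push(CRITICAL, None)   # both assets are pushed
cookie = encode_cookie(first_visit)            # sent back via Set-Cookie
print(f"{COOKIE_NAME}={cookie}")
print(assets_to_push(CRITICAL, cookie))        # []
```

The cookie records what was pushed, not what is still cached, so it can go stale if the browser evicts an asset. That imprecision is the price of the smaller size.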


./Alcides.
Received on Wednesday, 22 June 2016 11:40:14 UTC